We are very excited to join forces with MLCommons and OctoML.ai! Contact Grigori Fursin for more details!

Grigori Fursin


LinkedIn    Google scholar    Bio    Twitter    GitHub    Contact

My passion is to work with the community to make Machine Learning simpler and easier to use in the real world; automate co-design and deployment of efficient ML Systems in terms of speed, accuracy, energy, and various costs; develop open-source tools for collaborative and reproducible research, and support open science.

My news:
  • 2021 October: My Collective Knowledge framework became an official MLCommons project! I look forward to working with the community to make it easier to benchmark and co-design efficient ML Systems across continuously changing hardware, software, models and data sets!
  • 2021 April: I am excited to join OctoML.ai as VP of MLOps and work with a fantastic team to automate development, optimization and deployment of efficient ML Systems (speed, accuracy, energy, size and costs) from the cloud to the edge that can help solve real-world problems.
  • 2021 March: My ACM TechTalk about "Reproducing 150 Research Papers and Testing Them in the Real World" is available on the ACM YouTube channel.
  • 2021 March: The report from the "Workflows Community Summit: Bringing the Scientific Workflows Community Together" is now available on arXiv.
  • 2021 March: My paper about the CK technology has appeared in the Philosophical Transactions A, the world's longest-running journal where Newton published: DOI, arXiv.
  • 2020 December: I am honored to join MLCommons as a founding member to accelerate machine learning and systems innovation along with 50+ leading companies and universities: press-release.
  • 2020 December: We are organizing artifact evaluation at ACM ASPLOS'21.
  • 2020 November: The overview of the CK project was accepted for the Philosophical Transactions of the Royal Society: peer-reviewed preprint.
  • 2020 October: My CK framework helped to automate and reproduce many MLPerf benchmark v0.7 inference submissions: see shared CK solutions and CK dashboards to automate SW/HW co-design for edge devices.
  • 2020 September: My Reddit discussion about our painful experience reproducing ML and systems papers during artifact evaluation.

My academic research (tenured research scientist at INRIA with a PhD in CS from the University of Edinburgh)
  • I was among the first researchers to combine machine learning, autotuning and knowledge sharing to automate and accelerate the development of efficient software and hardware by several orders of magnitude (Google scholar);
  • developed open-source tools and started educational initiatives (ACM, Raspberry Pi foundation) to bring this research to the real world (see use cases);
  • prepared and taught an M.S. course at Paris-Saclay University on using ML to co-design efficient software and hardware (self-optimizing computing systems);
  • gave 100+ invited research talks;
  • honored to receive the ACM CGO test of time award, several best paper awards and the INRIA award of scientific excellence.
Project management, system design and consulting (collaboration with IBM, Intel, Arm, Synopsys, Google, Mozilla, General Motors)
  • led the development of the world's first ML-based compiler and the cTuning.org platform across 5 teams to automate and crowdsource optimization of computer systems (IBM and Fujitsu press-releases; invitation to help establish Intel Exascale Lab and lead SW/HW co-design group);
  • developed a compiler plugin framework that was added to the mainline GCC powering all Linux-based computers and helped to convert production compilers into research toolsets for machine learning;
  • developed the Collective Knowledge framework to automate and accelerate design space exploration of AI/ML/SW/HW stacks while balancing speed, accuracy, energy and costs (188K+ downloads); CK helped to automate most MLPerf inference benchmark submissions for edge devices, as mentioned by Forbes, ZDNet and EETimes;
  • co-founded an engineering company and led it to $1M+ in revenue with Fortune 50 customers using my CK technology;
  • founded and developed the cKnowledge.io platform to organize all research knowledge (AI, ML, Systems, quantum) in the form of portable CK workflows, common APIs and reusable artifacts (acquired by OctoML.ai). It helps users quickly find innovative technology, test it and adopt it in production.
Community service (collaboration with ACM, MLPerf and the Raspberry Pi foundation)

Professional Career


Main scientific and community contributions

Professional memberships

ACM, IEEE, MLCommons, MLPerf, HiPEAC

Main software developments and technology used

2020-cur.: Developed a prototype of the CK platform to organize all knowledge about AI, ML, systems, and other innovative technology from my academic and industrial partners in the form of portable CK workflows, automation actions, and reusable artifacts. I use it to automate co-design and comparison of efficient AI/ML/SW/HW stacks from data centers and supercomputers to mobile phones and edge devices in terms of speed, accuracy, energy, and various costs. I also use this platform to help organizations reproduce innovative AI, ML, and systems techniques from research papers and accelerate their adoption in production. I collaborate with MLPerf.org to automate and simplify ML&systems benchmarking and fair comparison based on the CK concept and DevOps/MLOps principles.
I used the following technologies: Linux/Windows/Android; Python/JavaScript/CK; apache2; flask/django; ElasticSearch; GitHub/GitLab/BitBucket; REST JSON API; Travis CI/AppVeyor CI; DevOps; CK-based knowledge graph database; TensorFlow; Azure/AWS/Google cloud/IBM cloud.
2018-cur.: Enhanced and stabilized all main CK components (software detection, package installation, benchmarking pipeline, autotuning, reproducible experiments, visualization) successfully used by dividiti to automate MLPerf benchmark submissions.
I used the following technologies: Linux/Windows/Android; CK/Python/JavaScript/C/C++; statistical analysis; MatPlotLib/numpy/pandas/jupyter notebooks; GCC/LLVM; TensorFlow/PyTorch; main AI algorithms, models and data sets for image classification and object detection; Azure/AWS/Google cloud/IBM cloud; mobile phones/edge devices/servers; Nvidia GPU/EdgeTPU/x86/Arm architectures.
2017-2018: Developed CK workflows and live dashboards for the 1st open ACM REQUEST tournament to co-design Pareto-efficient SW/HW stacks for ML and AI in terms of speed, accuracy, energy, and costs. We later reused this CK functionality to automate MLPerf submissions.
I used the following technologies: CK; LLVM/GCC/ICC; ImageNet; MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet; MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM; Xilinx Pynq-Z1 FPGA/Arm Cortex CPUs/Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399)/a farm of Raspberry Pi devices/NVIDIA Jetson TX2/Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.
2017-2018: Developed an example of an autogenerated and reproducible paper with a Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques (collaboration with the Raspberry Pi foundation).
I used the following technologies: Linux/Windows; LLVM/GCC; CK; C/C++/Fortran; MILEPOST GCC code features/hardware counters; DNN (TensorFlow)/KNN/SVM/decision trees; PCA; statistical analysis; crowd-benchmarking; crowd-tuning.
2015-cur.: Developed the Collective Knowledge framework (CK) to help the community automate typical tasks in ML&systems R&D, provide a common format, APIs, and meta descriptions for shared research projects, enable portable workflows, and improve reproducibility and reusability in computational research. We now use it to automate benchmarking, optimization and co-design of AI/ML/SW/HW stacks in terms of speed, accuracy, energy and other costs across diverse platforms from data centers to edge devices.
I used the following technologies: Linux/Windows/Android/Edge devices; Python/C/C++/Java; ICC/GCC/LLVM; JSON/REST API; DevOps; plugins; apache2; Azure cloud; client/server architecture; noSQL database (ElasticSearch); GitHub/GitLab/BitBucket; Travis CI/AppVeyor CI; main math libraries, DNN frameworks, models, and datasets.
2012-2014: Prototyped the Collective Mind framework, the predecessor of CK. I focused on web services, but it turned out that my users wanted a basic CLI-based framework. This feedback motivated me to develop the simple CLI-based CK framework.
2010-2011: Helped to create KDataSets (1000 data sets for CPU benchmarks) (PLDI paper, repo).
2008-2010: Developed a machine-learning-based self-optimizing compiler connected with cTuning.org in collaboration with IBM, Arc (Synopsys), Inria, and the University of Edinburgh. This technology is considered to be the first of its kind in the world.
I used the following technologies: Linux; GCC; C/C++/Fortran/Prolog; semantic features/hardware counters; KNN/decision trees; PCA; statistical analysis; crowd-benchmarking; crowd-tuning; plugins; client/server architecture.
2008-2009: Added the function cloning process to GCC to enable run-time adaptation for statically-compiled programs (report).
2008-2009: Developed the interactive compilation interface now available in mainline GCC (collaboration with Google and Mozilla).
2008-cur.: Developed the cTuning.org portal to crowdsource training of the ML-based MILEPOST compiler and automate SW/HW co-design, similar to SETI@home. See press-releases from IBM and Fujitsu about my cTuning concept.
I used the following technologies: Linux/Windows; MediaWiki; MySQL; C/C++/Fortran/Java; MILEPOST GCC; PHP; apache2; client/server architecture; KNN/SVM/decision trees; plugins.
2009-2010: Created cBench (collaborative CPU benchmark to support autotuning R&D) and connected it with my cTuning infrastructure from the MILEPOST project.
2005-2009: Created MiDataSets - multiple datasets for MiBench (20+ datasets per benchmark; 400 in total) to support autotuning R&D.
1999-2004: Developed a collaborative infrastructure to autotune HPC workloads (Edinburgh Optimization Software) for the EU MHAOTEU project.
I used the following technologies: Linux/Windows; Java/C/C++/Fortran; Java-based GUI; client/server infrastructure with plugins to integrate autotuning/benchmarking tools and techniques from other partners.
1999-2001: Developed a polyhedral source-to-source compiler for memory hierarchy optimization in HPC, used in the EU MHAOTEU project.
I used the following technologies: C++; GCC/SUIF/POLARIS.
1998-1999: Developed a web-based service to automate the submission and execution of tasks on supercomputers via the Internet, used at the Russian Academy of Sciences.
I used the following technologies: Linux/Windows; apache/IIS; MySQL; C/C++/Fortran/Visual Basic; MPI; Cray T3D.
1993-1998: Developed an analog semiconductor neural network accelerator (Hopfield architecture). My R&D tasks included the NN design, simulation, development of an electronic board connected to a PC to experiment with the semiconductor NN, data set preparation, training, benchmarking, and optimization of this NN.
I used the following technologies: MS-DOS/Windows/Linux; C/C++/assembler for NN implementation; MPI for distributed training; PSpice for electronic circuit simulation; ADC, DAC, and LPT to measure the semiconductor NN and communicate with a PC; Visual Basic to visualize experiments.
1991-1993: Developed and sold software to automate financial operations in SMEs.
I used the following technologies: MS-DOS; Turbo C/C++; assembler for printer/video drivers; my own library for Windows management.

My favorite story about Ernest Rutherford and Niels Bohr