ACM ReQuEST-ASPLOS'18: the 1st Reproducible Tournament on Pareto-efficient Image Classification

Results of the 1st reproducible ACM ReQuEST-ASPLOS'18 tournament:

Goals

Our long-term goal is to develop a common methodology and framework for reproducible co-design of the efficient software/hardware stack for emerging algorithms requested by our advisory board (inference, object detection, training, etc.) in terms of speed, accuracy, energy, size, complexity, costs and other metrics. Open ReQuEST competitions bring together AI, ML and systems researchers to share complete algorithm implementations (code and data) as portable, customizable and reusable Collective Knowledge workflows. This helps other researchers and end-users to quickly validate such results, reuse workflows and optimize/autotune algorithms across different platforms, models, data sets, libraries, compilers and tools. We will also use our practical experience reproducing experimental results from ReQuEST submissions to help set up artifact evaluation at the upcoming MLSys, and to suggest new algorithms for inclusion in the MLPerf benchmark.

The associated ACM ReQuEST workshop is co-located with ASPLOS 2018 and takes place on March 24th, 2018 (afternoon) in Williamsburg, VA, USA.

For an introduction to ReQuEST and its long-term goals, see the cKnowledge.org/request website and the arXiv paper.

Steering committee (A-Z)

Advisory/industrial board (A-Z)

Partners

Workshop program

Time slot
Presentation
Reusable artifacts

1:30pm—1:40pm

Workshop introduction

ReQuEST tournaments bring together multidisciplinary researchers (AI, ML, systems) to find the most efficient solutions for realistic problems requested by the advisory board in terms of speed, accuracy, energy, complexity, costs and other metrics across the whole application/software/hardware stack in a fair and reproducible way. All the winning solutions (code, data, workflow) on a Pareto frontier are then made available to the community as portable and customizable "plug&play" AI/ML components with a common API and meta information. The ultimate goal is to accelerate research and reduce costs by reusing the most accurate and efficient AI/ML blocks, continuously optimized, autotuned and crowd-tuned across diverse models, data sets and platforms from the cloud to the edge.

1:40pm—2:30pm

Keynote "The Retrospect and Prospect of Low-Power Image Recognition Challenge (LPIRC)"

Prof. Yiran Chen, Duke University, USA

Slides in PDF

Abstract: Reducing power consumption has been one of the most important goals since the creation of electronic systems. Energy efficiency is increasingly important as battery-powered systems (such as smartphones, drones, and body cameras) are widely used. It is desirable to use the on-board computers to recognize objects in the images captured by these cameras. The Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015, aiming to discover the best technology in both image recognition and energy conservation. In this talk, we will explain the rules of the competition and their rationale, summarize the teams' scores, and describe the lessons learned in the past years. We will also discuss possible improvements of future challenges and collaboration opportunities with other events and competitions like ReQuEST.

Short bio: Yiran Chen received his B.S. and M.S. from Tsinghua University and his Ph.D. from Purdue University in 2005. After five years in industry, he joined the University of Pittsburgh in 2010 as an Assistant Professor, was promoted to Associate Professor with tenure in 2014, and held a Bicentennial Alumni Faculty Fellowship. He is now a tenured Associate Professor in the Department of Electrical and Computer Engineering at Duke University and serves as co-director of the Duke Center for Evolutionary Intelligence (CEI), focusing on research into new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems. Dr. Chen has published one book and more than 300 technical publications and has been granted 93 US patents. He is an associate editor of IEEE TNNLS, IEEE D&T, IEEE ESL, ACM JETC, and ACM TCPS, and has served on the technical and organization committees of more than 40 international conferences. He has received 6 best paper awards and 12 best paper nominations from international conferences. He is a recipient of the NSF CAREER Award and the ACM SIGDA Outstanding New Faculty Award. He is a Fellow of the IEEE.

See LPIRC tournaments.

2:30pm—2:50pm

"Real-Time Image Recognition Using Collaborative IoT Devices"

Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim

Georgia Institute of Technology, USA
Nvidia Jetson TX2, Arm, Raspberry Pi, AlexNet, VGG16, TensorFlow, Keras, Avro

2:50pm—3:10pm

"Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe"

Jiong Gong, Haihao Shen, Guoming Zhang, Xiaoli Liu, Shane Li, Ge Jin, Niharika Maheshwari

Intel Corporation
Xeon Platinum 8124M, AWS, Intel C++ Compiler 17.0.5 20170817, ResNet-50, Inception-V3, SSD, 32-bit, 8-bit, Caffe

3:10pm—3:30pm

"VTA: Open Hardware/Software Stack for Vertical Deep Learning System Optimization"

Thierry Moreau, Tianqi Chen, Luis Ceze

University of Washington, USA
Xilinx FPGA (PYNQ board), ResNet-*, MXNet, NNVM/TVM

3:30pm—4:00pm

Break

4:00pm—4:20pm

"Optimizing Deep Learning Workloads on Arm GPU with TVM"

Lianmin Zheng¹, Tianqi Chen²

¹ Shanghai Jiao Tong University, China
² University of Washington, USA
Firefly-RK3399, GCC, LLVM, VGG16, MobileNet, ResNet-18, OpenBLAS vs ArmCL, MXNet, NNVM/TVM

4:20pm—4:50pm

"Introducing open ReQuEST platform, scoreboard and long-term vision"

Grigori Fursin and the ReQuEST organizers

"Exploring performance and accuracy of the MobileNets family using the Arm Compute Library"

Nikolay Chunosov, Flavio Vella, Anton Lokhmotov, Grigori Fursin

dividiti, UK
cTuning foundation, France
HiKey 960 (GPU), GCC, MobileNets exploration, ArmCL (18.01, 18.02, dividiti optimizations), OpenCL

5:00pm

"Tackling complexity, reproducibility and tech transfer challenges in a rapidly evolving AI/ML/systems research"

Moderators: Grigori Fursin and Thierry Moreau.

We plan to center discussion around the following questions:

  • How do we facilitate tech transfer between academia and industry in a quickly evolving research landscape?
  • How do we incentivize companies and academic researchers to release more artifacts and open source projects as portable, customizable and reusable components which can be collaboratively optimized by the community across diverse models, data sets and platforms from the cloud to edge?
  • How do we ensure reproducible evaluation and fair comparison of diverse AI/ML frameworks, libraries, techniques and tools?
  • What other workloads (AI, ML, quantum) and exciting research challenges should ReQuEST attempt to solve in its future iterations with the help of the multi-disciplinary community? For example: reducing training time and costs, comparing specialized hardware (TPU/FPGA/DSP), distributing learning across edge devices, ...

Participants:

Hillery Hunter, IBM

Hillery Hunter is an IBM Fellow and Director of the Accelerated Cognitive Infrastructure group at IBM's T.J. Watson Research Center in Yorktown Heights, NY. She is interested in cross-disciplinary technology topics, spanning silicon to system architecture, that achieve new solutions to traditional problems. Her team pursues hardware-software co-optimization to take the wait time out of machine and deep learning problems. Her prior work was in the areas of DRAM main memory systems and embedded DRAM, and she gained development experience serving as IBM's server and mainframe DDR3-generation end-to-end memory power lead. In 2010, she was selected by the National Academy of Engineering for its Frontiers in Engineering Symposium, recognizing her as one of the top young engineers in America. Dr. Hunter received her Ph.D. in Electrical Engineering from the University of Illinois, Urbana-Champaign and is a member of the IBM Academy of Technology. She was appointed an IBM Fellow in 2017.


Yiran Chen, Duke University

Yiran Chen received his B.S. and M.S. from Tsinghua University and his Ph.D. from Purdue University in 2005. After five years in industry, he joined the University of Pittsburgh in 2010 as an Assistant Professor, was promoted to Associate Professor with tenure in 2014, and held a Bicentennial Alumni Faculty Fellowship. He is now a tenured Associate Professor in the Department of Electrical and Computer Engineering at Duke University and serves as co-director of the Duke Center for Evolutionary Intelligence (CEI), focusing on research into new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems. Dr. Chen has published one book and more than 300 technical publications and has been granted 93 US patents. He is an associate editor of IEEE TNNLS, IEEE D&T, IEEE ESL, ACM JETC, and ACM TCPS, and has served on the technical and organization committees of more than 40 international conferences. He has received 6 best paper awards and 12 best paper nominations from international conferences. He is a recipient of the NSF CAREER Award and the ACM SIGDA Outstanding New Faculty Award. He is a Fellow of the IEEE.


Charles Qi, Cadence

Charles Qi is a system solutions architect in Cadence's IPG System and Software team, responsible for providing vision system solutions based on the Cadence(R) Tensilica Vision DSP technology and a broad portfolio of interface IP. At the system level, his primary focus is image sensing, computer vision, and deep learning hardware and software for high-performance automotive vision ADAS SoCs. He is also currently an active member of the internal architecture team for high-performance neural network acceleration hardware IP. Prior to joining Cadence, Charles held various technical positions at Intel, Broadcom and several high-tech startups.

Important dates

  • Artifact submissions due: February 12, 2018 AoE
  • Artifact evaluation: February 13-February 21, 2018
  • Author notification: February 22, 2018
  • ASPLOS early registration deadline: February 23, 2018 (See ASPLOS registration and visa support)
  • Workshop with presentations and discussions of winning workflows: March 24, 2018
  • Final papers and artifacts for the Digital Library: mid April, 2018
  • Report to our advisory board: end of April, 2018

Call for submissions

The 1st ReQuEST tournament is co-located with ACM ASPLOS'18 and will focus on optimizing the whole model/software/hardware stack for image classification based on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Unlike the classical ILSVRC, where submissions are ranked solely by classification accuracy, ReQuEST submissions will be evaluated according to multiple metrics and trade-offs selected by the authors (e.g. accuracy, speed, throughput, energy consumption, hardware cost, usage cost, etc.) in a unified, reproducible and objective way using the Collective Knowledge framework (CK). Restricting the competition to a single application domain will allow us to test our open-source ReQuEST tournament infrastructure, validate it across multiple platforms and environments, and prepare a dedicated live scoreboard with results similar to this open SOTA scoreboard.
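
For illustration, the hypothetical record below sketches the kind of multi-dimensional result a submission might expose for the live scoreboard; the field names and values are placeholders rather than the official CK schema.

    # Hypothetical sketch (not the official CK schema) of the metrics a
    # submission could expose; all values are placeholders for illustration.
    submission_result = {
        "model": "example-convnet",
        "framework": "example-framework",
        "platform": "example-board",
        "metrics": {
            "top1_accuracy": 0.75,   # fraction of correctly classified images
            "latency_ms": 100.0,     # average time per image
            "energy_j": 0.5,         # energy per inference, if measurable
            "model_size_mb": 20.0,   # storage footprint of the weights
            "cost_usd": 0.0,         # usage cost, e.g. for cloud instances
        },
    }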

We encourage participants to target accessible, off-the-shelf hardware to allow our evaluation committee to conveniently reproduce their results. Example systems include:

  • Server-class: AWS/Azure cloud instance, any x86-based desktop system.
  • Mobile-class: Any Arm-based (e.g. NVIDIA Jetson TX2, Raspberry Pi 3, Xilinx PYNQ board) or Intel-Atom-based SoC development board, or an Android-based smartphone or tablet.
  • IoT-class: Low-power Arm micro-controllers (e.g. Freescale FRDM KL03 Development Board).

If a submission relies on an exotic hardware platform, the participants can either provide restricted access to their evaluation platform to the artifact evaluation committee, or notify the organizers in advance (please try to give us at least 3 weeks notice) about their choice so that a similar platform can be acquired in time (assuming the cost is not prohibitive).

Example optimizations include:

  • Design space exploration of model topologies, operators, activation functions, configurations.
  • Hyper-parameter search and meta-learning techniques that help optimize accuracy and inference time.
  • Comparison of different deep learning systems (for example TensorFlow vs. Caffe2 vs. CNTK vs. MXNet).
  • Model optimizations that trade accuracy for speed or efficiency (e.g. reduced precision and model compression).
  • Operator-level quantization, binarization or ternarization techniques to improve overall inference time (e.g. binary networks, XNOR nets); a minimal quantization sketch follows this list.
  • Library optimizations targeting deep learning operators on mobile systems (e.g. depthwise convolution).
  • FPGA acceleration that takes advantage of narrow integer bitwidth.
  • Software optimizations targeting GPU-less mobile/IoT systems.
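
As a conceptual illustration of the reduced-precision direction above, the NumPy sketch below applies symmetric per-tensor 8-bit quantization to a weight tensor; real submissions typically use calibrated and often per-channel schemes, so this is not the method of any particular entry.

    # Minimal, illustrative sketch of symmetric per-tensor 8-bit quantization.
    import numpy as np

    def quantize_int8(w):
        """Map a float32 tensor to int8 values plus a single scale factor."""
        scale = max(float(np.abs(w).max()), 1e-8) / 127.0   # largest magnitude -> 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        """Recover an approximate float32 tensor."""
        return q.astype(np.float32) * scale

    w = np.random.randn(64, 3, 3, 3).astype(np.float32)    # e.g. a conv kernel
    q, scale = quantize_int8(w)
    err = np.abs(dequantize(q, scale) - w).max()
    print(f"max absolute quantization error: {err:.4f}")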

We strongly encourage artifact submissions for already published optimization techniques since one of the ReQuEST goals is to prepare a reference (baseline) set of implementations of various algorithms shared as portable, customizable and reusable CK components with a common API. In fact, the ReQuEST submissions will be directly fed into pilot CK integrations with the ACM Digital Library.

Submission

We follow standard procedures for submitting and evaluating experimental workflows, as established at leading systems conferences including CGO, PPoPP, PACT and SuperComputing ("artifact evaluation"):

  • Step 1: Share your experimental artifacts and workflows

    You should make all artifacts and experimental workflows publicly available via GitHub, GitLab, Bitbucket or similar, or pack them in a zip/tar archive or Docker/VM image. You should also provide instructions and scripts to build and run your workflows on a target platform, measure the characteristics and compare the results against a reference implementation.
    If you are already familiar with the open-source Collective Knowledge framework (CK), you are encouraged to convert your experimental workflows to portable CK workflows. Such workflows can automatically set up the environment, detect required software dependencies, install missing packages and run experiments, thus automating artifact evaluation. (See some examples here.)
    If you are not familiar with CK, worry not! We will gladly help you convert your submission to CK during the evaluation stage.
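
    For the curious, the snippet below is a rough sketch of driving CK programmatically, assuming the CK v1 Python API (ck.kernel); the repository and program names are hypothetical, and a real run usually needs additional keys (command, dataset, target device) agreed with the evaluators.

        # Rough sketch, assuming the CK v1 Python API; repository and program
        # names below are hypothetical examples, not actual ReQuEST entries.
        import ck.kernel as ck

        def ck_do(request):
            r = ck.access(request)
            if r['return'] > 0:
                raise RuntimeError(r.get('error', 'CK call failed'))
            return r

        # Pull a (hypothetical) repository that shares the workflow.
        ck_do({'action': 'pull', 'module_uoa': 'repo',
               'data_uoa': 'ck-request-example-workflow'})

        # Run the (hypothetical) benchmark program; CK resolves software
        # dependencies before the experiment is executed.
        r = ck_do({'action': 'run', 'module_uoa': 'program',
                   'data_uoa': 'image-classification-example'})
        print(sorted(r.keys()))   # inspect what the workflow recorded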

  • Step 2: Submit an extended abstract with Artifact Appendix

    You should prepare an extended abstract (max 4 pages) using this ReQuEST LaTeX template in the SIGPLAN conference style.
    Include your name, affiliation, and a brief description of your work (which can be novel or already presented elsewhere). Please also fill in the Artifact Appendix in the above template, including how to obtain your artifacts and workflows. Provide a detailed specification of your experimental workflow, a list of optimization metrics (speed, accuracy, energy, costs, etc.) and the expected results (which the reviewers will need to independently validate).
    Please submit your extended abstract as a PDF via the ReQuEST HotCRP website. Please contact the organizers if you encounter any problems.

Evaluation

ReQuEST is backed by the ACM Task Force on Data, Software, and Reproducibility in Publication and uses the standard artifact evaluation methodology. Artifact evaluation is single blind (see PPoPP, CGO, PACT, RTSS and SuperComputing). Reviews will be performed by the organizers and volunteers ("reviewers"), and can be made public upon the authors' request (see ADAPT). Quality and efficiency metrics will be collected for each submission, and displayed on a live ReQuEST scoreboard similar to this open CK repository.

  • Step 1: Collaborate on converting your workflows to CK

    If your submission is not in the CK format, we will help you to add a portable CK workflow for your algorithm while reusing available CK packages and modules shared by the community (see the CK Getting Started Guides, the CK ReQuEST workflow example to explore MobileNets on Arm GPUs, shared CK packages, CK software detection plugins, reusable CK modules (unified scripts and tool wrappers) and CK repositories with AI/ML workflows). You may choose how to communicate with us during this step: either privately via HotCRP, semi-privately via a dedicated Slack channel with all authors and reviewers, or, preferably, publicly via the CK Slack channel or the CK mailing list (thus making the community immediately aware of your artifact).

  • Step 2: Collaborate on validating your results

    We will form a ReQuEST artifact evaluation committee (AEC) from the organizers and volunteers ("reviewers"). The AEC's task is to objectively evaluate submissions on appropriate hardware platforms, reproduce results and aggregate them on a multi-objective public scoreboard. Artifact evaluation will be a friendly and interactive process between the authors and the reviewers, with the goal of making the artifacts as useful as possible for the community. For example, the reviewers may encounter some unexpected problems and ask the authors for help to fix them.
    Again, the authors can communicate with the reviewers privately via HotCRP, semi-privately via Slack, or publicly by opening tickets in shared repositories (see examples 1 and 2) and/or via the CK mailing list. If any of the organizers submit their workflows (mainly to provide reference implementations), their submissions will go through public evaluation.

  • Step 3: Collaborate on visualizing your results on a public scoreboard

    Due to the multi-faceted nature of the competition, submissions will not be ranked according to a single metric (as this often results in over-engineered solutions); instead, the AEC will assess their Pareto optimality on two or more metrics exposed by the authors. As such, there will not be a single winner, but rather better and worse designs based on their relative Pareto optimality (up to 3 design points allowed per submission). We will collaborate with the authors to correctly visualize the results and SW/HW/model configurations on a public scoreboard while grouping them according to certain categories of their choice (e.g. embedded vs. server). A unique submission may define a category in its own right. To win, an entry's results will normally need to lie close to the Pareto-optimal frontier in its category. However, a winning entry can also be praised for its originality, reproducibility, adaptability, scalability, portability, ease of use, etc.
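
    As a minimal illustration (not the actual scoreboard code), the sketch below checks Pareto optimality over two metrics, here maximizing accuracy while minimizing latency; the real evaluation supports additional metrics and per-category grouping.

        # Minimal illustration of a Pareto-frontier check over two metrics
        # (maximize accuracy, minimize latency); values are made up.
        def pareto_frontier(points):
            """points: list of (accuracy, latency_ms); return non-dominated points."""
            frontier = []
            for p in points:
                dominated = any(q != p and q[0] >= p[0] and q[1] <= p[1]
                                for q in points)
                if not dominated:
                    frontier.append(p)
            return frontier

        designs = [(0.76, 120.0), (0.71, 40.0), (0.70, 95.0), (0.68, 35.0)]
        print(pareto_frontier(designs))
        # -> [(0.76, 120.0), (0.71, 40.0), (0.68, 35.0)]; (0.70, 95.0) is dominated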

Presentation

  • Step 1: Present at the ReQuEST workshop at ASPLOS'18

    We will announce accepted SW/HW/model configurations at the end of February, and invite the authors to present their work at the 1st ReQuEST workshop co-located with ASPLOS 2018 (ACM conference on Architectural Support for Programming Languages and Operating Systems, which is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking). This will give the authors an opportunity to share their research and implementation insights with the research community as well as discuss future R&D directions.
    A common academic and industrial panel will be held at the end of the workshop to discuss how to improve the common SW/HW co-design methodology and infrastructure for deep learning and other real-world workloads.

  • Step 2: Publish in the ACM Digital Library

    The authors of the winning submissions will publish their extended abstracts with an Artifact Appendix and related artifacts in the ACM Digital Library (even if their techniques have already been published, since the workshop focuses on validated and reusable artifacts!) Furthermore, we have partnered with ACM to award "available / reusable / replicated" badges to all the winning artifacts. This will make them discoverable via the ACM Digital Library (check this out by selecting "Artifact Badge" as a field and then selecting any badge you wish in the ACM DL advanced search)!

Advisory/industrial board goal

Members of the ReQuEST advisory/industrial board will review and comment on the results of our tournaments and workshops, collaborate on a common methodology for reproducible evaluation and optimization, suggest realistic workloads for future tournaments, arrange access to rare hardware for the Artifact Evaluation Committee, and provide prizes for the most efficient solutions.

Open research goal

ReQuEST attempts to put systems researchers, application engineers and end-users on common ground by providing a common and portable evaluation framework while sharing all artifacts and optimization results in an open and reproducible way. We expect that our open repository with customizable, reusable and optimized artifacts will be useful for the broader community of AI, ML and systems researchers, practitioners and end-users.

Feel free to contact us if you have questions or suggestions!