Results of the 1st reproducible ACM ReQuEST-ASPLOS'18 tournament:
Developing efficient software and hardware for emerging workloads, and optimizing them in terms of speed, accuracy, cost and other metrics, is extremely complex and time-consuming. Furthermore, the lack of a common infrastructure and a rigorous methodology for reproducible evaluation and multi-objective optimization makes it even more challenging to validate and compare different published works across numerous and continuously changing platforms, software frameworks, compilers, libraries, algorithms, data sets, and environments.
ReQuEST is aimed at providing a scalable tournament framework, a common experimental methodology and an open repository for continuous evaluation and optimization of the quality vs. efficiency Pareto optimality of a wide range of real-world applications, libraries and models across the whole hardware/software stack on complete platforms.
In contrast with other (deep learning) benchmarking challenges, where experimental results are submitted in the form of JSON, CSV or XLS files, ReQuEST participants will be asked to submit a complete workflow artifact in a unified and automated form (i.e. not just some ad-hoc Docker/VM image) which encompasses toolchains, frameworks, algorithms, libraries, and the target hardware platform; any of these can be fine-tuned or customized at will by the participants to implement their optimization technique.
Such open infrastructure helps to bring together multidisciplinary researchers in systems, compilers, architecture and machine learning to develop and share their algorithms, tools and techniques as portable, customizable and "plug&play" components with a common API.
We then arrange open ReQuEST competitions on Pareto-efficient co-design of the whole software/hardware stack to continuously optimize such algorithms in terms of speed, accuracy, energy, cost and other metrics across diverse inputs and platforms, from IoT devices to supercomputers.
All benchmarking results and winning SW/HW/model configurations will be visualized on a public SOTA leaderboard and grouped according to categories (e.g. embedded vs. server). The winning artifacts will be discoverable via the ACM Digital Library to help the community reproduce, reuse, improve and compare against them, thanks to the common experimental framework.
We hope that our approach will help automate research and accelerate innovation!
ReQuEST promotes reproducibility of experimental results and reusability/customization of research artifacts by standardizing evaluation methodologies and facilitating the deployment of efficient solutions on heterogeneous platforms. That is why we build our competition on top of an open-source and portable workflow framework (Collective Knowledge, or CK) and the standard artifact evaluation methodology from premier ACM systems conferences (CGO, PPoPP, PACT, SuperComputing) to provide unified evaluation and a real-time leaderboard of submissions.
ReQuEST brings quality-awareness to the architecture and systems community, and resource-awareness to the applications community and end-users.
The submissions and their evaluation metrics will be maintained in a public repository that includes a live leaderboard. Particular attention will be paid to submissions close to the Pareto frontier in a multi-dimensional space of accuracy, execution time, power/energy consumption, hardware/code/model footprint, monetary cost, etc.
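As a toy illustration (not part of the official evaluation infrastructure), the following sketch shows one way submissions could be filtered for Pareto efficiency across such metrics. The submission names and numbers are hypothetical, and every metric is expressed so that lower is better (accuracy becomes an error rate):

    # Minimal Pareto-filtering sketch; all metrics are minimized.
    def dominates(a, b):
        """True if a is at least as good as b on every metric
        and strictly better on at least one."""
        return all(x <= y for x, y in zip(a, b)) and \
               any(x < y for x, y in zip(a, b))

    def pareto_frontier(submissions):
        """Keep only submissions not dominated by any other submission."""
        return [(name, m) for name, m in submissions
                if not any(dominates(other, m)
                           for _, other in submissions if other != m)]

    # (error rate, latency in ms, energy in J, cost in $) - hypothetical numbers
    submissions = [
        ("resnet50-gpu",  (0.24, 12.0, 3.1, 0.90)),
        ("mobilenet-arm", (0.29,  8.5, 0.4, 0.05)),
        ("vgg16-fpga",    (0.28, 20.0, 1.2, 0.30)),
        ("vgg16-gpu",     (0.28, 25.0, 4.0, 0.95)),  # dominated by resnet50-gpu
    ]

    for name, metrics in pareto_frontier(submissions):
        print(name, metrics)

Running this prints the three non-dominated configurations and drops "vgg16-gpu", which is worse than "resnet50-gpu" on every metric.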
In the long term, ReQuEST will cover a comprehensive suite of workloads, datasets and models covering the application domains that are most relevant to researchers in academia and industry (AI, vision, robotics, quantum computing, scientific computing, etc.). This suite will evolve according to feedback and contributions from the community, thereby replacing ad-hoc, artificial, quickly outdated or non-representative benchmarks. Furthermore, all artifacts from this suite can be automatically plugged into the ReQuEST competition workflows to simplify, automate and accelerate systems research.
For the first iteration of ReQuEST at ASPLOS'18, we focus on deep learning. Our first step is to provide coverage for the ImageNet image classification challenge. Restricting the competition to a single application domain will allow us to prepare an open-source tournament infrastructure and validate it across multiple hardware platforms, deep learning frameworks, libraries, models and inputs. For future incarnations of ReQuEST, we will provide broader application coverage, based on the interests of the research community and the direction set by our industrial board.
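For concreteness, here is a minimal sketch of the kind of quality/efficiency measurement such an image-classification workflow would report, namely top-1 accuracy and average per-image latency. The names load_model, load_labeled_images and model.classify are hypothetical placeholders for whatever framework and dataset loader a participant actually uses:

    # Hypothetical evaluation loop: measures top-1 accuracy and latency.
    import time

    def evaluate(model, labeled_images):
        correct, latencies = 0, []
        for image, label in labeled_images:
            start = time.perf_counter()
            prediction = model.classify(image)   # returns a class index (placeholder API)
            latencies.append(time.perf_counter() - start)
            correct += (prediction == label)
        n = len(latencies)
        return {"top1_accuracy": correct / n,
                "avg_latency_ms": 1000.0 * sum(latencies) / n}

    # model = load_model("my_model")                              # placeholder
    # print(evaluate(model, load_labeled_images("imagenet-val"))) # placeholder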
Though our main focus is on end-to-end applications, we also plan to allow future submissions for (micro)kernels such as matrix multiply, convolutions and transfer functions to facilitate participation from the compilers and computer architecture community.
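As a toy example of what a (micro)kernel submission might measure, the snippet below times a dense matrix multiply and reports an approximate GFLOPS figure; the use of NumPy and the chosen matrix size are assumptions for illustration only, not a prescribed benchmark:

    # Toy matrix-multiply timing (illustrative; a real submission would
    # benchmark its own optimized kernel on its target platform).
    import time
    import numpy as np

    def time_gemm(n, repeats=10):
        a = np.random.rand(n, n).astype(np.float32)
        b = np.random.rand(n, n).astype(np.float32)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            np.dot(a, b)
            best = min(best, time.perf_counter() - start)
        gflops = 2.0 * n**3 / best / 1e9   # ~2*n^3 floating-point operations per GEMM
        return best, gflops

    elapsed, gflops = time_gemm(1024)
    print(f"best time: {elapsed:.4f} s, ~{gflops:.1f} GFLOPS")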
ReQuEST aims to cover a comprehensive set of hardware systems, from data centers down to sensory nodes, incorporating various forms of processors including GPUs, DSPs, FPGAs and, in the long term, neuromorphic and even analogue accelerators.
In general, we want to encourage participants to target accessible, off-the-shelf hardware to allow our artifact evaluation committee to conveniently reproduce their results. Example systems include:
If a submission relies on an exotic hardware platform, the participants can either provide the artifact evaluation committee with restricted access to their evaluation platform, or notify the organizers of their choice in advance (at least 3 weeks' notice) so that a similar platform can be acquired in time (assuming costs are not prohibitive).
In the longer term, we also plan to provide support for simulator-based evaluations for architecture/micro-architecture research.
Authors need to submit a short document briefly describing their novel optimization technique or referencing an already published paper, and providing a detailed specification of the experimental workflow, including all related artifacts, the evaluation methodology, and the improved metrics to compete with other submissions.
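To make this concrete, below is a purely hypothetical sketch (written here as a Python dictionary) of the kind of machine-readable workflow description a submission could bundle alongside that document; the field names and values are illustrative and do not correspond to a prescribed ReQuEST schema:

    # Illustrative submission metadata; field names are NOT an official schema.
    submission = {
        "paper": "short PDF or link to an already published paper",
        "workflow": {
            "framework": "tensorflow-1.4",                     # example choice
            "model": "mobilenet-v1-1.0-224",                   # example choice
            "dataset": "imagenet-val",                         # example choice
            "hardware": "embedded ARM board with mobile GPU",  # example choice
        },
        "metrics": ["top1_accuracy", "latency_ms", "energy_j", "cost_usd"],
        "how_to_reproduce": "see the step-by-step instructions in the artifact",
    }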
Note that novelty of the implemented techniques is not a requirement! We actually strongly encourage artifact submissions for already published techniques for which artifacts do not yet exist. We will independently reproduce them to prepare an open set of reference implementations of popular algorithms/frameworks/optimizations in the form of portable and customizable workflows which can be easily reused and built upon!
We want to unify every submission to enable fair evaluation. That is why we decided to use the open-source Collective Knowledge workflow framework (CK). CK helps the community share artifacts (models, data sets, libraries, tools) as reusable and customizable components with a common JSON API and meta description. CK also helps to implement portable workflows which can adapt to a user's environment on Linux, Windows, MacOS and Android. ACM is currently evaluating CK to enable sharing of reusable and portable artifacts in the ACM Digital Library.
The non-profit cTuning Foundation will help authors convert their artifacts and experimental scripts to the CK format during evaluation, while reusing AI artifacts already shared by the community in the CK format (see CK AI repositories, CK modules (wrappers), CK software detection plugins, portable CK packages). Authors can also try to convert their workflows to the CK format themselves, using the distinguished artifact from ACM CGO'17 as an example (see Artifact repository at GitHub, Artifact Appendix, CK notes, CK portable workflows), though the learning curve is still quite steep - we plan to prepare CK tutorials based on feedback from the participants.
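For readers unfamiliar with CK, the sketch below shows how a converted workflow might be fetched and run from Python by shelling out to the ck command-line tool. The repository, package tags and program names are hypothetical placeholders, and the exact commands for a given artifact are the ones described in its own Artifact Appendix:

    # Illustrative only: driving a CK workflow from Python via the `ck` CLI.
    # Repository, package tags and program names below are placeholders.
    import subprocess

    def ck(*args):
        """Run a `ck` command and raise if it returns a non-zero exit code."""
        subprocess.run(["ck", *args], check=True)

    ck("pull", "repo:my-request-artifact")                  # fetch the shared artifact (placeholder name)
    ck("install", "package", "--tags=lib,my-dl-framework")  # resolve a dependency (placeholder tags)
    ck("run", "program:my-image-classification")            # run the portable workflow (placeholder name)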
ReQuEST is backed by the ACM Task Force on Data, Software, and Reproducibility in Publication and will use the standard ACM artifact evaluation methodology. Artifact evaluation will be single-blind (see PPoPP, CGO, PACT, RTSS and SuperComputing), and the reviews can be made public (see ADAPT) upon the authors' request. Quality and efficiency metrics will be collected for each submission and compiled on the ReQuEST live leaderboard.
ReQuEST will not determine a single winner, as collapsing all of the metrics into a single score across all platforms would result in over-engineered solutions. Instead, each ReQuEST tournament will expose a set of quality, performance and efficiency metrics to optimize for.
We will organize ReQuEST workshops associated with the tournaments to let authors present and discuss their most efficient solutions. We will also use the workshops as an open forum to discuss, together with a broad academic and industrial community, how to improve our common reproducible methodology and framework for SW/HW co-design of emerging workloads!
Solutions do not have to be on the Pareto frontier to be accepted for such workshops and the open ReQuEST repository - a submission can be praised for its originality, reproducibility, adaptability, scalability, portability, ease of use, etc.
However, reproducible submissions on the Pareto frontier will have the option to be published in the ACM Digital Library with the ACM "Available", "Reusable" and "Replicated" badges. This will make them discoverable via the ACM DL search engine: you can check this new feature yourself (available since 2018) by selecting "Artifact Badge" as the field and then choosing any badge you wish in the ACM DL advanced search!
Members of the ReQuEST advisory/industrial board will review and comment on the results of our tournaments and workshops, collaborate on a common methodology for reproducible evaluation and optimization, suggest realistic workloads for future tournaments, arrange access to rare hardware for the Artifact Evaluation Committee, and provide prizes for the most efficient solutions.
ReQuEST attempts to put systems researchers, application engineers and end-users on the same ground by providing a common and portable evaluation framework, while sharing all artifacts and optimization results in an open and reproducible way. We expect that our open repository with customizable, reusable and optimized artifacts will be useful for
Feel free to contact us if you have questions or suggestions!