Authors: Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
ArXiv: 1910.02653
Abstract URL: https://arxiv.org/abs/1910.02653v3
We formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal rematerialization schedules in reasonable time (under an hour) using off-the-shelf MILP solvers, or near-optimal schedules with an approximation algorithm, and then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1x larger input sizes. Checkmate is an open-source project, available at https://github.com/parasj/checkmate.
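To build intuition for the time/memory tradeoff that Checkmate optimizes, the sketch below models the classic prior checkpointing strategy that the paper generalizes: on a linear chain of n layers, store activations only at a subset of checkpoint layers, and during the backward pass recompute each segment's activations from its nearest stored checkpoint. This is a simplified illustration (uniform per-layer cost and memory, linear chain only), not Checkmate's MILP formulation; the function name `schedule_cost` is a hypothetical helper for this example.

```python
import math

def schedule_cost(n, checkpoints):
    """Cost model for a linear chain of n layers where only the
    activations at `checkpoints` (must include layer 0) are stored.
    During backward, each segment between consecutive checkpoints is
    recomputed forward from its left checkpoint, and that segment's
    activations are held while it is backpropagated.
    Returns (peak_memory_in_activations, extra_forward_ops)."""
    cps = sorted(set(checkpoints) | {0})
    assert all(0 <= c < n for c in cps), "checkpoints must lie in [0, n)"
    bounds = cps + [n]                 # segment boundaries
    extra = 0
    peak = len(cps)                    # checkpoints are held throughout
    for left, right in zip(bounds, bounds[1:]):
        seg = right - left
        extra += seg - 1               # recompute all but the checkpoint itself
        peak = max(peak, len(cps) + seg - 1)
    return peak, extra

# Storing every activation: maximal memory, zero recomputation.
n = 100
full_peak, full_extra = schedule_cost(n, range(n))      # (100, 0)

# sqrt(n)-spaced checkpoints: ~2*sqrt(n) peak memory at the cost of
# roughly one extra forward pass.
k = int(math.isqrt(n))
sqrt_peak, sqrt_extra = schedule_cost(n, range(0, n, k))  # (19, 90)
```

Under this uniform-cost model, sqrt(n) checkpointing cuts peak activation memory from 100 to 19 units for about 90% of one extra forward pass, which is the tradeoff frontier that Checkmate's MILP explores per-operator with profiled, hardware-specific costs instead of uniform ones.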