Authors: Pieter Gijsbers, Joaquin Vanschoren, Randal S. Olson
ArXiv: 1801.06007
Abstract URL: http://arxiv.org/abs/1801.06007v2
As the demand for machine learning grows, so does the demand for tools that
make it easier to use. Automated machine learning (AutoML) tools, such as the
Tree-Based Pipeline Optimization Tool (TPOT), which uses genetic programming
to build optimal pipelines, have been developed to address this need. We
introduce Layered TPOT, a modification to TPOT that aims to create pipelines
as good as those of the original, but in significantly less time.
This approach evaluates candidate pipelines on increasingly large subsets of
the data according to their fitness, using a modified evolutionary algorithm to
allow for separate competition between pipelines trained on different sample
sizes. Empirical evaluation shows that, on sufficiently large datasets, Layered
TPOT indeed finds better models faster.
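
The idea of evaluating candidates on increasingly large subsets of the data can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the genetic programming and the separate per-layer competition described above, and only shows promotion of the fittest candidates to larger samples. The function `layered_selection`, its parameters (`sample_sizes`, `keep_fraction`), and the toy threshold "pipelines" are all hypothetical names chosen for illustration.

```python
import random


def layered_selection(candidates, data, evaluate, sample_sizes,
                      keep_fraction=0.5, seed=0):
    """Score candidates on growing subsets, promoting only the fittest.

    Simplified illustration (an assumption, not the paper's algorithm):
    each layer draws a random subset of the given size, ranks the
    surviving candidates on it, and keeps the top fraction.
    """
    rng = random.Random(seed)
    survivors = list(candidates)
    for size in sample_sizes:
        subset = rng.sample(data, min(size, len(data)))
        # Every surviving candidate is scored on the same subset.
        ranked = sorted(survivors, key=lambda c: evaluate(c, subset),
                        reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors


# Toy usage: each "pipeline" is a threshold classifier on 1-D points.
data = [(x / 1000, int(x / 1000 > 0.6)) for x in range(1000)]


def accuracy(threshold, subset):
    """Fraction of points in the subset that the threshold classifies correctly."""
    return sum(int(x > threshold) == y for x, y in subset) / len(subset)


thresholds = [i / 10 for i in range(1, 10)]
best = layered_selection(thresholds, data, accuracy,
                         sample_sizes=[50, 200, 1000])
print(best)
```

Cheap evaluations on small samples discard weak candidates early, so only a few candidates incur the cost of a full-data evaluation. This is where the time savings claimed in the abstract would come from.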