Authors: Franziska Horn,Robert Pack,Michael Rieger
ArXiv: 1901.07329
Document:
PDF
DOI
Artifact development version:
GitHub
Abstract URL: https://arxiv.org/abs/1901.07329v4
This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such as neural networks, are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. While linear models are efficient and intuitive, they generally provide lower prediction accuracies. Our library provides a multi-step feature engineering and selection process, where first a large pool of non-linear features is generated, from which then a small and robust set of meaningful features is selected, which improve the prediction accuracy of a linear model while retaining its interpretability.