Many machine learning approaches are characterized by information constraints
on how they interact with the training data. These include memory and
sequential access constraints (e.g. fast first-order methods to solve
stochastic optimization problems); communication constraints (e.g. distributed
learning); partial access to the underlying data (e.g. missing features and
multi-armed bandits); and more. However, we currently have little understanding
of how such information constraints fundamentally affect attainable performance,
independently of the semantics of the learning problem. For example, are there
learning problems where any algorithm with a small memory footprint (or one that
can use only a bounded number of bits from each example, or that operates under
certain communication constraints) must perform worse than what is achievable
without such constraints?
In this paper, we describe how a single set of results implies positive answers
to these questions across several different settings.