Title:
Universal Learning without the Law of Large Numbers
Abstract:
Many of the papers in the present literature on statistical learning
theory begin by stating some convenient assumptions about the process
generating the data, and then construct a learning method and prove
that it learns (is consistent), or has some other desirable property,
under the stated assumptions. By far the most popular such assumption
is that the data are independent and identically distributed (iid),
though some works relax this to a small extent (e.g., mixing or
ergodic conditions).
In this presentation, we essentially invert this perspective. That
is, beginning with the intention to construct a method that learns, we
ask the question, "Under what types of processes is learning
possible?" We specifically ask this question in regards to universal
learning: the ability to learn any target function. To answer this
question, we state a simple necessary and sufficient condition for a
process to admit universal learning.
Interestingly, unlike the relaxations of the iid assumption commonly
used in the literature, the general family of processes that admit
learning contains some members radically different from those
satisfying the iid assumption, even to the extent that they do not
satisfy a law of large numbers. As existing statistical learning
methods rely heavily on the law of large numbers for their consistency
guarantees, proving that learning is possible even in these scenarios
requires us to develop entirely new approaches to the design of
learning methods.

