Extractor-Based Time-Space Lower Bounds for Learning

Sumegha Garg

A matrix M: A x X -> {-1,1} corresponds to the following learning problem: An unknown element x in X is chosen uniformly at random. A learner tries to learn x from a stream of samples (a_1, b_1), (a_2, b_2), ..., where for every i, a_i in A is chosen uniformly at random and b_i = M(a_i, x). Assume that k, l, r are such that any submatrix of M with at least 2^{-k}|A| rows and at least 2^{-l}|X| columns has bias at most 2^{-r}. We show that any learning algorithm for the learning problem corresponding to M requires either memory of size at least \Omega(kl) or at least 2^{\Omega(r)} samples. This result implies all previous memory-samples lower bounds, as well as a number of new applications. In particular, it shows that to learn a degree-d multilinear polynomial from its evaluations on points in {0,1}^n, we need either ~\Omega(n \cdot n^d) memory or 2^{\Omega(n)} samples. Note that, information-theoretically, we can learn w.h.p. with n \cdot n^d memory by storing the first n^d samples.

Joint work with Ran Raz and Avishay Tal.
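As a quick illustration of the information-theoretic upper bound in the last remark (this sketch is not part of the talk), one can recover a degree-d multilinear polynomial over GF(2) from roughly n^d stored random evaluations by solving a linear system in its ~n^d monomial coefficients. The Python below uses {0,1}-valued labels (equivalent to the {-1,1} labels above via b -> (-1)^b); all names (monomials, eval_poly, learn) are my own.

    import itertools
    import random

    def monomials(n, d):
        """All variable subsets of size <= d, i.e. the multilinear monomials."""
        return [s for k in range(d + 1) for s in itertools.combinations(range(n), k)]

    def eval_poly(coeffs, monos, a):
        """Evaluate sum_{S: coeff_S = 1} prod_{i in S} a_i over GF(2)."""
        return sum(c for c, s in zip(coeffs, monos) if all(a[i] for i in s)) % 2

    def learn(samples, monos):
        """Gauss-Jordan elimination over GF(2); unknowns are the monomial coefficients."""
        rows = [([all(a[i] for i in s) for s in monos], b) for a, b in samples]
        m = len(monos)
        coeffs = [0] * m
        pivots = []
        for col in range(m):
            piv = next((r for r in range(len(pivots), len(rows)) if rows[r][0][col]), None)
            if piv is None:
                continue
            rows[len(pivots)], rows[piv] = rows[piv], rows[len(pivots)]
            pr, pb = rows[len(pivots)]
            for r in range(len(rows)):
                if r != len(pivots) and rows[r][0][col]:
                    rr, rb = rows[r]
                    rows[r] = ([x ^ y for x, y in zip(rr, pr)], rb ^ pb)
            pivots.append(col)
        # each pivot row now determines one coefficient (free variables set to 0)
        for idx, col in enumerate(pivots):
            coeffs[col] = rows[idx][1]
        return coeffs

    n, d = 8, 2
    monos = monomials(n, d)
    secret = [random.randint(0, 1) for _ in monos]
    # store a few times |monos| ~ n^d random samples, each an n-bit point with its label
    samples = []
    for _ in range(4 * len(monos)):
        a = [random.randint(0, 1) for _ in range(n)]
        samples.append((a, eval_poly(secret, monos, a)))
    learned = learn(samples, monos)
    assert all(eval_poly(learned, monos, a) == b for a, b in samples)

Each stored sample takes n bits, so storing ~n^d of them uses ~n \cdot n^d memory, matching the upper bound mentioned in the abstract; the theorem says that with substantially less memory, 2^{\Omega(n)} samples are unavoidable.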