Monday (23/05) Tuesday (24/05) Wednesday (25/05) Thursday (26/05) Friday (27/05) 9:00 Opening remarks   9:10   10:00   10:30 Break   11:00   11:50 9:00   9:50   10:20 Break   10:50   11:40   12:10 9:00   9:50   10:20 Break   10:50   11:20   11:50 Break   12:00   12:30 9:00   9:50   10:20 Break and Group Photo   11:00   11:30   12:00 9:00   9:30   10:00 Break   10:10   10:40   11:10 Break   11:40   12:10   12:25 Closing remarks 1:00 Lunch 1:00 Lunch 1:00 Lunch 1:00 Lunch 1:00 Lunch 3:00   3:50 Break   4:30   5:00 3:00   3:50   4:20 Break   4:50   5:20 2:30 Excursion to Ravenna (with dinner at 7:30pm at Cappello restaurant in Ravenna) Meeting at 2:20 in Vignaiolo Square 3:00   3:30   4:00 Break   4:15 Open Problem Session   5:45 Museum in Bertinoro (guided tour) 7:30  Dinner in town, restaurant EBC 7:30  Dinner in Mensa 7:30  Dinner in town, restaurant LaGrotta

Workshop news on a blog (thanks to Andrew McGregor and his collaborators): (day 1a, 1b, 2a, 2b, 3, 4a, 4b, 5a, 5b).

Workshop on Sublinear Algorithms

Bertinoro International Center for informatics

May 23 - 27, 2011

Abstracts

# Nir Ailon (Technion)

## Johnson-Lindenstrauss Transform(s)



The Johnson-Lindenstrauss technique has been for many years used as an algorithmic black-box for trading off dimensionality (and its curse) with approximation.  More recently there have been attempts to open this black box and improve it as an algorithm in its own right.  Much work has been devoted to reducing both time and randomness.  In this tutorial I will survey work on the time complexity aspects, and connect it with other important fields such as compressed sensing.



# Alexandr Andoni (Microsoft Research Silicon Valley)

## Sublinear Algorithms via Precision Sampling

Suppose we want to estimate a sum of bounded reals a1+a2+…+an with access to only some "limited information" about ai's. A classical setting is where we estimate the entire sum by knowing only a random subset of ai's. Naturally, there is a trade-off between the size of the subset and the resulting approximation.

Motivated by applications where this trade-off is not good enough, we introduce Precision Sampling, which is an estimation technique that uses a more general kind of "limited information" about ai's.  Instead of obtaining a subset of ai's as above, here we obtain a rough estimate for each of the n quantities, up to various prescribed degrees of "precision" (approximation). The trade-off then becomes between the precision of the estimates of ai's and the resulting approximation to the total sum. We show one can obtain a trade-off that is qualitatively better in the precision sampling setting than in the aforementioned (vanilla sampling) setting.

Our resulting tool leads to new sublinear algorithms. In one instance, it gives a simplified algorithm for a class of streaming problems. In another instance, the tool plays an instrumental role in a query-efficient algorithm for estimating edit distance.

Joint work with Robi Krauthgamer and Krzysztof Onak.

# Arnab Bhattacharyya (MIT)

## A Unified Framework for Testing Linear-Invariant Properties

In a sequence of recent papers, Sudan and coauthors have investigated the relation between testability of properties of Boolean functions and the invariance of the properties with respect to transformations of the domain. Linear-invariance is arguably the most common such symmetry for natural properties of Boolean functions on the hypercube. Hence, it is an important goal to find necessary and sufficient conditions for testability of linear-invariant properties. We obtain the following results:

1. We show that every linear-invariant property that can be characterized by forbidding induced solutions to a (possibly infinite) set of linear equations can be tested with one-sided error.

2. We show that every linear-invariant property that can be tested with one-sided error can be characterized by forbidding induced solutions to a (possibly infinite) set of systems of linear equations.

We conjecture that our result from item (1) can be extended to cover systems of linear equations. We further show that the validity of this conjecture would have the following implications:

1. It would imply that every linear-invariant property that is closed under restrictions to linear subspaces is testable with one-sided error. Such a result would unify several previous results on testing Boolean functions, such as the results on testing low-degree polynomials and results on testing Fourier dimensionality.

2. It would imply that a linear-invariant property P is testable with one-sided error *if and only if* P is closed under restrictions to linear subspaces, thus resolving Sudan's problem.

Joint work with Elena Grigorescu and Asaf Shapira.

## Zero-One Frequency Laws

Given a function G, we want to compute a statistic of the form Si in [n] G(mi), where mi are entries of the frequency vector defined by a data stream.

In their fundamental paper, Alon, Matias and Szegedy asked for which functions G it is possible to approximate Si in [n] G(mi) in a single pass over the data and using poly-logarithmic memory. We give a precise characterization (in fact, a zero-one law) for all monotonically increasing functions on frequencies that are zero at the origin. Also, I will introduce the method of recursive sketching that can be used, e.g., for approximating frequency moments.

Joint work with Rafail Ostrovsky.

# Amit Chakrabarti (Dartmouth)

## Gap-Hamming-Distance: the Journey to an Optimal Lower Bound

In the Gap-Hamming-Distance problem (GHD), Alice and Bob receive n-bit strings and must decide whether they are "close" (Hamming distance \leq n/2-n1/2) or "far" (distance \geq n/2+n1/2). How many bits must they communicate to solve this problem, with error at most 1/3? Since being proposed in the early 2000s by Indyk and Woodruff, for its applications to space lower bounds for data stream algorithms, the problem has been extensively studied.

In this talk, I shall describe the journey from the early results of Indyk and Woodruff, to stronger intermediate results obtained through a deeper understanding of the problem, and finally to a recent optimal lower bound due to myself and Oded Regev.  We now know that any GHD protocol requires W(n) communication, which completely settles the problem.

Two simplifications of our proof have appeared within the last two months, and I shall touch upon these as well.

# Graham Cormode (AT&T Research)

## Mergeable Summaries

In dealing with massive distributed data, exact computations may be possible but can still be very expensive to perform. Instead, it is often desirable to look to more lightweight computations that give (guaranteed) approximations. A central approach is the idea of the mergeable summary: a compact data structure that summarizes a data set, which can be merged with others to summarize the union of the data without significant growth in size. This framework enables parallel computation.

Samples and sketches are two examples of well-known, effective mergeable summaries. I'll present recent results on new, compact mergeable summaries for various tasks, such as approximating frequent items, order statistics, and range counts.

Joint work with Pankaj Agarwal, Zengfeng Huang, Jeff Phillips, Zhewei Wei and Ke Yi.

# Artur Czumaj (University of Warwick)

## Planar Graphs:  Random Walks and Bipartiteness Testing

We study the testability of hereditary properties in arbitrary planar graphs. Our main result is a constant time testing of bipartiteness in arbitrary planar graphs. The previous bound for this class of graphs was O(n1/2), and the constant-time testability was only known for planar graphs with bounded degree. Previously used transformations of unbounded-degree sparse graphs into bounded-degree sparse graphs cannot be used to reduce the problem to the testability of bounded-degree planar graphs. Our approach extends to arbitrary minor-free graphs.

Our algorithm is based on random walks. The challenge here is to analyze random walks for a class of graphs that has good separators, i.e., bad expansion. Standard techniques that use a fast convergence to a uniform distribution do not work in this case. Informally, our analysis technique self-reduces the problem of finding an odd length cycle in a multigraph G induced by a collection of cycles to another multigraph G’ induced by a set of shorter odd-length cycles, in such a way that when a random walks finds a cycle in G’ with probability p, then it does so with probability f(p) in G. This reduction is applied until the cycles collapse to selfloops that can be easily detected.

Joint work with Morteza Monemizadeh, Krzysztof Onak, and Christian Sohler.

# Pierre Fraigniaud (LIAFA, Paris)

## Local Distributed Decision



Inspired by sequential complexity theory, we focus on a complexity theory for distributed decision problems. A central theme in distributed network algorithms concerns understanding and coping with the issue of locality. In this context, solving a decision problem requires the processors to independently inspect their local neighborhoods and then collectively decide whether a given global input instance belongs to some specified language. We consider the standard LOCAL model of computation and define LD(t) (for local decision) as the class of decision problems that can be solved in t communication rounds. We first study the question of whether randomization helps in local distributed computing, and to what extent. Specifically, we define the corresponding randomized class BPLD(t,p,q), containing all languages for which there exists a randomized algorithm that runs in t rounds, accepts correct instances with probability at least p and rejects incorrect ones with probability at least q. We show that p2+q=1 is a threshold for the containment of LD(t) in BPLD(t,p,q). More precisely, we show that there exists a language that does not belong to LD(t) for any t=o(n) but does belong to BPLD(0,p,q) for any p,q in (0,1] such that p2+q \leq1. On the other hand, we show that, restricted to hereditary languages, BPLD(t,p,q) = LD(O(t)), for any function t and any p, q in (0,1] such that p2+q>1. In addition, we investigate the impact of non-determinism on local decision, and establish some structural results inspired by classical computational complexity theory. Specifically, we show that non-determinism does help, but that this help is limited, as there exist languages that cannot be decided non-deterministically. Perhaps surprisingly, it turns out that it is the combination of randomization with non-determinism that enables to decide all languages in constant time. Finally, we introduce the notion of local reduction, and establish some completeness results.

Joint work with Amos Korman and David Peleg.



# Oded Goldreich (Weizmann)

## Finding Cycles and Trees in Sublinear Time



We present sublinear-time (randomized) algorithms for finding simple cycles of length at least k≥3 and tree-minors in bounded-degree graphs. The complexity of these algorithms is related to the distance of the graph from being Ck-minor free (resp., free from having the corresponding tree-minor). In particular, if the graph is far (i.e., W(1)-far) from being cycle-free, then the algorithm finds a cycle of polylogarithmic length in time $\tilde{O}(\sqrt{N})$, where N denotes the number of vertices. This time complexity is optimal up to polylogarithmic factors.

The foregoing results are the outcome of our study of the complexity of one-sided error testers in the bounded-degree graphs model. For example, we show that cycle-freeness of N-vertex graphs can be tested with one-sided error within time complexity $\tilde{O}(poly(1/e)\sqrt{N})$. This matches the known W(\sqrt{N}) query lower bound, and contrasts with the fact that any minor-free property admits a two-sided error tester of query complexity that only depends on the proximity parameter e.  For any constant k≥3, we extend this result to testing whether the input graph has a simple cycle of length at least k. On the other hand, for any fixed tree T, we show that T-minor freeness has a one-sided error tester of query complexity that only depends on the proximity parameter e.

Our algorithm for finding cycles in bounded-degree graphs extends to general graphs, where distances are measured with respect to the actual number of edges. Such an extension is not possible with respect to finding tree-minors in $o(\sqrt{N})$ complexity.

This is joint work with Artur Czumaj, Dana Ron, C. Seshadhri, Asaf Shapira, and Christian Sohler.

# Nir Halman (Hebrew University of Jerusalem and MIT)

## FPTASs for Stochastic Optimization Problems



We present a framework for obtaining Fully Polynomial Time Approximation Schemes (FPTASs) for stochastic optimization problems that run in sublinear time. The functions in our model are either convex or monotone and are over discrete domains.

The framework is based on approximating univariate functions by piecewise linear functions and monitoring the propagation of errors.

This talk is based on several joint works with (subsets of) Diego Klabjan (Northwestern), Chung-Lun Li (The Hong Kong Polytechnic University), Jim Orlin (MIT) and David Simchi-Levi (MIT).



# Piotr Indyk (MIT)

## On the Power of Adaptivity in Sparse Recovery



The goal of sparse recovery is to recover a sparse (with few non-zeros) approximation of data from the available information. We consider the problem of recovering the (approximately) best k-sparse approximation x* of an n-dimensional vector x from linear measurements of x. It is known that this task can be accomplished using m=O(k log (n/k)) non-adaptive measurements and that this bound is tight.

In this talk we show that if one is allowed to perform measurements that are adaptive, then the number of measurements can be considerably reduced compared to the non-adaptive case. We also discuss implications of our results to data stream computing and compressive sensing.



# Tali Kaufman (Bar-Ilan University and Weizmann Institute)

## Locally Testable Codes and Expanders



Expanders and Error correcting codes are two celebrated notions, coupled together, e.g. in the seminal work on Expander Codes by Sipser and Spielman that yields some of the best codes. In recent years there is a growing interest in codes admitting algorithms that run in *constant* time. E.g. codes for which it is possible to distinguish in constant time between codewords and vectors far from the code. Such codes are known as locally testable codes (LTCs) and they are intimately related to the PCP Theorem.

It is known that Expander Codes based on *best* expanders can NOT yield LTCs, so it seems that expansion is somewhat contradicting to the notion of LTCs. We show that, nevertheless, the graph associated with an LTC *must* be essentially an expander. Namely, every LTC can be decomposed into a constant number of "basic" codes whose associated graph is an expander.

Based on a joint work with Irit Dinur.



# Robert Krauthgamer (Weizmann Institute)

## Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity



We present a near-linear time algorithm that approximates the edit distance between two strings within a significantly better factor than previously known. This result emerges in our investigation of edit distance from a new perspective, namely a model of asymmetric queries, for which we present near-tight bounds.

Another consequence of this new model is the first rigorous separation between edit distance and Ulam distance, by tracing the hardness of edit distance to phenomena that were not used by previous analyses.

Joint work with Alexandr Andoni and Krzysztof Onak.



# Oded Lachish (Birkbeck University of London)

## Testing a Language Accepted by a Fixed Boolean Formula

For a given formula p we define SAT(p) to be the set of all assignments that satisfy p. We study the query complexity of testing and estimating SAT(p), in a model where the testers have absolute knowledge of p. Our main results are.

·         Testing

o If SAT(p) is binary and read-once, then the query complexity is at most quasipolynomial in 1/e

o If SAT(p) is binary, monotone and read-k-times then the query complexity is at most quasipolynomial in 1/e  and k

·         Estimating

o If SAT(p) is binary, monotone, read-once, then the query complexity is at most quasipolynomial in 1/e

# Michael W. Mahoney (Stanford University)

## Fast Approximation of Matrix Coherence

The concept of matrix coherence measures the extent to which the singular vectors of a matrix are correlated with the standard basis, and as such it is of interest in the recently-popular problem of matrix completion. A more refined concept is that of statistical leverage. Statistical leverage is usually quantified by the diagonal elements of the projection matrix onto the best rank-k approximation to a matrix. Thus, it is useful in large-scale diagnostic data analysis for identifying outliers in large data matrices and data graphs. Moreover, it is the key structural nonuniformity that must be dealt with (i.e., either rapidly approximated or rapidly uniformized at the preprocessing step) in developing fast random sampling and random projection algorithms for matrix problems such as least-squares regression and low-rank matrix approximation.

As a corollary of our main result, we obtain an algorithm to approximate the coherence of an arbitrary matrix in time qualitatively faster than the naive algorithm. Our main result is a randomized algorithm that takes as input an arbitrary m x n matrix A and a rank parameter k; it returns as out output a relative-error approximation to every diagonal element of the projection matrix onto the best rank-k approximation to A; and it runs in time qualitatively faster than the time to compute a basis for that space.
In particular, given an m x n matrix A, with m>>n, the algorithm returns relative-error approximations to all the diagonal elements of the projection matrix onto the left singular subspace in roughly O(m n log n) time, as opposed to O(mn2) time required by the naive algorithm. In addition, numerical implementations run faster than traditional deterministic algorithms for matrices as small as hundreds by thousands.
Our analysis boils down to computing a relative-error approximation to an underconstrained least-squares approximation, or relatedly it can be viewed as an application of Johnson-Lindenstrauss ideas.

Blackboard talk,         no slides.

# Andrew McGregor (University of Massachusetts, Amherst)

## Data Streams, Dyck Languages, and Detecting Dubious Data Structures

In this talk, we present algorithms and lower bounds for recognizing various languages in the data stream model.  In particular, we resolve an open problem of Magniez, Mathieu and Nayak [STOC, 2010] concerning the multi-pass complexity of recognizing Dyck languages. This results in a natural separation between the standard multi-pass model and the multi-pass model that permits reverse passes. We also present the first passive memory checkers that verify the interaction transcripts of priority queues, stacks, and double-ended queues. We obtain tight upper and lower bounds for these problems, thereby addressing an important sub-class of the memory checking framework of Blum et al. [Algorithmica, 1994]. A key contribution of our work is a new bound on the information complexity of AUGMENTED-INDEX.

Includes joint work with Amit Chakrabarti, Graham Cormode, and Ranganath Kondapally.

# Jelani Nelson (MIT)

## Sparse Johnson-Lindenstrauss Transforms

The Johnson-Lindenstrauss (JL) lemma (1984) states that any n points in d-dimensional Euclidean space can be embedded into k=O(log n/e2) dimensions so that all pairwise distances are preserved up to 1+/-e. Furthermore, this embedding can be achieved via a linear mapping. The JL lemma is a useful tool for speeding up solutions to several high-dimensional problems: closest pair, nearest neighbor, diameter, minimum spanning tree, etc.  It also speeds up some clustering and string processing algorithms, reduces the amount of storage required to store a dataset, and can be used to reduce memory required for numerical linear algebra problems such as linear regression and low rank approximation.

The original proofs of the JL lemma let the linear mapping be specified by a random dense k x d matrix (e.g. i.i.d. Gaussian entries).  Thus, performing an embedding requires dense matrix-vector multiplication.  There has been much recent work on developing distributions that allow for embedding vectors quickly, begun by the work of [Ailon, Chazelle, STOC 2006]. Unfortunately, these works cannot take advantage of sparsity of the vector to be embedded, and they take W(d) time to embed a vector even with only one non-zero entry. This feature is particularly unfortunate in streaming applications, where updates can be seen as 1-sparse vectors.

One way to speed up embeddings for sparse vectors is to develop distributions over linear mappings whose corresponding matrices themselves are sparse.  In this talk, we give two JL distributions with the sparsest known matrices.  In fact, these are the first distributions where every matrix in their support has only o(1) of its entries non-zero for all settings of e and n (specifically, only an O(e)-fraction of entries in each column are non-zero).

This is based on joint work with Daniel Kane (Harvard University).

# Krzysztof Onak (CMU)

## A Near-Optimal Sublinear-Time Algorithm for Approximating the Minimum Vertex Cover Size



I will present a near-optimal sublinear-time algorithm for approximating the size of a minimum vertex cover. Let VC denote this quantity for the input graph. Our algorithm computes an estimate alpha such that VC ≤ a ≤ 2VC + en, where e is a parameter to the algorithm. The query complexity and running time of the algorithm are \tilde{O}(d’ poly(1/e)), where d’ denotes the average vertex degree in the graph.

Joint work with Dana Ron, Michal Rosen, and Ronitt Rubinfeld.



# Ely Porat (Bar-Ilan University)

## Multi-pattern Search in the Streaming Model



In this talk I presented a simple way to do streaming pattern matching in real time (O(1) time) while using O(log(1/d) log m) bits space in the worst case and O(log*S m log(1/d)) bit space in the average/smooth case.

I also present a way to extend this algorithm to efficient multi pattern search (dictionary matching).

# Sofya Raskhodnikova (Pennsylvania State University)

## Testing and Reconstruction of Lipschitz Functions

A function f : D -> R has Lipschitz constant c if dR(f(x), f(y)) ≤ c dD(x, y) for all x, y in D, where dR and dD denote the distance functions on the range and domain of f, respectively. We say a function is Lipschitz if it has Lipschitz constant 1. (Note that rescaling by a factor of 1/c converts a function with a Lipschitz constant c into a Lipschitz function.) In other words, Lipschitz functions are not very sensitive to small changes in the input. We initiate the study of testing and local reconstruction of the Lipschitz property of functions.

We consider functions over domains {0,1}d, {1,...,n} and {1,...,n}d, equipped with L1 distance. We design efficient testers of the Lipschitz property for functions of the form f:{0,1}d -> d Z, where d is in (0,1] and dZ is the set of multiples of d, and also of the form f: {1,...,n} -> R, where R is (discretely) metrically convex. In the first case, the tester runs in time O(d min{d,r}/de), where r is the diameter of the image of f; in the second, in time O((\log n)/e). We give corresponding lower bounds of W(d) and W(log n) on the query complexity  (in the second case, only for nonadaptive 1-sided error testers). Our lower bound for functions over {0,1}d is tight for the case of the {0,1,2} range and constant e.

We also present a local filter of the Lipschitz property for functions of the form f:{1,...,n}d -> R with lookup complexity O((log n+1)d). For functions of the form {0,1}d, we show that every nonadaptive local filter has lookup complexity exponential in d. The testers that we developed have applications to programs analysis. The reconstructors have applications to data privacy.

# Ben Recht (University of Wisconsin)

## From Compressed Sensing to Matrix Completion and Beyond



Matrix completion—where one seeks to recover a low rank matrix from an incomplete set of observations— is a recurring problem in collaborative filtering, dimensionality reduction, and systems and control theory.  While the general problem of finding the lowest rank matrix satisfying a set of equality constraints is NP-hard, a wave of recent results have provided very general settings under which one can perfectly recover all of the missing entries of a low-rank matrix by solving convex optimization problems.  In this talk, I will review this emerging body of work, discussing both theoretical foundations and practical applications and drawing connections with fast Monte Carlo algorithms for matrix approximation pioneered by the TCS community.  I will conclude the talk with some recent work on further extending the catalog of objects and structures that can be recovered from partial information using convex optimization.



# Justin Romberg (Georgia Tech)

## Compressive Sensing in Signal Processing

We will overview the fundamental results and recent theoretical trends in compressive sensing, and review some recent hardware designs which operate within the CS framework.  The theoretical portion of the talk will include reconstruction guarantees for structured random matrices (e.g. sampling using random convolution or random modulation) and recent progress on sampling ensembles of correlated signals using low-rank recovery algorithms.  The applications will include single-pixel imaging, radar imaging, acoustic source localization, and accelerated forward modeling for seismic imaging.

# Dana Ron (Tel Aviv University)

## Sublinear Algorithms for Approximating Graph Parameters

This talk is a survey of sublinear algorithms for approximating various graph parameters. In particular, the parameters surveyed are:

(1) The average degree the number of small stars.

(2) The weight of a minimum spanning tree.

(3) The size of a minimum vertex cover and a maximum matching.

(4) The number of edges that should be added/removed in order to obtain various properties.

# Ronitt Rubinfeld (Tel Aviv University and MIT)

## Testing Properties of Collections of Distributions

We propose a framework for studying property testing of collections of distributions, where the number of distributions in the collection is a parameter of the problem. Previous work on property testing of distributions considered single distributions or pairs of distributions. We suggest two models that differ in the way the algorithm is given access to samples from the distributions. In one model the algorithm may ask for a sample from any distribution of its choice, and in the other the choice of the distribution is random.

Our main focus is on the basic problem of distinguishing between the case that all the distributions in the collection are the same (or very similar), and the case that it is necessary to modify the distributions in the collection in a non-negligible manner so as to obtain this property. We give almost tight upper and lower bounds for this testing problem, as well as study an extension to a clusterability property. One of our lower bounds directly implies a lower bound on testing independence of a joint distribution, a result which was left open by previous work.

Joint work with Reut Levi and Dana Ron.

# Rocco Servedio (Columbia University)

## Learning and Testing k-Modal Distributions



A k-modal probability distribution over the domain {1,…,N} is one whose histogram has at most k "peaks" and "valleys". Such distributions are a natural generalization of the well-studied class of monotone increasing (or monotone decreasing) probability distributions.

We study the problem of learning an unknown k-modal distribution from samples. We also study a related problem in property testing: given access to samples from an unknown k-modal distribution p, determine whether p is identical to q or p is "far" from q. Here q is a k-modal distribution which may be given explicitly or may be available via sample access. These problems were previously studied in the case k=0 (monotone distributions) and k=1 (unimodal distributions).

We give algorithms for these problems that are provably close to the best possible in terms of sample and time complexity. An interesting feature of our approach is that our learning algorithms use ingredients from property testing and vice versa.

Joint work with Costis Daskalakis and Ilias Diakonikolas.



# C. Seshadhri (Sandia National Laboratories)

## Estimating the Longest Increasing Subsequence in Polylogarithmic Time

Finding the length of the longest increasing subsequence (LIS) is a classic algorithmic problem. A small example should explain the problem. In the array 1 9 4 10 6 7, the LIS has length 4 and is 1 4 6 7.

Let n denote the size of the input array. Simple textbook solutions achieve an O(n log n) running time, using dynamic programming. What can a sublinear time algorithm say about the LIS? For any constant d > 0, we construct a polylogarithmic time randomized algorithm that estimates the length of the LIS within an additive error of dn. Previously, the best known polylogarithmic time algorithms could only achieve an additive n/2 approximation.

Why is this problem challenging? The LIS length is the output of a dynamic program, which means unless we solve many (read linear) subproblems, we cannot get the exact LIS length. We are looking to get extremely accurate (read dn) approximations in polylogarithmic time. The algorithm we construct attempts to follow the progress of a (top-down) dynamic program. In polylogarithmic time, it zeroes in on the "important" sub-problems to solve. In essence, it is a sublinear-time divide and conquer algorithm. This requires some new sublinear algorithm ideas, which we hope will have applications for approximating other dynamic programs.

Joint work with Michael Saks.

# Asaf Shapira (Georgia Tech)

## Testing Odd-Cycle-Freeness in Boolean Functions



Call a boolean function f Odd-Cycle-Free if for every odd integer k, and for every k points x1,…,xk in the support of f the sum x1+…+xk does not vanish. We show that the property of being odd-cycle-free is testable with poly(1/e) queries. We prove this by analyzing the eigenvalues of the Cayley graph associated with a Boolean function.

We also prove that there is a canonical tester for odd-cycle-freeness making poly(1/e) queries, meaning that the testing algorithm operates by picking a random linear subspace of dimension O(log(1/e)) and then checking if the restriction of the function to the subspace is odd-cycle-free or not. The test is analyzed by studying the effect of random subspace restriction on the Fourier coefficients of a function. Our work implies that using a canonical tester, instead of an arbitrary tester, to test odd-cycle-freeness results in no more than a polynomial blowup in the query complexity. The question of whether a canonical tester with polynomial blowup exists for all linear-invariant properties remains an open problem.

Joint work with A. Bhattacharyya, E. Grigorescu, and P. Raghavendra.



# Christian Sohler (TU Dortmund)

## Every Property of Hyperfinite Graphs is Testable



A property testing algorithm for a property P in the bounded degree graph model [GR97] is an algorithm that, given access to the adjacency list representation of a graph G=(V,E) with maximum degree at most d, accepts G with probability at least 2/3 if G has property P, and rejects G with probability at least 2/3, if it differs on more than edn edges from every d-degree bounded graph with property P. A property is testable, if for every e, d, and n, there is a property testing algorithm Ae,n,d that makes at most q(e,d) queries to an input graph of n vertices, that is, a non-uniform algorithm that makes a number of queries that is independent of the graph size.

A k-disc around a vertex v of a graph G=(V,E) the subgraph induced by all vertices of distance at most k from v. We show that the structure of a planar graph on large enough number of vertices, n, and with constant maximum degree d, is determined, up to the modification (insertion or deletion) of at most edn edges, by the frequency of k-discs for certain k=k(e,d), that is independent of the size of the graph. We can replace planar graphs by any hyperfinite class of graphs, which includes, for example, every graph class that does not contain a set of forbidden minors.

We use this result to obtain new results and improve upon existing results in the area of property testing. In particular, we prove that

·         graph isomorphism is testable for every class of hyperfinite graphs,

·         every graph property is testable for every class of hyperfinite graphs,

·         every property of hyperfinite graphs is testable in the bounded degree graph model,

·         a large class of graph parameters is approximable for hyperfinite graphs.

Our results also give a partial explanation of the success of motifs in the analysis of complex networks.

Joint work with Ilan Newman.

# Madhu Sudan (Microsoft Research New England)

## Invariance in Property Testing

In this talk we report progress on a line of work that attempts to abstract a large class of results in testing of algebraic properties'' by their invariance. Specifically, we consider properties of functions mapping a large (but finite) field K to a small field F that are invariant under (the roughly K^2) affine transformations of the domain. In this talk we will give some motivation for studying this abstraction, as well as some of the results.

Based on works of/with Eli Ben-Sasson, Elena Grigorescu, Tali Kaufman, Shachar Lovett, Ghid Maatouk, Amir Shpilka.

or in pdf format:

# Gilad Tsur (Tel Aviv University)

## On Approximating the Number of Relevant Variables in a Function

In this work we consider the problem of approximating the number of relevant variables in a function given query access to the function. Since obtaining a multiplicative factor approximation is hard in general, we consider several relaxations of the problem. In particular, we consider relaxations in which we have a promise that the function belongs to a certain family of functions (e.g., linear functions), and we consider a relaxation of the property testing variant of the problem. In the latter relaxation the task is to distinguish between the case that the number of relevant variables is at most k, and the case in which it is far from any function in which the number of relevant variable is more than (1+ g)k for a parameter g.

We give both upper bounds and almost matching lower bounds for the relaxations we study.

This is joint work with Dana Ron.

# Paul Valiant (Berkeley)

## Estimating the Unseen: Sublinear Statistics

Much of statistics can be described as: given samples from an unknown distribution, estimate some property of the distribution. Many natural estimators from statistics require a number of samples that is linear in a natural parameter of the problem, in particular, often, a bound on the support size of the distribution. In this talk I will survey SUBlinear approaches to these problems, starting with the Good-Turing estimator for the fraction of unseen probability mass and Fisher's estimate of the unseen number of species, both from the 1940s, and covering up through the latest results on estimating entropy, independence, and closeness of distributions. Both algorithmic results and lower bounds will be discussed, with a focus on common threads and unifying intuition.

# Roger Wattenhofer (ETH Zürich)

## Distributed Algorithms

In my tutorial I will present some of the cornerstone results of 30 years of research in distributed (network) algorithms, using the maximal independent set (MIS) as a running example. After explaining the MIS problem, and discussing a naïve deterministic distributed algorithm to introduce the model, I present a new proof that the classic randomized distributed algorithm computes an MIS in O(log n) time, where n is the number of nodes. Then I sketch an W(log*n) lower bound for the list graph. In addition, I will mention a few other results, e.g. how to compute a MIS in asymptotically optimal O(log*n) time in bounded growth graphs. Finally I will present the asymptotically optimal logarithmic lower bound by Kuhn et al. for minimum vertex cover (MVC), and mention reductions to other problems including MIS. Throughout the talk I will mention several open problems, and connections to other areas.

## Valutazione delle Norme con le Applicazioni (A Survey of Norm Estimation with Applications)

The data stream model is an abstraction for measuring algorithm efficiency in the presence of massive amounts of data presented in an online fashion. In this model, there is an underlying object receiving updates to its components, such as a vector receiving updates to its coordinates, a graph receiving updates to its edges, or a matrix receiving updates to its entries. A common primitive task in applications in this setting is that of estimating a norm of the object with significantly sublinear (in the object's size) memory and very fast time to update the data structure approximately representing the object at any point in time. I will cover known algorithms and lower bounds for estimating lp-norms, mixed or cascaded norms, and operator norms. I will also cover algorithms for sampling components of an object based on their contribution to the overall norm (e.g., "lp-sampling"). Interestingly, the same data structure, with a different setting of parameters, can be used for many of these problems, and in fact for estimating a much larger class of norms. I will discuss a variety of open questions.

# Ning Xie (MIT)

## Local Computation Algorithms



For input x, let F(x) denote the set of outputs that are the “legal” answers for a computational problem F. Suppose x and members of F(x) are so large that there is not time to read them in their entirety. We propose a model of local computation algorithms which for a given input x, support queries by a user to values of specified locations yi in a legal output y in F(x). When more than one legal output y exists for a given x, the local computation algorithm should output in a way that is consistent with at least one such y. Local computation algorithms are intended to distill the common features of several concepts that have appeared in various algorithmic subfields, including local distributed computation, local algorithms, locally decodable codes, and local reconstruction.

We develop a technique, based on known constructions of small sample spaces of k-wise independent random variables and Beck's analysis in his algorithmic approach to the Lovasz Local Lemma, which under certain conditions can be applied to construct local computation algorithms that run in polylogarithmic time and space. We apply this technique to maximal independent set computations, scheduling radio network broadcasts, hypergraph coloring and satisfying k-SAT formulas.

Based on joint works with Noga Alon, Ronitt Rubinfeld, Gil Tamir and Shai Vardi.



# Yuichi Yoshida (Kyoto University)

## Optimal Constant-Time Approximation Algorithms and (Unconditional) Inapproximability Results for Every Bounded-Degree CSP

Raghavendra (STOC 2008) gave an elegant and surprising result: if Khot’s Unique Games Conjecture (STOC 2002) is true, then for every constraint satisfaction problem (CSP), the best approximation ratio is attained by a certain simple semidefinite programming and a rounding scheme for it.

In this talk, we show that similar results hold for constant- time approximation algorithms in the bounded-degree model. Specifically, we present the following: (i) For every CSP, we construct an oracle that serves an access, in constant time, to a nearly optimal solution to a basic LP relaxation of the CSP. (ii) Using the oracle, we give a constant-time rounding scheme that achieves an approximation ratio coincident with the integrality gap of the basic LP. (iii) Finally, we give a generic conversion from integrality gaps of basic LPs to hardness results. All of those results are unconditional. Therefore, for every bounded-degree CSP, we give the best constant-time approximation algorithm among all.

A CSP instance is called ε-far from satisfiability if we must remove at least an ε-fraction of constraints to make it satisfiable. A CSP is called testable if there is a constant- time algorithm that distinguishes satisfiable instances from ε-far instances with probability at least 2/3. Using the results above, we also derive, under a technical assumption, an equivalent condition under which a CSP is testable in the bounded-degree model.

or in the keynote format:

## Non-speakers:

·         Noga Alon (Tel Aviv University)

·         Sariel Har-Peled (University of Illinois at Urbana-Champaign)