# Talk Titles and Abstracts

### Monday 19th Morning

Greg Valiant (Stanford): Estimating Learnability: Approximating signal and noise in the sublinear data regime

We consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. We show that it is often possible to accurately estimate this “learnability” even when given an amount of data that is too small to reliably learn any model that fits the distribution. Our first result applies to the setting where the data is drawn from a d-dimensional distribution with isotropic covariance, and the label of each datapoint is a linear function of the data plus independent noise of unknown variance. In this setting, we show that the magnitude of the noise can be accurately approximated even given sqrt(d) samples, which is information theoretically optimal. Note that even if there is no noise, a sample size linear in the dimension, d, is required to learn any function correlated with the underlying linear model. We then extend our estimation approach to the setting where the data distribution has an (unknown) arbitrary covariance matrix, allowing these techniques to be applied to settings where the model class consists of a linear function applied to a nonlinear embedding of the data. Finally, we demonstrate the practical viability of these approaches on synthetic and real data.

David Woodruff (CMU): Sublinear Time Low Rank Approximation of Positive Semidefinite Matrices

We show how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, i.e., for any $n \times n$ PSD matrix $A$, in $\tilde O(n \cdot \poly(k/\epsilon))$ time we output a rank-$k$ matrix $B$, in factored form, for which $\|A-B\|_F^2 \leq (1+\epsilon)\|A-A_k\|_F^2$, where $A_k$ is the best rank-$k$ approximation to $A$. When $k$ and $1/\epsilon$ are not too large compared to the sparsity of $A$, our algorithm does not need to read all entries of the matrix. Hence, we significantly improve upon previous $\nnz(A)$ time algorithms based on oblivious subspace embeddings, and bypass an $\nnz(A)$ time lower bound for general matrices (where $\nnz(A)$ denotes the number of non-zero entries in the matrix). We prove time lower bounds for low-rank approximation of PSD matrices, showing that our algorithm is close to optimal. Finally, we extend our techniques to give sublinear time algorithms for low-rank approximation of $A$ in the (often stronger) spectral norm metric $\norm{A-B}_2^2$ and for ridge regression on PSD matrices. Joint work with Cameron Musco.

Jelani Nelson (Harvard): BPTree; improved space for insertion-only l_2 heavy hitters

The CountSketch of (Charikar, Chen, Farach-Colton ICALP'02) finds the approximate top k elements in a stream (in an l_2 sense) using O(k log n) words of memory, where all items come from a universe of size n. We give a new data structure, the BPTree, using only O(k log k) words of memory to solve the same task. Our approach is based on l_2 tracking (keep track of the l_2 norm at all points in the stream and not just being able to query l_2 at the stream's "end"), which we in turn tackle using tools from the study of suprema of stochastic processes. Joint work with Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Zhengyu Wang, and David P. Woodruff.

Krzysztof Onak (IBM Research): Communication-efficient distributed learning of discrete probability distributions

### Monday 19th Afternoon

Tugkan Batu (LSE): Generalized Uniformity Testing

In this work, we revisit the problem of uniformity testing of discrete probability distributions. A fundamental problem in distribution testing, testing uniformity over a known domain has been addressed over a significant line of works, and is by now fully understood. The complexity of deciding whether an unknown distribution is uniform over its unknown (and arbitrary) support, however, is much less clear. Yet, this task arises as soon as no prior knowledge on the domain is available, or whenever the samples originate from an unknown and unstructured universe. In this work, we introduce and study this generalized uniformity testing question, and establish nearly tight upper and lower bound showing that -- quite surprisingly -- its sample complexity significantly differs from the known-domain case. Moreover, our algorithm is intrinsically adaptive, in contrast to the overwhelming majority of known distribution testing algorithms.

Clément Canonne (Stanford): When Fourier SIIRVs: Fourier-Based Testing for Families of Distributions

We study the general problem of testing whether an unknown discrete probability distribution belongs to a specified family of distributions. More specifically, given a distribution family $\mathcal{P}$ and sample access to an unknown discrete distribution $p$, we want to distinguish (with high probability) between the case that $p$ is in $\mathcal{P}$ and the case that $p$ is $\varepsilon$-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the archetypal hypothesis testing problem that has received significant attention in statistics and, over the past decade and a half, in theoretical computer science. Of course, the sample complexity of this general inference task depends on the underlying family $\mathcal{P}$. The gold standard in distribution testing is to design sample-optimal and computationally efficient algorithms for this task. The main contribution of this work is a simple and general testing technique that is applicable to all distribution families, and is particularly suited to those whose /Fourier spectrum/ satisfies a certain approximate /sparsity/ property. To the best of our knowledge, ours is the first use of the Fourier transform in the context of distribution testing. We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity. Joint work with Ilias Diakonikolas (USC) and Alistair Stewart (USC).

Yuichi Yoshida (NII Tokyo): Testability of Homomorphism Inadmissibility: Property Testing Meets Database Theory

We utilize the perspective of property testing to consider the testability of relational database queries. A primary motivation is the desire to avoid reading an entire database to decide a property thereof. We focus on conjunctive queries, which are the most basic and heavily studied database queries. Each conjunctive query can be represented as a relational structure A such that deciding if the conjunctive query is satisfied by a relational structure B is equivalent to deciding if there exists a homomorphism from A to B. We phrase our results in terms of homomorphisms. Precisely, we study, for each relational structure A, the testability of homomorphism inadmis sibility from A. We consider algorithms that have oracle access to an input relational structure B and that distinguish, with high probability, the case where there is no homomorphism from A to B, from the case where one needs to remove a constant fraction of tuples from B in order to suppress all such homomorphisms. We provide a complete characterization of the structures A from which one can test homomorphism inadmissibility with one-sided error by making a con stant number of queries to B. Our characterization shows that homomorphism inadmissibility from A is constant-query testable with one-sided error if and only if the core of A is α-acyclic. We also show that the injective version of the problem is constant-query testable with one-sided error if A is α-acyclic; this result generalizes existing results for testing subgraph-freeness in the general graph model.

Christian Konrad (University of Warwick): Distributed Minimum Vertex Coloring Approximation

Vertex coloring problems are classic symmetry breaking problems in distributed computing and have been studied for more than 30 years. While most works aim at coloring the input graph with Delta+1 colors (which is easy to achieve sequentially), where Delta is the maximum degree, we consider the minimum vertex coloring problem, i.e., coloring a graph with the least number of colors possible (which on general graphs is NP-hard and hard to approximate within a factor of n^{1-eps}, for any eps > 0). Using exponential time computations, a poly-log(n) approximation can be achieved in poly-log(n) rounds on general graphs, and no better approximation is known. We discuss graph classes that admit better approximations. In particular, we show that a (1+eps)-approximation can be computed in O(log(n) / eps) rounds in chordal graphs.

Lin Yang (Princeton): Clustering High Dimensional Dynamic Data Streams

We first present a data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1,2,…Δ}^d. Our algorithms use k/eps^2 * poly(d log Δ) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the k-median cost of the coreset (1+eps)-approximates the cost of the streamed point set. We can use this coreset to compute a (1+eps)-approximation for the k-median problem by any efficient offline k-median algorithm.

Jiecao Chen (Indiana University Bloomington): Distinct Sampling on Streaming Data with Near-Duplicates

In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that all the near-duplicates are treated as the same element. We also extend the result to the sliding window cases in which we are only interested in the most recent items. We present algorithms with provable theoretical guarantees for datasets in the Euclidean space, and also verify their effectiveness via a set of experiments.

### Tuesday 20th Morning

Silvio Lattanzi (Google Research): A Simple Framework for Optimization over Sliding Windows

In analyzing large stream of data, one often wishes to restrict the input to the last W elements that have arrived, the so-called sliding window problem. In this setting, two algorithmic frameworks are widely-used: exponential histograms and smooth histograms. These frameworks are used to obtain sliding window algorithms from streaming algorithms for large families of functions. However, there are numerous cases where these frameworks cannot be applied directly — for instance smooth histograms fail if the problem cannot be approximated to better than 0.8. Some of these difficulties can be rectified, but only on a case by case basis. In this work, we provide a new algorithmic framework: we identify a key algorithmic problem, that of suffix-composable summaries, and show how to reduce the problem of finding sliding window algorithms to that of using insertion only streaming algorithms and suffix-composable summaries. We first describe the framework of suffix composable summaries, and then instantiate it on a wide range of problems, yielding algorithms for submodular function optimization, diversity optimization, many forms of k-clustering, and others. In doing so, we often match, or improve state of the art results obtained using problem-specific algorithms.

Srikanta Tirthapura (Iowa State): Parallel and Distributed Data Summarization: One size does not fit all

Raphael Clifford (University of Bristol): The Classical Complexity of Boson Sampling

In recent years there has been a great deal excitement about the prospects of achieving near term "quantum computational supremacy". We show that the classical complexity of one of the two main candidates for achieving this supremacy, the so-called Boson Sampling problem, is dramatically faster than was previously thought. This result significantly reduces the likelihood of achieving near-term quantum supremacy via the Boson Sampling problem.

He Sun (University of Edinburgh): Heat kernels in graphs: A journey from random walks to geometry, and back

Heat kernels are one of the most fundamental concepts in physics and mathematics. In physics, the heat kernel is a fundamental solution of the heat equation and connects the Laplacian operator to the rate of heat dissipation. In spectral geometry, many fundamental techniques are based on heat kernels. In finite Markov chain theory, heat kernels correspond to continuous-time random walks and constitute one of the most powerful techniques in estimating the mixing time. In this talk, we will briefly discuss this line of research and its relation to heat kernels in graphs. In particular, we will see how heat kernels are used to design the first nearly-linear time algorithm for finding clusters in real-world graphs. Some interesting open questions will be addressed in the end.

### Tuesday 20th Afternoon

Amit Chakrabarti (Dartmouth College): Counting Triangles and Other Substructures in Graph Streams

We revisit the much-studied problem of space-efficiently estimating the number of triangles in a graph stream, and extensions of this problem to counting fixed-sized cliques and cycles, obtaining a number of new upper and lower bounds. For the important special case of counting triangles, we give a 4-pass, epsilon-approximate, randomized algorithm that runs in about O( epsilon^{-2} m^{3/2} T^{-1} ) space, where m is the number of edges and T is a promised lower bound on the number of triangles. We give an improved multi-pass lower bound of Omega( min{ m^{3/2} T^{-1}, m T^{-1/2} } ), applicable at essentially all edge densities. We also prove other multi-pass lower bounds in terms of various structural parameters of the input graph. Together, our results resolve a couple of open questions raised in recent work (Braverman et al., ICALP 2013). The talk will emphasize general frameworks, for both upper and lower bounds. We give a sampling algorithm for counting arbitrary subgraphs and then improve it via combinatorial means in the special cases of counting odd cliques and odd cycles. Our results show that these problems are considerably easier in the cash-register streaming model than in the turnstile model, where previous work had focused (Manjunath et al., ESA 2011; Kane et al., ICALP 2012).

Michael Kapralov (EPFL): Random Fourier Features for Kernel Ridge Regression​

Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral approximation.

Qualitatively, our results are twofold: on the one hand, we show that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions. At the same time, we show that the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance. We study this optimal sampling distribution for the Gaussian kernel, achieving a nearly complete characterization for the case of low-dimensional bounded datasets. Based on this characterization, we propose an efficient sampling scheme with guarantees superior to random Fourier features in this regime.

Joint work with Haim Avron, Cameron Musco, Christopher Musco, Ameya Velingker and Amir Zandieh

Arnold Filtser (Ben Gurion University of the Negev): Ramsey Spanning Trees and their Applications

The ''metric Ramsey'' problem asks for the largest subset $S$ of a metric space that can be embedded into an ultrametric (more generally into a Hilbert space) with a given distortion. Study of this problem was motivated as a non-linear version of Dvoretzky theorem. Mendel and Naor devised the so called Ramsey Partitions to address this problem, and showed the algorithmic applications of their techniques to approximate distance oracles and ranking problems. We study the natural extension of the metric Ramsey problem to graphs, and introduce the notion of ''Ramsey Spanning Trees''. We ask for the largest subset $S\subseteq V$ of a given graph $G=(V,E)$, such that there exists a spanning tree of $G$ that has small stretch for $S$. Applied iteratively, this provides a small collection of spanning trees, such that each vertex has a tree providing low stretch paths to all other vertices. The union of these trees serves as a special type of spanner, a tree-padding spanner. We use this spanner to devise the first compact stateless routing scheme with $O(1)$ routing decision time, and labels which are much shorter than in all currently existing schemes.

Grigory Yaroslavtsev (Indiana University Bloomington): Massively Parallel Algorithms for Single-Linkage Clustering under Lp-Distances

We present first massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $\ell_1, \ell_2$ and $\ell_\infty$ distances. All our algorithms run in $O(\log n)$ rounds of MPC for any fixed $d$ and achieve $(1+\epsilon)$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for $o(\log n)$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on the largest available vector datasets from the UCI machine learning repository exhibiting speedups of several orders of magnitude.

### Wednesday 21st Morning

Daniel Ting (Tableau): Statistical data summarization for Simple Counts and Complex ML

We examine two very different problems for data summarization through a statistical lens in order to improve and develop new methods. First, we re-examine the familiar Count-Min sketch and foundational problem of estimating item counts. Rather than relying on concentration bounds to bound the error and motivate the method, we estimate and make use of a full error distribution. This leads to error bounds in the form of tight confidence intervals, as well as improved estimation that matches or beats all existing methods under all regimes. Second, we tackle a much higher level problem of drawing a good sample for use in machine learning and statistical models. In this case, statistical methods allow us to translate the problem of a good sample for a statistical model to a more familiar problem of good estimation of a mean. By doing so, we can show that the theory of asymptotically linear estimators and influence functions yield asymptotically optimal sampling probabilities. It converges to the unique set of sampling probabilities that yield the optimal variance under some regularity conditions. Several existing methods, such as leverage based sampling, are shown to be approximations of our method.

Ke Yi (HKUST): Sampling, Approximation, Discrepancy, and Communication

Robert Krauthgamer (Weizmann): Batch Sparse Recovery

We introduce a batch version of sparse recovery, where the goal is to construct n-dimensional vectors A'_1,...,A'_m that estimate a sequence of unknown signals A_1,...,A_m using linear measurements, each involving exactly one signal vector, under an assumption that the signals are sparse on average. Joint work with Alexandr Andoni, Lior Kamma, and Eric Price.

Christian Sohler (TU Dortmund): Streaming algorithms for fair k-means clustering

We study the k-means problem for points sets in R^d that are endowed with a binary sensitive attribute encoded as color red or blue. Our goal is to develop clusteringalgorithms that minimize the sum of squared distances under the constraint that in each cluster the number of red and blue points does not deviate too much from each other. Such a constraint is studied in the context of fairness of clustering algorithms with the goal of avoiding disparate impact of the sensitive attribute.

Varun Kanade (Oxford): Hierarchical Clustering: Objectives & Efficient Algorithms

Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Hierarchical clustering has mostly been studied in procedural terms, i.e., a hierarchical cluster tree is obtained using certain bottom-up (agglomerative) or top-down (divisive) heuristics, rather than finding a hierarchical cluster tree as the solution obtained by optimizing some suitable objective. Dasgupta (2016) identified this as a reason why the theory of hierarchical clustering lagged behind that of flat clustering and proposed an objective function. In this talk, I will take an axiomatic approach to identifying suitable objective functions for hierarchical clustering. I will also describe a new random-graph model with a planted hierarchy. New and existing algorithms and their performance in theory and on some preliminary experiments will be discussed. Based on joint work with Vincent Cohen-Addad, Claire Mathieu and Frederik Mallmann-Trenn.

### Thursday 22nd Morning

Sofya Vorotnikova (University of Massachusetts Amherst): Streaming Algorithms for Matchings in Low Arboricity Graphs

Finding a large matching is the most well-studied graph problem in the data stream model. To bypass the $\Omega(n)$ lower bound required to store a matching, recent research has begun to focus on only approximating the size of matchings, resulting in several algorithms with sublinear space bounds. We continue this line of work and present several results for estimating matchings in low arboricity graphs using $o(n)$ space. We give three structural results relating the size of a maximum matching to the arboricity $\alpha$ of a graph and show how those lead to approximation algorithms in different streaming models. For dynamic (insert-delete) streams, we present an algorithm returning $(\alpha+2)(1+\epsilon)$ approximation of the size of the matching using space $O(\alpha \epsilon^{-2} n^{4/5} \polylog n)$. For insert-only streams, we show that the same approximation factor can be achieved in merely $O(\epsilon^{-2} \log n)$ space. And finally, we further improve the approximation factor to $(\alpha+2)$ and space to $O(\log n)$ in adjacency list streams, where edges incident to the same vertex arrive together.

Hoa Vu (University of Massachusetts Amherst): Streaming algorithms for Max-Cover and graphical models

In the streaming set model, the stream consists of m sets that are subsets of an n-element universe. In this talk, I will focus on the problem of finding k sets with the maximum coverage in sublinear o(mn) space using techniques such as sampling, sketching and thresholding greedy. If time permits, I will also survey some recent results on graphical models in data streams such as: a) learning and evaluating graphical models and b) improving algorithms such as finding heavy hitters under graphical models assumptions.

Pan Peng (University of Vienna): Estimating Graph Parameters from Random Order Streams

We develop a new algorithmic technique that allows to transfer some constant time approximation algorithms for general graphs into random order streaming algorithms. We illustrate our technique by showing that in random order streams, one can approximate the number of connected components of the input graph $G$ with an additive error of $\epsilon n$; $(1+\epsilon)$-approximate the weight of the minimum spanning tree of an input graph with bounded maximum edge weight; and $(1+\epsilon)$-approximate the size of a maximum independent set in planar graphs, with constant space complexity that only depends on $\epsilon$ and is independent of the size of the input graph. Based on joint work with Christian Sohler.

Amir Zandieh (EPFL): An Adaptive Sublinear-Time Block Sparse Fourier Transform

The problem of approximately computing the k dominant Fourier coefficients of a vector X quickly, and using few samples in the time domain, is known as the Sparse Fourier Transform (sparse FFT) problem. A long line of work on the sparse FFT has resulted in algorithms with O(k \log n \log(n/k)) runtime [Hassanieh et al., STOC'12] and O(k \logn) sample complexity [Indyk et al., FOCS'14]. These results are proved using non-adaptive algorithms, and the latter O(k \logn) sample complexity result is essentially the best possible under the sparsity assumption alone. This paper revisits the sparse FFT problem with the added twist that the sparse coefficients approximately obey a (k0,k1)-block sparse model. In this model, signal frequencies are clustered in k0 intervals with width k1 in Fourier space, where k=k0k1 is the total sparsity. Signals arising in applications are often well approximated by this model with k0≪k. Our main result is the first sparse FFT algorithm for (k0,k1)-block sparse signals with the sample complexity of O*(k0k1+k0 \log(1+k0) \logn) at constant signal-to-noise ratios, and sublinear runtime. A similar sample complexity was previously achieved in the works on model-based compressive sensing using random Gaussian measurements but used Ω(n) runtime. To the best of our knowledge, our result is the first sublinear-time algorithm for model-based compressed sensing, and the first sparse FFT result that goes below the O(k \logn) sample complexity bound. Our algorithm crucially uses adaptivity to achieve the improved sample complexity bound, and we prove that adaptivity is, in fact, necessary if Fourier measurements are used: Any non-adaptive algorithm must use Ω(k0k1 \log(n/k0k1) ) samples for the (k0,k1)-block sparse model, ruling out improvements over the vanilla sparsity assumption.

Rajesh Chitnis (University of Warwick): Parameterized Streaming Algorithms

The paradigm of parameterized algorithms has proven to be quite useful in coping with NP-hardness. In this talk, we introduce the notion of "parameterized streaming algorithms" where we try to apply the same ideology for streaming algorithms. Based on joint works with Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh and Sofya Vorotnikova.