# The information complexity of Hamming distance

## Eric Blais, University of Waterloo

The Hamming distance problem is a fundamental problem in communication complexity in which Alice and Bob each receive a binary string of length $n$ and must determine whether their strings differ in at most $d$ coordinates. In this talk, we will discuss how new lower bounds on the communication complexity of this problem can be obtained via an analysis of its information complexity.
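As a point of reference (not part of the talk), the predicate itself is trivial to compute centrally; the difficulty lies entirely in the two-party communication setting. A minimal Python sketch of the gap decision, with names of my own choosing:

```python
def gap_hamming(x, y, d):
    """Decide whether binary strings x and y differ in at most d coordinates.

    A centralized sketch of the predicate Alice and Bob must compute jointly;
    in the communication model, x and y are held by separate parties.
    """
    assert len(x) == len(y)
    dist = sum(a != b for a, b in zip(x, y))  # the Hamming distance
    return dist <= d
```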

Based on joint work with Joshua Brody and Badih Ghazi.

# The 2+p Question in Random Constraint Satisfaction Problems

## Harold Connamacher, Case Western Reserve University

The 2+p SAT Threshold Conjecture states that for any $\delta > 0$, there exists an $\epsilon > 0$ such that a uniformly random boolean CNF formula with $n$ variables, $(1-\epsilon)n$ clauses of size 2, and $(2/3)(1+\delta)n$ clauses of size 3 has no solution, with high probability. We can make equivalent conjectures for other constraint satisfaction problem models, but a 2+p type conjecture has only been resolved for XOR-SAT and an NP-complete generalization of XOR-SAT. The 2+p question is closely tied to algorithm behavior. Many common CSP-solver algorithms work by iteratively assigning a subset of the variables. The result of this partial assignment is a reduced formula with a mixture of clause sizes, and the 2+p question is equivalent to asking when a partial assignment to the variables does not extend to a satisfying assignment. This talk will survey the current state of the 2+p SAT Threshold Conjecture as well as present preliminary results from an ongoing study into the size of the spine of a formula with mixed clause sizes. For a formula $F$, the spine is defined as the set of literals $u$ such that there exists a satisfiable subformula $C$ of $F$ for which $u \wedge C$ has no solution. These results suggest a new direction for exploring 2+p questions in CSPs.
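For toy formulas, the spine as defined above can be computed directly by brute force. A sketch (literals are DIMACS-style nonzero integers, so $-v$ is the negation of variable $v$; exponential time, for illustration only):

```python
from itertools import combinations, product

def satisfiable(clauses, n):
    """Brute-force CNF satisfiability over n variables.

    Clauses are tuples of nonzero ints: literal v means variable v is true,
    literal -v means variable v is false.
    """
    for assign in product([False, True], repeat=n):
        if all(any(assign[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def spine(clauses, n):
    """All literals u with a satisfiable subformula C such that u AND C is unsatisfiable."""
    result = set()
    lits = [l for v in range(1, n + 1) for l in (v, -v)]
    for r in range(len(clauses) + 1):
        for sub in combinations(clauses, r):
            if not satisfiable(sub, n):
                continue  # spine requires the subformula C itself to be satisfiable
            for u in lits:
                if u not in result and not satisfiable(list(sub) + [(u,)], n):
                    result.add(u)
    return result
```

For instance, for the formula $x_1 \wedge \neg x_1$ both literals lie in the spine, since each unit clause alone is satisfiable but forces the other literal to fail.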

# Alignment in $R^D$

## Steven Damelin, Mathematical Reviews

A classical problem in geometry goes as follows. Suppose we are given two sets of $D$ dimensional data, that is, sets of points in Euclidean D-space, where $D\geq 2$. The data sets are indexed by the same set, and we know that pairwise distances between corresponding points are equal in the two data sets. In other words, the sets are isometric. Can this correspondence be extended to an isometry of the ambient Euclidean space?

In this form the question is not terribly interesting; the answer has long been known to be yes. But a related question is actually fundamental in data analysis: here the known points are samples from larger, unknown sets—say, manifolds in $R^D$—and we seek to know what can be said about the manifolds themselves. A typical example might be a face recognition problem, where all we have is multiple finite images of people’s faces from various views.

An added complication is that in general we are not given exact distances. We have noise and so we need to demand that instead of the pairwise distances being equal, they should be close in some reasonable metric.
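For the finite, noisy version of the problem, a standard least-squares rigid alignment (orthogonal Procrustes / the Kabsch algorithm, not the construction discussed in the talk) gives a feel for what "close in some reasonable metric" can mean:

```python
import numpy as np

def align(X, Y):
    """Least-squares rigid alignment of corresponding point sets (Kabsch).

    Rows of X and Y are corresponding points in R^D. Returns a rotation R
    and translation t minimizing sum_i ||R x_i + t - y_i||^2.
    """
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    A, B = X - cx, Y - cy                      # center both point clouds
    U, _, Vt = np.linalg.svd(A.T @ B)          # cross-covariance H = A^T B
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # correct an improper reflection
    D = np.diag([1.0] * (X.shape[1] - 1) + [float(d)])
    R = Vt.T @ D @ U.T                         # optimal proper rotation
    t = cy - R @ cx
    return R, t
```

With exact isometric data the recovered $(R, t)$ reproduces the isometry; with noisy distances it returns the least-squares best fit.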

We will discuss ongoing work on this problem. This is joint with Charles Fefferman (Princeton).

# The Complexity of Ising Models with Complex Weights

We will talk about the complexity of approximately evaluating the Ising and Tutte partition functions with complex parameters. Our results are motivated by the study of the quantum complexity classes BQP and IQP. We show a full classification of the complexity of multiplicatively approximating the norm and the argument of the partition function for complex edge interactions. Using our classifications, we then revisit the connections to quantum computation, drawing conclusions that are different from (and incomparable to) the results in the quantum complexity literature. We also give a complete dichotomy for the case in which the parameters are roots of unity.
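For intuition, the Ising partition function can be evaluated by brute force on tiny graphs. In one common parameterization (notation mine, not necessarily the talk's), $Z(G; y)$ sums $y^{\#\{\text{monochromatic edges}\}}$ over spin assignments, where the edge interaction $y$ may be complex:

```python
from itertools import product

def ising_partition(edges, n, y):
    """Brute-force Ising partition function on a graph with vertices 0..n-1.

    Z(G; y) = sum over spin assignments sigma in {+1,-1}^n of
              y ** (number of edges whose endpoints get equal spins).
    The interaction y may be any complex number.
    """
    Z = 0
    for sigma in product([1, -1], repeat=n):
        mono = sum(1 for (u, v) in edges if sigma[u] == sigma[v])
        Z += y ** mono
    return Z
```

On a single edge, $Z = 2y + 2$; approximating $Z$ for general graphs and complex $y$ is what the classification addresses.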

# Conditional Distribution Search

## Brendan Juba, Washington University in St. Louis

What is "Big Data" good for? A standard answer in learning theory is that it allows larger and richer representations to be fit with lower generalization error, and thus "Big Data" should just be about fitting sophisticated models. An alternative answer is that a large data set features good coverage of relatively rare events. The natural problem raised by this alternative view is identifying such rare events: events whose conditional distributions are a good fit for some (problem-dependent) criteria.

We will consider algorithms for a simple version of this conditional distribution search task, one that captures "abductive reasoning." We will see why events represented by k-DNF formulas can be found efficiently, but events represented by conjunctions may not be discoverable by efficient algorithms. Furthermore, we will see that this problem is closely related to a "positive-reliable" variant of the supervised learning task, introduced by Kalai, Kanade, and Mansour, in which we seek a classifier whose false-positive rate lies below an arbitrary, pre-specified tolerance.
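A toy, exponential-time version of the search over conjunction events helps fix ideas (hypothetical interface of my own; the talk's point is precisely that efficient algorithms for conjunctions appear out of reach):

```python
from itertools import combinations

def best_conjunction(samples, labels, k, min_coverage):
    """Brute-force toy conditional distribution search over conjunction events.

    Among conjunctions of at most k literals over binary features, find one
    whose event covers at least a min_coverage fraction of the samples and
    maximizes the conditional rate of label 1 on that event.
    Exponential in k; illustration only.
    """
    n = len(samples[0])
    literals = [(i, b) for i in range(n) for b in (0, 1)]  # feature i == b
    best, best_rate = None, -1.0
    for r in range(k + 1):
        for term in combinations(literals, r):
            hit = [j for j, x in enumerate(samples)
                   if all(x[i] == b for i, b in term)]
            if len(hit) < min_coverage * len(samples):
                continue  # event too rare for the coverage requirement
            rate = sum(labels[j] for j in hit) / len(hit)
            if rate > best_rate:
                best, best_rate = term, rate
    return best, best_rate
```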

# Computational Complexity of Certifying the Restricted Isometry Property

## Abhiram Natarajan, Purdue University

A matrix is $(k, \delta)$-RIP if, when applied to any $k$-sparse vector, it preserves the vector's 2-norm up to a factor determined by $\delta$. In many applications, such as compressed sensing and sparse recovery, it is desirable to construct RIP matrices with large $k$ and small $\delta$. Deterministic constructions are not very effective at achieving large $k$, while randomized methods are extremely effective. Given the efficacy of random constructions in generating useful RIP matrices, the problem of certifying the RIP parameters of a matrix has become important.
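Under the standard definition, $(k,\delta)$-RIP of $A$ means $(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2$ for every $k$-sparse $x$. A brute-force certifier (consistent with the hardness result: it inspects every size-$k$ column support, so it takes exponential time) can be sketched as:

```python
import numpy as np
from itertools import combinations

def is_rip(A, k, delta):
    """Brute-force check that A is (k, delta)-RIP.

    For each set S of k columns, the extreme singular values of A[:, S]
    bound ||Ax||^2 / ||x||^2 over vectors supported on S, so it suffices
    that every such submatrix has squared singular values in [1-delta, 1+delta].
    """
    n = A.shape[1]
    for S in combinations(range(n), k):
        s = np.linalg.svd(A[:, S], compute_uv=False)  # sorted descending
        if s[0] ** 2 > 1 + delta or s[-1] ** 2 < 1 - delta:
            return False
    return True
```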

In this work, we prove that it is hard to approximate the RIP parameters of a matrix assuming the Small Set Expansion Hypothesis. To prove this hardness result, we establish a variant of Cheeger's inequality for sparse vectors.

Joint work with Yi Wu.

# Lower bounds on the size of semidefinite programming relaxations

## David Steurer, Cornell University

We introduce a method for proving lower bounds on the efficacy of semidefinite programming (SDP) relaxations for combinatorial problems. In particular, we show that the cut, TSP, and stable set polytopes on $n$-vertex graphs are not the linear image of the feasible region of any SDP (i.e., any spectrahedron) of dimension less than $2^{n^{\delta}}$, for some constant $\delta > 0$. This result yields the first super-polynomial lower bounds on the semidefinite extension complexity of any explicit polytope.

Our results follow from a general technique for proving lower bounds on the positive semidefinite rank of a matrix. To this end, we establish a close connection between arbitrary SDPs and those arising from the sum-of-squares hierarchy. For approximating maximum constraint satisfaction problems, we prove that polynomial-size SDPs are equivalent in power to those arising from degree-$O(1)$ sum-of-squares relaxations. This result implies, for instance, that no family of polynomial-size SDP relaxations can achieve better than a 7/8-approximation for MAX 3-SAT.

Joint work with James Lee and Prasad Raghavendra.

## Ilya Volkovich, University of Michigan

An arithmetic read-once formula (ROF for short) is a formula (i.e. a tree of computation) in which the operations are $+$ and $\times$ and every input variable labels at most one leaf. These formulae compute, in a sense, the simplest functions that depend on all of their variables. In this work we give a simple characterization of read-once formulae. Besides being interesting in its own right, our characterization gives rise to a property testing algorithm for such formulae. To the best of our knowledge, no characterization or property testing algorithm was previously known for this kind of formulae. We also show that no such characterization is possible for the Boolean counterpart of ROFs.
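For concreteness, a read-once formula can be represented as a tree in which each variable labels at most one leaf; a minimal evaluator (the tuple representation is my own, and this is not the characterization from the talk):

```python
def eval_rof(node, values):
    """Evaluate an arithmetic read-once formula given as nested tuples.

    Nodes are ("var", name), ("+", left, right), or ("*", left, right);
    read-once means each variable name appears in at most one leaf.
    """
    op = node[0]
    if op == "var":
        return values[node[1]]
    left, right = eval_rof(node[1], values), eval_rof(node[2], values)
    return left + right if op == "+" else left * right

# Example ROF computing (x1 + x2) * x3 -- each variable appears once.
f = ("*", ("+", ("var", "x1"), ("var", "x2")), ("var", "x3"))
```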

# Siegel's theorem, edge coloring, and a holant dichotomy

## Tyson Williams, University of Wisconsin-Madison

What does Siegel's theorem on finiteness of integer solutions have to do with complexity theory? In this talk we discuss a new complexity dichotomy theorem for counting problems. Such a dichotomy is a classification of a class of problems into exactly two kinds: those that are polynomial time computable, and those that are #P-hard, and thus intractable. An example problem in this dichotomy is the problem of counting the number of valid edge colorings of a graph. We show that an effective version of Siegel's theorem and some Galois theory are key ingredients in the proof of this dichotomy.
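On small graphs the edge-coloring count in question can be computed by brute force (the dichotomy places the general problem in #P-hard territory, so no efficient algorithm is expected):

```python
from itertools import product

def count_edge_colorings(edges, q):
    """Count proper edge colorings of a graph with q colors by brute force.

    A coloring assigns one of q colors to each edge; it is proper ("valid")
    when any two edges sharing a vertex receive different colors.
    """
    m = len(edges)
    total = 0
    for coloring in product(range(q), repeat=m):
        ok = True
        for i in range(m):
            for j in range(i + 1, m):
                # Adjacent edges (sharing a vertex) must get distinct colors.
                if coloring[i] == coloring[j] and set(edges[i]) & set(edges[j]):
                    ok = False
                    break
            if not ok:
                break
        if ok:
            total += 1
    return total
```

On a triangle every pair of edges is adjacent, so with 3 colors the count is $3! = 6$.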

This is joint work with Jin-Yi Cai and Heng Guo.