The Dantzig selector recovers sparse vectors in R^m from n<<m noisy measurements

You need to know: Euclidean space ${\mathbb R}^m$ with scalar product $(x,y) = \sum\limits_{i=1}^m x_iy_i$ , norms $||x||_2 = \sqrt{(x,x)}$ , $||x||_1 = \sum\limits_{i=1}^m |x_i|$ , and $||x||_\infty = \max\limits_{1\leq i \leq m}|x_i|$ in ${\mathbb R}^m$ , support of vector $x \in {\mathbb R}^m$ (subset $T\subset \{1,\dots,m\}$ such that $x_i=0$ for all $i \not\in T$ ), $n \times m$ matrix, matrix multiplication, notation $A^*$ for the transpose of matrix $A$ , big O notation, linear programming, basic probability, mean, variance, independent identically distributed (i.i.d.) random variables, normal distribution $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ .

Background: A vector $x_0 \in {\mathbb R}^m$ is called S-sparse if it is supported on $T_0$ with $|T_0|\leq S$ . For $x_0 \in {\mathbb R}^m$ let $y=Ax_0+e$ , where A is $n \times m$ matrix ( $n<m$ ) and $e\in {\mathbb R}^n$ is a random error (noise) whose components $e_i$ are i.i.d. and follow $N(0, \sigma^2)$ . The Dantzig selector $x^*$ is a solution to the problem $\min\limits_{x \in {\mathbb R}^m} ||x||_1$ subject to $||A^*(y-Ax)||_{\infty} \leq \lambda_m \cdot \sigma$ , where $\lambda_m>0$ is a constant.

For $T\subset \{1,\dots,m\}$ , let $A_T$ be $n \times |T|$ submatrix of A formed from the columns of A corresponding to the indices in T. For $S>0$ , let $\delta_S$ be the smallest number such that inequalities $(1-\delta_S)||c||_2^2 \leq ||A_Tc||_2^2\leq (1+\delta_S)||c||_2^2$ hold for all T with $|T|\leq S$ and all $c\in{\mathbb R}^{|T|}$ . Also, for $S,S'>0$ , such that $S+S'\leq m$ , let $\theta_{S,S'}$ be the smallest number such that inequality $|(A_Tc, A_{T'}c')|\leq \theta_{S,S'}||c||_2 ||c'||_2$ holds for all disjoint sets $T,T'\subset \{1,\dots,m\}$ with $|T|\leq S$ and $|T'|\leq S'$ and all $c\in{\mathbb R}^{|T|}$ , $c'\in{\mathbb R}^{|T'|}$ .

The Theorem: On 5th June 2005, Emmanuel Candes and Terence Tao submitted to arxiv a paper in which they proved the following result. Let A be $n \times m$ matrix ( $n<m$ ), S be such that $\delta_{2S}+\theta_{S,2S} < 1$ , $x_0 \in {\mathbb R}^m$ be S-sparse, and $x^*\in {\mathbb R}^m$ be the Dantzig selector with $\lambda_m=\sqrt{2(1+a)\log m}$ for some $a>0$ . Then inequality $||x^*-x_0||_2 \leq C \lambda_m\sqrt{S}\sigma$ , where $C=4/(1-\delta_{2S}-\theta_{S,2S})$ , holds with probability exceeding $1-\left(\sqrt{\pi\log m}\cdot m^a\right)^{-1}$ .

Short context: In the previous paper, Candes, Romberg, and Tao developed an efficient procedure witch uses the results of a low number ( $n=O(S\log m)$ random measurements suffices) of measurements with arbitrary (random or non-random) error $e$ , to recover S-sparse vector $x_0 \in {\mathbb R}^m$ with error $O(||e||_2)$ . With arbitrary $e$ , this result is optimal. The Theorem, however, states that with random measurement error $e$ (which often the case in the applications) we can recover $x_0$ with even higher accuracy $O(\sqrt{\log m}\sigma)$ with high probability using the Dantzig selector $x^*$ , which can be computed using linear programming. The error in the Theorem is optimal up to the constant and $\sqrt{\log m}$ factors.

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

Cayley graphs of symmetric groups can form explicit family of bounded-degree expanders

You need to know: Graph, graphs of bounded degree, group, symmetric group $\text{Sym}(n)$ (group of permutations of n symbols), notation $|A|$ for the number of elements in a finite set $A$ .

Background: $For a graph$ $\Gamma$ $h(\Gamma)=\min\limits_{1\leq |S|\leq |\Gamma|/2} \frac{|\partial S|}{|S|}$ $,$ $\Gamma$ An infinite family of graphs $\Gamma_1, \Gamma_2, \dots, \Gamma_n, \dots$ is a family of $\epsilon$ -expanders if $h(\Gamma_n)\geq \epsilon$ for all n.

A generating set of a group G is a subset $S \subset G$ such that every $g\in G$ can be written as a finite product of elements of S and their inverses. The (undirected) Cayley graph $\Gamma =\Gamma (G,S)$

The Theorem: On 28th May 2005, Martin Kassabov submitted to arxiv a paper in which he proved the existence of universal constants L and $\epsilon>0$ such that the following holds. For every n there exists a generating set $S_n$ of size at most L of the symmetric group $\text{Sym}(n)$ such that the Cayley graphs $\Gamma(\text{Sym}(n), S_n)$ form a family of $\epsilon$ -expanders.

Short context: Constructing explicit families of expanders is important, for example, for algorithmic applications. Although there are some combinatorial constructions, more traditional method is based on Cayley graphs of groups. Given an infinite family of finite groups, the problem is to construct generating sets to make their Cayley graphs expanders. The Theorem solves this problem for symmetric groups.

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

The Erdős-Selfridge conjecture for covering systems is true

You need to know: Notation $n\equiv m (\text{mod\,} s)$ if $n-m$ is a multiple of s.

Background: A finite set S of distinct positive integers $0<s_1<s_2<\dots<s_k$ is called a covering system if there exist integers $m_1, m_2, \dots, m_k$ such that each integer n satisfies at least one of the congruences $n\equiv m_i (\text{mod\,} s_i), \, i=1,2,\dots,k$ .

The Theorem: On 25th May 2005, Michael Filaseta, Kevin Ford, Sergei Konyagin, Carl Pomerance, and Gang Yu submitted to the Journal of the AMS a paper in which they proved that for any number B, there is a number $N_B$ , such that inequality $\sum\limits_{i=1}^k\frac{1}{s_k}>B$ holds for any covering system S with $s_1 > N_B$ .

Short context: A famous problem of Erdős from 1950 asks whether for every N there is a covering system S with all $s_i$ greater than N. In 1973, Erdős and Selfridge conjectured that this is not true if we also require that sum of reciprocals $\sum\limits_{i=1}^k\frac{1}{s_k}$ of elements of S is bounded by some constant B. The Theorem confirms this conjecture. In particular, this implies that for any number $K>1$ and N sufficiently large, depending on K, there is no covering system S with all $s_i$ from the interval $(N,KN]$ . In a later work, Hough proved that in fact there is no covering system with $s_1>10^{16}$ , thus solving the original Erdős problem.

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

Nevanlinna characteristics T(r,f(z + z0)) grows as T(r,f(z)) for finite order meromorphic functions f

You need to know: Complex numbers, notation $|z|$ for the absolute value of a complex number $z$ , function in a complex variable, meromorphic function $f$ , poles of $f$ and their multiplicity, integration, notation $\log^+ x = \max(\log x, 0)$ , $\limsup$ notation, big O notation, notation $f(r) \sim g(r)$ if $\lim\limits_{r \to \infty}\frac{f(r)}{g(r)}=1$ .

Background: For a meromorphic function $f$ and real $r \geq 0$ , let $n(r,f)$ be the number of poles $z_i$ of $f$ , counting multiplicity, such that $|z_i|\leq r$ . The Nevanlinna characteristic $T(r,f)$ of $f$ is $T(r,f)=m(r,f)+N(r,f)$ , where $m(r,f) = \frac{1}{2\pi}\int_0^{2\pi}\log^+|f(re^{i\theta})|d\theta$ and $N(r,f)=\int_0^r(n(t,f)-n(0,f))\frac{dt}{t}+n(0,f)\log r$ . The order of a meromorphic function $f$ is $\sigma(f)=\limsup\limits_{r\to\infty}\frac{\log^+ T(r,f)}{\log r}$ .

The Theorem: On 6th May 2005, Yik-Man Chiang and Shao-Ji Feng submitted to The Ramanujan Journal a paper in which they proved that for every meromorphic function $f$ of order $\sigma(f) < \infty$ , any fixed complex number $\eta\neq 0$ , and any $\epsilon>0$ , we have $T(r,f(z + \eta)) = T(r,f) + O(r^{\sigma(f)-1+\epsilon}) + O(\log r)$ .

Short context: Nevanlinna characteristic $T(r,f)$ measures the rate of growth of a meromorphic function $f$ , and can be used to describe the asymptotic distribution of solutions of the equation $f(z)=a$ as $a$ varies. The Theorem implies that $T(r,f(z + \eta)) \sim T(r,f)$ for finite order meromorphic functions. The authors demonstrate various applications of this result to, for example, difference equations.

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

Dynamical critical site percolation on the triangular lattice has exceptional times

You need to know: Graph, infinite graph, connected subgraph of a graph, basic probability theory, exponential distribution with parameter $\lambda$ .

Background: The triangular grid is the (infinite) graph G whose vertex set is $V \subset {\mathbb R}^2$ given by $V=\{(k+m/2, \sqrt{3}m/2)\,|\, k,m \in {\mathbb Z}\}$ , and two vertices are connected by an edge if and only if the distance between them is $1$ . Assume that each vertex of G, independently of other vertices, (i) is coloured black or white at time $t=0$ , with probability $1/2$ for each colour, (ii) generate a random variable $T$ according to exponential distribution with parameter $\lambda=1/2$ , wait for time $T$ , and switch the colour, and (iii) repeat part (ii) again and again. This is called the dynamical critical site percolation on the triangular grid G. If, at some moment $t=t_0$ , graph G has an infinite connected subgraph consisting on only white vertices, we say that percolation occurs. Let ${\cal T}$ be the set of all times $t\in[0,1]$ at which percolation occurs.

The Theorem: On 29th April 2005, Oded Schramm and Jeffrey Steif submitted to arxiv a paper in which they proved that, with probability $1$ , the set ${\cal T}$ is non-empty.

Short context: Site percolation is an important model in mathematics for studying processes such as diffusion. It is studied for various grids, such as triangular, square, etc. In the triangular grid model described above, it is known that at any fixed $t_0>0$ , the probability that percolation occurs at time $t=t_0$ is $0$ . The Theorem states, however, that, with probability $1$ , there is a non-empty set ${\cal T}$ of exceptional times at which percolation occurs.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 10.3 of this book for an accessible description of the Theorem.

Go to the list of all theorems

Multidimensional Szemeredi’s theorem holds with an explicit bound

You need to know: Set ${\mathbb Z}^r$ of vectors with r integer components, notation $a+dX$ for set $\{y \in {\mathbb Z}^r\, : \, y=a+dx, \, x\in X\}$ , where $a \in {\mathbb Z}^r$ , $X \subset {\mathbb Z}^r$ , and $d>0$ is an integer.

Background: We say that number $N=N(x_1, \dots, x_m)$ is explicitly computable if there is an algorithm which takes $x_1, \dots x_m$ as an input and returns N as an output.

The Theorem: On 25th April 2005, Timothy Gowers submitted to the Annals of Mathematics a paper in which he proved that, for every $\delta > 0$ , every positive integer r, and every finite subset $X \subset {\mathbb Z}^r$ , there is an explicitly computable integer $N=N(\delta, r, X)>0$ , such that every subset $A$ of the grid $\{1, 2, . . . , N\}^r$ of size at least $\delta N^r$ has a subset of the form $a+dX$ for some $a \in {\mathbb Z}^r$ and integer $d>0$ .

Short context: In 1975, Szemerédi proved that for any positive integer k and any $\delta > 0$ , there exists a positive integer $N = N(k, \delta)$ such that every subset of the set $\{1, 2, \dots ,N\}$ of size at least $\delta N$ contains an arithmetic progression of length k. In a paper submitted in 2000, Gowers proved that this statement holds with $N=\exp(\exp(\delta^c))$ , where $c=2^{-2^{k+9}}$ . The statement in the Theorem is a multidimensional version of Szemeredi’s theorem. The existence of $N=N(\delta, r, X)$ required in the Theorem was first proved by Furstenberg and Katznelson in 1978, but without any explicit algorithm how to actually compute this N. The Theorem provides such an algorithm.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 7.10 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The Hasse principle holds for pairs of diagonal cubic forms in s>=13 variables

You need to know: p-adic fields ${\mathbb Q}_p$ .

Background: If a system of equations has a solution with all variables $0$ , we call such solution trivial and all other solutions non-trivial. We say that a collection of systems of equations satisfies the Hasse principle if, whenever one of the systems has a non-trivial solution in real numbers and in all the p-adic fields ${\mathbb Q}_p$ for all primes p, then it has a non-trivial solution in rational numbers.

The Theorem: On 14th April 2005, Jörg Brüdern and Trevor Wooley submitted to the Annals of Mathematics a paper in which they proved that for any integer $s\geq 13$ , and any integer coefficients $a_j, b_j, \, 1\leq j \leq s$ , the system of equations $\sum\limits_{i=1}^sa_ix_i^3 = \sum\limits_{i=1}^sb_ix_i^3 = 0$ has a non-trivial solution in integers if and only if it has a non-trivial solution in the 7-adic field.

Short context: The Hasse principle provides necessary and sufficient conditions for the existence of non-trivial integer solutions in a system of equations. The famous Hasse–Minkowski theorem states that this principle holds for any quadratic equation in s variables in the form $\sum\limits_{i=1}^s \sum\limits_{j=1}^s a_{ij}x_ix_j=0$ with integer coefficients $a_{ij}$ . For cubic equations, the Hasse principle fails in general, but The Theorem implies that it holds for the system $\sum\limits_{i=1}^sa_ix_i^3 = \sum\limits_{i=1}^sb_ix_i^3 = 0$ in $s\geq 13$ variables. This result was earlier known for $s\geq 14$ , and fails for $s\leq 12$ , hence the condition $s\geq 13$ in the Theorem is the best possible.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 7.9 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The sequence of perfect squares is L1-universally bad

You need to know: Probability space X with measure $\mu$ and set of measurable subsets ${\cal X}$ , the notion of “almost all” $x\in X$ , notation $T^n(x)$ for n-fold composiition $T(T(\dots T(x))\dots)$ of map $T: X \to X$ , notation $T^{-1}(A):=\{x \in X\,|\, T(x) \in A\}$ for the preimage of $A \subset X$ , integration on X, notation $L^p(X)$ for set of functions $f:X \to {\mathbb R}$ such that $\int_X|f(x)|^pd\mu < \infty$ , notation $\{n_k\}_{k=1}^\infty$ for sequence $n_1, n_2, \dots, n_k, \dots$ .

Background: An ergodic dynamical system $(X,T)$ is a probability space X, together with map $T: X \to X$ such that (i) $\mu (T^{-1}(A)) = \mu(A)$ for all $A \in {\cal X}$ , and (ii) for any $A \in {\cal X}$ with $T^{-1}(A)=A$ , either $\mu(A)=0$ or $\mu(A)=1$ . A sequence $\{n_k\}_{k=1}^\infty$ is called L1-universally bad if for all ergodic dynamical systems $(X,T)$ there is some $f \in L^1(X)$ and $A \in {\cal X}$ with $\mu(A)>0$ , such that the limit $\lim\limits_{N\to\infty}\frac{1}{N}\sum\limits_{k=1}^N f(T^{n_k}(x))$ fails to exist for all $x\in A$ .

The Theorem: On 5th April 2005, Zoltán Buczolich and Daniel Mauldin submitted to arxiv and The Annals of Mathematics a paper in which they proved that the sequence $\{k^2\}_{k=1}^\infty$ is L1-universally bad.

Short context: Famous Birkhoff’s Ergodic Theorem states that, for any ergodic dynamical system $(X,T)$ and any $f \in L^1(X)$ the limit $\lim\limits_{N\to\infty}\frac{1}{N}\sum\limits_{k=1}^N f(T^k(x))$ exists for almost all $x\in X$ (in fact, this limit is equal to $\int_X f(x) d\mu$ ). An important research direction is to understand for which sequences $\{n_k\}_{k=1}^\infty$ the corresponding limit $\lim\limits_{N\to\infty}\frac{1}{N}\sum\limits_{k=1}^N f(T^{n_k}(x))$ exists for almost all $x\in X$ . In 1988, Bourgain proved this for sequence $\{k^2\}_{k=1}^\infty$ , provided that $f \in L^p(X)$ for some $p>1$ , and asked if the same is true for all $f \in L^1(X)$ . The Theorem answers this question negatively.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 10.6 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The “Majority Is Stablest” conjecture is true

You need to know: Set $\{-1,1\}^n$ of sequences $x_1, \dots, x_n$ such that each $x_i$ is either $-1$ or $1$ . Notation $[n]$ for set $\{1,2,\dots,n\}$ , notation $\text{sign}(x)$ for the sign of real number x.

Background: Boolean function is a function $f:\{-1,1\}^n \to \{-1,1\}$ . Denote $E[f] = \frac{1}{2^n}\sum\limits_{x\in \{-1,1\}^n}f(x)$ the average value of f. For $i\in [n]$ , let $N_i(f)$ be the number of $x=(x_1, x_2, \dots, x_n)\in \{-1,1\}^n$ , sich that $f(x)\neq f(x_1,\dots,x_{i-1},-x_i,x_{i+1},\dots,x_n)$ . Ratio $\text{Inf}_i(f)=N_i(f)/2^n$ is called the influence of i. Equivalently, $\text{Inf}_i(f) = \sum\limits_{S\subset[n], i\in S}\hat{f}(S)^2$ , where $\hat{f}(S)=\frac{1}{2^n}\sum\limits_{x\in \{-1,1\}^n}\left[f(x)\prod\limits_{i\in S}x_i\right]$ . For $0\leq \rho \leq 1$ , the noise stability of f is $S_\rho(f)=\sum\limits_{S\subset[n]}\rho^{|S|}\hat{f}(S)^2$ .

The Theorem: On 23rd March 2005, Elchanan Mossel, Ryan O’Donnell, and Krzysztof Oleszkiewicz submitted to arxiv a paper in which they proved that for any $0 \leq \rho \leq 1$ and $\epsilon>0$ there exists a $\delta>0$ such that if $f:\{-1,1\}^n \to \{-1,1\}$ satisfies $E[f]=0$ and $\text{Inf}_i(f) \leq \delta$ for all i, then $S_\rho(f) \leq \frac{2}{\pi}\arcsin(\rho)+\epsilon$ .

Short context: For the majority function $\text{Maj}_n(x)=\text{sign}\left(\sum\limits_{i=1}^n x_i\right)$ (we may assume that n is odd to guarantee that $\sum\limits_{i=1}^n x_i \neq 0$ ), we have $\lim\limits_{n \to \infty}S_\rho(\text{Maj}_n) = \frac{2}{\pi}\arcsin(\rho)$ . Thus, the Theorem states that the majority function has (in the limit $n\to\infty$ and up to $\epsilon$ ) the highest noise stability among all functions with $E[f]=0$ and low influence. This statement has been conjectured by Khot, Kindler, Mossel, and O’Donnell, and was known as the “Majority Is Stablest” conjecture. In has applications to the analysis of voting systems and to algorithmic complexity for some combinatorial problems.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 10.1 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The spectrum of the almost Mathieu operator is a Cantor set for all irrational frequencies

You need to know: Set ${\mathbb Z}$ of integers, infinite sum $\sum\limits_{n \in {\mathbb Z}} x_n$ , Cantor set.

Background: Let $l^2(\mathbb Z)$ be the set of all infinite sequences $x=(\dots, x_{-1}, x_0, x_1, \dots)$ such that $\sum\limits_{n \in {\mathbb Z}} x_n^2 < \infty$ . A map $T:l^2(\mathbb Z) \to l^2(\mathbb Z)$ is called invertible if for every $y \in l^2(\mathbb Z)$ there exists a unique $x \in l^2(\mathbb Z)$ such that $T(x)=y$ . The almost Mathieu operator is the map $H:l^2(\mathbb Z) \to l^2(\mathbb Z)$ mapping each $x \in l^2(\mathbb Z)$ into $(Hx)_n = x_{n+1} + x_{n-1} + 2 \lambda \cos 2\pi (\theta + n \alpha) x_n, \, n\in{\mathbb Z}$ , where $\lambda\neq 0$ , $\alpha$ , and $\theta$ are real parameters, called coupling, frequency, and phase, respectively. The set of all $t\in{\mathbb R}$ for which map $H_t:l^2(\mathbb Z) \to l^2(\mathbb Z)$ given by $(H_tx)_n = t x_n - (Hx)_n$ , $n\in{\mathbb Z}$ , is not invertible is called the spectrum of H.

The Theorem: On 17th March 2005, Artur Avila and Svetlana Jitomirskaya submitted to arxiv a paper in which they proved that the spectrum of the almost Mathieu operator is a Cantor set for all irrational $\alpha$ and for all $\theta$ and all $\lambda \neq 0$ .

Short context: The almost Mathieu operator and its spectrum arise from applications in physics. The Theorem confirms the conjecture proposed by Azbel in 1964. In 1981, Mark Kac offered ten martinis for anyone who could prove or disprove it, and since then the problem has been known as “The Ten Martini Problem”.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 9.6 of this book for an accessible description of the Theorem.

Go to the list of all theorems