Fast recovery of a size-n rank-r random matrix is possible from m>Crn^(5/4)log n random entries

You need to know: Euclidean space {\mathbb R}^n, orthonormal vectors in {\mathbb R}^n, matrix, transpose A^T of matrix A, matrix multiplication, eigenvalue of a square matrix, rank of a matrix, basic probability theory, selecting uniformly at random.

Background: A singular value of a matrix X is the square root of an eigenvalue of XX^T, and we denote by ||X||_* the sum of all singular values of X. For positive integers n_1,n_2, and 0<r \leq \min\{n_1,n_2\}, let u_1,\dots u_r be a family of orthonormal vectors in {\mathbb R}^{n_1} sampled uniformly at random among all such families. Let v_1,\dots v_r be a similar random family of orthonormal vectors in {\mathbb R}^{n_2}. Then M=\sum\limits_{i=1}^r \sigma_i u_i v_i^T, where \sigma_i are some (non-random) positive numbers, is an n_1\times n_2 matrix of rank r, which we call a random matrix from the random orthogonal model.

The Theorem: On 29th May 2008, Emmanuel Candès and Benjamin Recht submitted to arxiv a paper in which they proved the following result. Let M be an n_1\times n_2 matrix of rank r sampled from the random orthogonal model, and put n=\max(n_1,n_2). Suppose we observe m entries of M, with the set of locations \Omega sampled uniformly at random. Then there are constants C and c such that if m\geq Crn^{5/4}\log n, the minimizer of the problem \min ||X||_* \,\,\, \text{s.t.} \,\, X_{ij}=M_{ij}, \, (i,j)\in \Omega is unique and equal to M with probability at least 1-cn^{-3}\log n.

Short context: Recovery of missing entries of low-rank matrices is central in many applications. The Theorem states that when we have only a small portion of random entries of a random matrix of rank r, we can recover all the other entries with high probability, by solving a simple convex optimisation problem, which can be done fast using existing software.
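
For illustration, here is a minimal numerical sketch of this recovery problem, assuming numpy and cvxpy are available; the sizes n_1=n_2=30, r=2, m=600 and the generic solver are arbitrary illustrative choices, not the setup of the paper.

```python
# A sketch only: nuclear-norm minimisation with observed-entry constraints.
# Requires numpy and cvxpy (both assumed to be installed).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n1, n2, r = 30, 30, 2

# a rank-r matrix built from random orthonormal factors (an analogue of the
# random orthogonal model; the sigma_i are fixed positive numbers)
U = np.linalg.qr(rng.standard_normal((n1, r)))[0]
V = np.linalg.qr(rng.standard_normal((n2, r)))[0]
M = U @ np.diag([3.0, 1.0]) @ V.T

# observe m entries at uniformly random locations
m = 600
mask = np.zeros(n1 * n2)
mask[rng.choice(n1 * n2, size=m, replace=False)] = 1.0
mask = mask.reshape(n1, n2)

# min ||X||_*  s.t.  X_ij = M_ij on the observed locations
X = cp.Variable((n1, n2))
problem = cp.Problem(cp.Minimize(cp.norm(X, "nuc")),
                     [cp.multiply(mask, X) == mask * M])
problem.solve()
print("recovery error:", np.linalg.norm(X.value - M, "fro"))
```

With enough observed entries the printed error should be close to zero, in line with the Theorem.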

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

The problem of computing a two-player Nash equilibrium is PPAD-complete

You need to know: Vectors, matrices, transpose x^T of vector x, matrix multiplication, algorithms and their running times, polynomial algorithm, polynomial-time reduction, complexity classes P and PPAD (Polynomial Parity Arguments on Directed graphs), PPAD-complete problem.

Background: Let {\mathbb P}^n be the set of vectors x=(x_1,\dots,x_n)\in{\mathbb R}^n such that all x_i \geq 0 and \sum\limits_{i=1}^n x_i = 1. The problem of computing a two-player Nash equilibrium is: given two m\times n matrices A and B with rational entries, compute vectors x^* \in {\mathbb P}^m and y^* \in {\mathbb P}^n such that (x^*)^TAy^* \geq x^TAy^* and (x^*)^TBy^* \geq (x^*)^TBy for all x\in {\mathbb P}^m and y\in {\mathbb P}^n.
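
The defining inequalities are easy to check numerically for a candidate pair of strategies. A minimal sketch, assuming numpy; the game and the candidate equilibrium below are the classical matching-pennies example, chosen only for illustration.

```python
# A sketch only: verifying the equilibrium inequalities for a candidate pair.
import numpy as np

A = np.array([[1., -1.], [-1., 1.]])    # payoff matrix of player 1
B = -A                                  # payoff matrix of player 2
x_star = np.array([0.5, 0.5])           # candidate strategy of player 1
y_star = np.array([0.5, 0.5])           # candidate strategy of player 2

# by linearity it suffices to compare against pure strategies,
# i.e. the entries of A @ y_star and of x_star @ B
player1_ok = np.all(x_star @ A @ y_star >= A @ y_star - 1e-12)
player2_ok = np.all(x_star @ B @ y_star >= x_star @ B - 1e-12)
print("is a Nash equilibrium:", bool(player1_ok and player2_ok))
```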

The Theorem: On 12th April 2007, Xi Chen, Xiaotie Deng, and Shang-Hua Teng submitted to arxiv a paper in which they proved that the problem of computing a two-player Nash equilibrium is PPAD-complete.

Short context: The problem of computing a two-player Nash equilibrium is a fundamental problem in game theory with applications to, for example, economics. A famous theorem of Nash from 1950 implies that a Nash equilibrium exists for any matrices A and B. However, it was an open question whether there is a polynomial-time algorithm for computing it. The Theorem implies that such an algorithm does not exist, unless the widely believed conjecture P\neqPPAD fails.

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

Systems of polynomial equations can be solved in expected polynomial time

You need to know: Complex numbers, polynomials in complex variable, notation {\mathbb C}^n for set of vectors z=(z_1, \dots, z_n) with complex z_i, algorithm, probabilistic algorithm, average running time of an algorithm.

Background: Let n be a positive integer. Let H be the space of all systems f = [f_1,\dots ,f_n]:{\mathbb C}^n \to {\mathbb C}^n of n polynomial equations in n unknowns. For f \in H, denote by V_{\mathbb C}(f) = \{x \in {\mathbb C}^n : f_i(x) = 0, \, 1 \leq i \leq n\} \subset {\mathbb C}^n the set of solutions of f.

The Theorem: On 2nd November 2006, Carlos Beltrán and Luis Pardo submitted to the Journal of the AMS a paper in which they described an explicit probabilistic algorithm that computes an approximate solution of any given f \in H with any given accuracy (with probability 1), and proved that the average number of arithmetic operations of this algorithm is polynomial in the size of the input.

Short context: Solving polynomial equations and systems of such equations is one of the fundamental problems in the whole of mathematics. In 1998, Smale presented a list of problems he considered the most important ones for the 21st century. Smale's 17th problem asks for an algorithm which could solve systems of polynomial equations with polynomial average running time. The Theorem provides a probabilistic algorithm which achieves this. See here and here for further progress in later works.
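
To give a flavour of what an "approximate solution" means, here is a hedged sketch of Newton's iteration refining a rough guess for a small polynomial system, assuming numpy; this is only a local correction step on an arbitrary example system, not the Beltrán–Pardo homotopy algorithm itself.

```python
# A sketch only: Newton's iteration refining an approximate zero of a small
# polynomial system in C^2.
import numpy as np

def f(z):   # example system: z1^2 + z2^2 - 1 = 0,  z1*z2 - 1/4 = 0
    return np.array([z[0]**2 + z[1]**2 - 1, z[0] * z[1] - 0.25])

def Df(z):  # its Jacobian matrix
    return np.array([[2 * z[0], 2 * z[1]], [z[1], z[0]]])

z = np.array([1.0 + 0j, 0.3 + 0j])        # rough initial guess
for _ in range(10):
    z = z - np.linalg.solve(Df(z), f(z))  # one Newton step
print("approximate solution:", z, "residual:", f(z))
```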

Links: The original paper is available here.

Go to the list of all theorems

The Dantzig selector recovers sparse vectors in R^m from n<<m noisy measurements

You need to know: Euclidean space {\mathbb R}^m with scalar product (x,y) = \sum\limits_{i=1}^m x_iy_i, norms ||x||_2 = \sqrt{(x,x)}, ||x||_1 = \sum\limits_{i=1}^m |x_i|, and ||x||_\infty = \max\limits_{1\leq i \leq m}|x_i| in {\mathbb R}^m, support of vector x \in {\mathbb R}^m (subset T\subset \{1,\dots,m\} such that x_i=0 for all i \not\in T), n \times m matrix, matrix multiplication, notation A^* for the transpose of matrix A, big O notation, linear programming, basic probability, mean, variance, independent identically distributed (i.i.d.) random variables, normal distribution N(\mu, \sigma^2) with mean \mu and variance \sigma^2.

Background: A vector x_0 \in {\mathbb R}^m is called S-sparse if it is supported on T_0 with |T_0|\leq S. For x_0 \in {\mathbb R}^m, let y=Ax_0+e, where A is an n \times m matrix (n<m) and e\in {\mathbb R}^n is a random error (noise) whose components e_i are i.i.d. and follow N(0, \sigma^2). The Dantzig selector x^* is a solution to the problem \min\limits_{x \in {\mathbb R}^m} ||x||_1 subject to ||A^*(y-Ax)||_{\infty} \leq \lambda_m \cdot \sigma, where \lambda_m>0 is a constant.

For T\subset \{1,\dots,m\}, let A_T be n \times |T| submatrix of A formed from the columns of A corresponding to the indices in T. For S>0, let \delta_S be the smallest number such that inequalities (1-\delta_S)||c||_2^2 \leq ||A_Tc||_2^2\leq (1+\delta_S)||c||_2^2 hold for all T with |T|\leq S and all c\in{\mathbb R}^{|T|}. Also, for S,S'>0, such that S+S'\leq m, let \theta_{S,S'} be the smallest number such that inequality |(A_Tc, A_{T'}c')|\leq \theta_{S,S'}||c||_2 ||c'||_2 holds for all disjoint sets T,T'\subset \{1,\dots,m\} with |T|\leq S and |T'|\leq S' and all c\in{\mathbb R}^{|T|}, c'\in{\mathbb R}^{|T'|}.

The Theorem: On 5th June 2005, Emmanuel Candes and Terence Tao submitted to arxiv a paper in which they proved the following result. Let A be an n \times m matrix (n<m), S be such that \delta_{2S}+\theta_{S,2S} < 1, x_0 \in {\mathbb R}^m be S-sparse, and x^*\in {\mathbb R}^m be the Dantzig selector with \lambda_m=\sqrt{2(1+a)\log m} for some a>0. Then the inequality ||x^*-x_0||_2 \leq C \lambda_m\sqrt{S}\sigma, where C=4/(1-\delta_{2S}-\theta_{S,2S}), holds with probability exceeding 1-\left(\sqrt{\pi\log m}\cdot m^a\right)^{-1}.

Short context: In an earlier paper, Candes, Romberg, and Tao developed an efficient procedure which uses the results of a small number of measurements (n=O(S\log m) random measurements suffice) with arbitrary (random or non-random) error e to recover an S-sparse vector x_0 \in {\mathbb R}^m with error O(||e||_2). For arbitrary e, this result is optimal. The Theorem, however, states that with random measurement error e (which is often the case in applications) we can recover x_0 with even better accuracy O(\sqrt{\log m}\sigma) with high probability using the Dantzig selector x^*, which can be computed using linear programming. The error in the Theorem is optimal up to the constant and \sqrt{\log m} factors.
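
A minimal sketch of computing the Dantzig selector with a generic convex solver, assuming numpy and cvxpy; the dimensions, sparsity, noise level, and Gaussian measurement matrix below are arbitrary illustrative choices, not those of the paper.

```python
# A sketch only: the Dantzig selector as a convex programme.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, S, sigma, a = 100, 300, 5, 0.05, 0.5

A = rng.standard_normal((n, m)) / np.sqrt(n)    # columns roughly unit-normed
x0 = np.zeros(m)
x0[rng.choice(m, S, replace=False)] = rng.standard_normal(S)   # S-sparse x0
y = A @ x0 + sigma * rng.standard_normal(n)     # noisy measurements

lam = np.sqrt(2 * (1 + a) * np.log(m))
x = cp.Variable(m)
problem = cp.Problem(cp.Minimize(cp.norm(x, 1)),
                     [cp.norm(A.T @ (y - A @ x), "inf") <= lam * sigma])
problem.solve()
print("||x* - x0||_2 =", np.linalg.norm(x.value - x0))
```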

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

L1-regularisation recovers sparse vectors in R^m from n<<m inaccurate measurements

You need to know: Euclidean space {\mathbb R}^m, norms ||x||_2 = \sqrt{\sum\limits_{i=1}^m x_i^2} and ||x||_1 = \sum\limits_{i=1}^m |x_i| in {\mathbb R}^m, support of vector x \in {\mathbb R}^m (subset T\subset \{1,\dots,m\} such that x_i=0 for all i \not\in T), n \times m matrix, matrix multiplication, big O notation.

Background: For x_0 \in {\mathbb R}^m, let y=Ax_0+e, where A is an n \times m matrix (n<m) and e\in {\mathbb R}^n. Let x^* be a solution to the problem \min\limits_{x \in {\mathbb R}^m} ||x||_1 subject to ||Ax-y||_2 \leq \epsilon for some \epsilon \geq 0 (this is called the l1-regularisation problem).

For T\subset \{1,\dots,m\}, let A_T be n \times |T| submatrix of A formed from the columns of A corresponding to the indices in T. For S>0, let \delta_S be the smallest number such that inequalities (1-\delta_S)||c||_2^2 \leq ||A_Tc||_2^2\leq (1+\delta_S)||c||_2^2 hold for all T with |T|\leq S and all c\in{\mathbb R}^{|T|}.

The Theorem: On 3rd March 2005, Emmanuel Candes, Justin Romberg, and Terence Tao submitted to arxiv and Communications on Pure and Applied Mathematics a paper in which they proved the following result. Let A be an n \times m matrix (n<m), and let S be such that \delta_{3S}+3\delta_{4S} < 2. Then for any x_0 \in {\mathbb R}^m supported on T_0 with |T_0|\leq S and any e\in {\mathbb R}^n such that ||e||_2 \leq \epsilon, we have ||x^*-x_0||_2 \leq C_S \cdot \epsilon, where the constant C_S depends only on \delta_{4S}. For example, C_S \approx 8.82 for \delta_{4S}=\frac{1}{5} and C_S \approx 10.47 for \delta_{4S}=\frac{1}{4}.

Short context: Vectors x_0 \in {\mathbb R}^m with few non-zero entries are called sparse and may represent, for example, an image or a signal in various applications. The vector y=Ax_0+e is the result of n measurements (given by the rows of A) with error e. Given y, we then try to recover x_0 by solving the l1-regularisation problem. The Theorem states that its solution x^* is close to x_0. In particular, if the measurements are exact (\epsilon=0), then x^* = x_0. The authors show that the condition \delta_{3S}+3\delta_{4S} < 2 in the Theorem holds for almost all matrices A with unit-normed columns if n=O(S\log m); hence, with high probability, we can recover x_0 after just O(|T_0|\log m) random measurements. In an earlier work, Donoho proved the existence of a recovery procedure with similar properties, while the Theorem states that a concrete efficient algorithm (l1-regularisation) does the job, even for inaccurate measurements. The error C_S \cdot \epsilon in the Theorem is essentially optimal.
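
A minimal sketch of l1-regularisation recovery, assuming numpy and cvxpy; the dimensions, sparsity, and Gaussian measurement matrix are arbitrary illustrative choices, not those of the paper.

```python
# A sketch only: l1-regularisation recovery from inaccurate measurements.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, m, S, eps = 80, 256, 4, 0.01

A = rng.standard_normal((n, m)) / np.sqrt(n)
x0 = np.zeros(m)
x0[rng.choice(m, S, replace=False)] = 1.0       # S-sparse vector
e = rng.standard_normal(n)
e = eps * e / np.linalg.norm(e)                 # measurement error with ||e||_2 <= eps
y = A @ x0 + e

x = cp.Variable(m)
problem = cp.Problem(cp.Minimize(cp.norm(x, 1)), [cp.norm(A @ x - y, 2) <= eps])
problem.solve()
print("||x* - x0||_2 =", np.linalg.norm(x.value - x0))
```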

Links: Free arxiv version of the original paper is here, journal version is here.

Go to the list of all theorems

Sparse vectors in R^m can be approximated using O(N log m) non-adaptive measurements

You need to know: Euclidean space {\mathbb R}^m, orthonormal basis in {\mathbb R}^m, inner product (x,y)=\sum\limits_{i=1}^m x_i y_i in {\mathbb R}^m, norm ||x||_2 = \sqrt{(x,x)}, infimum, supremum, big O notation.

Background: Let \psi_i, \, i=1,2,\dots,m be an orthonormal basis in {\mathbb R}^m, and let X=\{x \in {\mathbb R}^m\,|\,\left(\sum_i|(x,\psi_i)|^p\right)^{1/p}\leq R\} (where 0<p \leq 1 and R>0 are constants) be the set of vectors whose coefficients (x,\psi_i) in this basis are not too large. Let I_n:X \to {\mathbb R}^n have the form I_n(x)=((x,\xi_1), \dots, (x,\xi_n)) for some vectors \xi_i \in {\mathbb R}^m, let A_n: {\mathbb R}^n \to {\mathbb R}^m be an arbitrary operator, and let E(n,m,p,R) = \inf\limits_{A_n,I_n}\sup\limits_{x\in X}||x-A_n(I_n(x))||_2.

The Theorem: On 18th September 2004, David Donoho submitted to IEEE Transactions on Information Theory a paper in which he proved that for any 0<p \leq 1, A>0 and \gamma>1 there is a constant C_p=C_p(A,\gamma), such that E(n,m_n,p,R) \leq C_p R \left(\frac{n}{\log(m_n)}\right)^{\frac{1}{2}-\frac{1}{p}}, whenever m_n \sim A n^\gamma, \, n\to\infty.

Short context: Let us call vectors x\in X l_p-sparse. Such vectors arise in numerous applications; for example, x may represent an image (e.g. a medical image) with m pixels. The vectors \xi_i \in {\mathbb R}^m represent measurements, I_n(x) is the result of n measurements, A_n: {\mathbb R}^n \to {\mathbb R}^m is the procedure for reconstructing x from the measurements, \sup\limits_{x\in X}||x-A_n(I_n(x))||_2 is the worst-case error of this procedure, and E(n,m,p,R) is the minimal worst-case error we can achieve using a non-adaptive procedure, that is, one which uses the same measurements for all x. The Theorem states that we can achieve O(N^{\frac{1}{2}-\frac{1}{p}}) error, where N=\frac{n}{\log(m_n)}, using just n=N\log m_n non-adaptive measurements. This methodology is called compressed sensing and has dramatic implications in, for example, fast reconstruction of medical images. The paper is one of the most cited (if not the most cited) mathematical papers of the 21st century.

Links: The original paper is available here.

Go to the list of all theorems

The number of relative equilibria of the Newtonian 4-body problem is finite

You need to know: Euclidean plane {\mathbb R}^2, rotations, translations, and dilations in the plane, (second) derivative of a function x:{\mathbb R}\to {\mathbb R}^2, uniform rotation, angular velocity.

Background: Let n point particles with masses m_i > 0 and positions x_i \in {\mathbb R}^2 move according to Newton’s laws of motion: m_j \frac{d^2 x_j}{dt^2} = \sum\limits_{i \neq j} \frac{m_i m_j(x_i - x_j)}{r_{ij}^3}, \, 1 \leq j \leq n, where r_{ij} is the distance between x_i and x_j. A relative equilibrium motion is a solution of this system of the form x_i(t) = R(t)x_i(0), where R(t) is a uniform rotation with constant angular velocity v\neq 0 around some point c \in {\mathbb R}^2. Two relative equilibria are equivalent if they are related by rotations, translations, and dilations in the plane.
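
For a concrete example of the definition, the square configuration of four equal masses is a well-known relative equilibrium. Substituting x_j(t)=R(t)x_j(0) into the equations of motion reduces the condition to \sum_{i\neq j} m_i(x_i-x_j)/r_{ij}^3 = -v^2(x_j-c) for every j, with a common value of v^2. A hedged numerical check of this for the square, assuming numpy:

```python
# A sketch only: checking the relative-equilibrium condition for the square
# of four equal masses.
import numpy as np

masses = np.ones(4)
x = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])   # vertices of a square
c = (masses[:, None] * x).sum(axis=0) / masses.sum()       # centre of mass

for j in range(4):
    acc = sum(masses[i] * (x[i] - x[j]) / np.linalg.norm(x[i] - x[j]) ** 3
              for i in range(4) if i != j)
    d = x[j] - c
    v2 = -np.dot(acc, d) / np.dot(d, d)          # candidate value of v^2
    print(j, v2, np.allclose(acc, -v2 * d))      # same v^2 and exact proportionality
```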

The Theorem: On 12th July 2004, Marshall Hampton and Richard Moeckel submitted to Inventiones mathematicae a paper in which they proved that, for n=4, there is only a finite number of equivalence classes of relative equilibria, for any positive masses m_1, m_2, m_3, m_4.

Short context: The problem of describing the motion of n bodies under gravitation (the n-body problem) in space or in the plane is a fundamental problem in physics and mathematics. In general, the motion can be very complicated even for n=3, but can we at least classify “nice” relative equilibrium motions in the plane? This problem is solved for n=3: in this case, there are always exactly five relative equilibria, up to equivalence. However, for n\geq 4, the question whether the number of relative equilibria is finite is a major open problem, which was included as Problem 6 in Smale’s list of problems for the 21st century. The Theorem solves this problem for n=4.

Links: The original paper is available here.

Go to the list of all theorems

There is a convergent algorithm for minimising the total variation

You need to know: Matrix, norm ||u|| of a matrix u, convergence of sequence of matrices to a limit.

Background: For an N \times N matrix u with entries u_{i,j}, define its total variation as J(u) = \sum\limits_{i=1}^N \sum\limits_{j=1}^N \sqrt{(u_{i+1,j}-u_{i,j})^2 + (u_{i,j+1}-u_{i,j})^2}, where u_{N+1,j}=u_{N,j} and u_{i,N+1}=u_{i,N} by convention. For a given N \times N matrix g and \lambda>0, consider the optimization problem \min\limits_u \left(\frac{||u-g||^2}{2\lambda}+J(u)\right), looking for a matrix u close to g but with small total variation J(u).

The Theorem: In January 2004, Antonin Chambolle published in the Journal of Mathematical Imaging and Vision a paper in which he presented an algorithm for constructing a sequence u_n of matrices which is guaranteed to converge to the optimal solution u^* of this problem.

Short context: The optimisation problem described above arises in various applications including image denoising, zooming, and the computation of the mean curvature motion of interfaces. The algorithm presented in the paper is guaranteed to converge and works quite fast in practical applications.
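
A minimal sketch of the projection iteration from the paper, assuming numpy; the toy image and the parameters \lambda and \tau below are illustrative choices, and this simplified implementation is not guaranteed to match the paper's algorithm in every detail.

```python
# A sketch only: a fixed-point iteration on a dual variable p = (px, py),
# returning u = g - lambda * div(p).
import numpy as np

def grad(u):
    # discrete gradient with the convention u_{N+1,j} = u_{N,j}, u_{i,N+1} = u_{i,N}
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # discrete divergence, the negative adjoint of grad
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tv_denoise(g, lam, tau=0.125, n_iter=200):
    px, py = np.zeros_like(g), np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - g / lam)
        norm = np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    return g - lam * div(px, py)

# toy usage: smooth a noisy piecewise-constant image
g = np.zeros((64, 64)); g[16:48, 16:48] = 1.0
g = g + 0.1 * np.random.default_rng(0).standard_normal((64, 64))
u = tv_denoise(g, lam=0.2)
```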

Links: The original paper is available here.

Go to the list of all theorems

The Lieb conjecture on the monotonicity of Shannon’s entropy is true

You need to know: Basic probability theory: random variable, its support, probability density function, expectation (denoted E[.]), independent random variables, identically distributed random variables.

Background: For a random variable X with probability density function f and support S\subset{\mathbb R}, its Shannon entropy is \text{Ent}(X) = -\int_S f(x) \log f(x) dx. A random variable X is square-integrable if E[X^2]<\infty.

The Theorem: On 4th September 2003, Shiri Artstein, Keith Ball, Franck Barthe, and Assaf Naor submitted to the Journal of the AMS a paper in which they proved that, for any sequence X_1, X_2, \dots of independent and identically distributed square-integrable random variables, the entropy of the normalised sum \text{Ent} \left(\frac{X_1 + \dots + X_n}{\sqrt{n}}\right) is a non-decreasing function of n.

Short context: For a sequence X_1, X_2, \dots as above, it is known that the normalised sums Y_n = \frac{1}{\sqrt{n}}\sum\limits_{i=1}^n X_i converge to the normal distribution. In 1949, Shannon proved that \text{Ent}(Y_2) \geq \text{Ent}(Y_1). In 1978, Lieb conjectured that in fact \text{Ent}(Y_{n+1}) \geq \text{Ent}(Y_n) for all n, which can be interpreted as saying that Y_n becomes closer and closer to the normal distribution (the one with maximal entropy) at every step. This conjecture was open even for n=2. The Theorem proves it in full.
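
A minimal numerical illustration, assuming numpy; Uniform(0,1) summands and a crude grid-based entropy computation are chosen only for illustration. The printed values of Ent(Y_n) should be non-decreasing in n, approaching the entropy of the limiting normal distribution.

```python
# A sketch only: Ent(Y_n) for Uniform(0,1) summands, via repeated convolution
# of densities on a grid.
import numpy as np

dx = 1e-3
f1 = np.ones(1000)            # density of Uniform(0,1) on a grid with step dx

def entropy(f):
    mask = f > 0
    return -np.sum(f[mask] * np.log(f[mask])) * dx   # approximates -int f log f

f = f1.copy()                 # density of S_n = X_1 + ... + X_n, starting with n = 1
for n in range(1, 6):
    # Ent(Y_n) = Ent(S_n / sqrt(n)) = Ent(S_n) - (1/2) log n
    print(n, entropy(f) - 0.5 * np.log(n))
    f = np.convolve(f, f1) * dx
```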

Links: The original paper is available here.

Go to the list of all theorems

In the Vicsek model, all agents will eventually move in the same direction

You need to know: Basic geometry (distance in the plane, vector addition), limits, connected graph.

Background: Consider n autonomous agents (points) which are moving in the plane with the same speed but with different directions. Fix a constant r>0. Two agents are called neighbours at time t if the distance between them is at most r. At discrete time moments t=0,1,2,\dots, each agent’s direction is updated and becomes the average of its own direction and the directions of its neighbours. At any time moment t, let G(t) be the graph whose vertices are the agents, in which neighbours are connected by edges.

The Theorem: On 4th March 2002, Ali Jadbabaie, Jie Lin, and A. Stephen Morse submitted to IEEE Transactions on Automatic Control a paper in which they proved the following result. Assume that, in the model described above, there exists an infinite sequence of contiguous, non-empty, bounded, non-intersecting time-intervals [t_i, t'_i), i=0,1,2,\dots, starting at t_0 = 0, in which the graph G(t) is connected. Then all agents will eventually move in the same direction.

Short context: Coordination of groups of mobile autonomous agents is an important topic which has attracted a lot of attention in the literature. It has applications in physics (e.g. the study of active matter), machine learning, etc. In 1995, Vicsek, Czirok, Ben-Jacob, Cohen, and Shochet suggested the model described above and performed a number of numerical simulations demonstrating that, after some time, the agents start moving in the same direction, despite the absence of central coordination. The Theorem provides a theoretical explanation for this observation.
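
A minimal simulation sketch, assuming numpy; the number of agents, the radius r, the speed, and the initial square are arbitrary, and headings are averaged via unit direction vectors rather than raw angles to avoid wrap-around, which is a simplification of the model above. If the neighbour graphs stay connected often enough, the final alignment measure should be close to 1.

```python
# A sketch only: a Vicsek-style simulation in which each agent averages its
# heading with those of its neighbours.
import numpy as np

rng = np.random.default_rng(0)
n, r, speed, steps = 30, 1.0, 0.05, 500

pos = rng.uniform(0, 5, size=(n, 2))           # initial positions in a 5x5 square
theta = rng.uniform(0, 2 * np.pi, size=n)      # initial headings

for _ in range(steps):
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    nbr = dist <= r                            # each agent is its own neighbour
    # average unit direction vectors over neighbours
    vx = (nbr * np.cos(theta)).sum(axis=1) / nbr.sum(axis=1)
    vy = (nbr * np.sin(theta)).sum(axis=1) / nbr.sum(axis=1)
    theta = np.arctan2(vy, vx)
    pos = pos + speed * np.column_stack((np.cos(theta), np.sin(theta)))

print("alignment (1 means all agents share a direction):",
      np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))
```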

Links: The original paper is available here.

Go to the list of all theorems