The circular law conjecture is true

You need to know: Notation |S| for the number of elements in finite set S, set {\mathbb C} of complex numbers, real part Re(z) and imaginary part Im(z) of a complex number z, compact subsets of {\mathbb C}, compactly supported function f:{\mathbb C}\to{\mathbb R}, matrix with complex entries, eigenvalue of a square matrix, basic probability theory, independent identically distributed (i.i.d.) complex random variables, mean, variance, probability measure \mu on {\mathbb C}, integration \int_{\mathbb C}f(z) d\mu, uniform distribution on the unit disk in {\mathbb C}.

Background: Given an n \times n matrix A, let \mu_A(x,y) := \frac{1}{n}|\{1 \leq i \leq n, \, Re(\lambda_i) \leq x, \, Im(\lambda_i) \leq y\}| be the empirical spectral distribution (ESD) of its eigenvalues \lambda_i \in {\mathbb C}, i = 1,\dots,n. We interpret \mu_A as a discrete probability measure on {\mathbb C}. Let A_n be a sequence of random matrices. For a probability measure \mu_\infty on {\mathbb C} and a continuous compactly supported function f:{\mathbb C}\to{\mathbb R}, let \Delta(\mu_\infty, f, n)=\int_{\mathbb C}f(z) d\mu_{\frac{1}{\sqrt{n}}A_n}(z) - \int_{\mathbb C}f(z) d\mu_\infty. We say that the ESD of \frac{1}{\sqrt{n}}A_n converges in probability to \mu_\infty if \lim\limits_{n\to\infty}P\left(|\Delta(\mu_\infty, f, n)|\geq\epsilon\right)=0 for every f and every \epsilon>0. Also, we say that the ESD of \frac{1}{\sqrt{n}}A_n converges to \mu_\infty almost surely if, with probability 1, \Delta(\mu_\infty, f, n) converges to zero for all f.

The Theorem: On 30th July 2008, Terence Tao and Van Vu submitted to arxiv a paper in which they proved the following result. Let A_n be the n \times n random matrix whose entries are i.i.d. complex random variables with mean zero and variance one. Then the ESD of \frac{1}{\sqrt{n}}A_n converges (both in probability and in the almost sure sense) to the uniform distribution on the unit disk.

Short context: The statement of the Theorem was known as the circular law conjecture before 2010, and is now called the circular law. It was first established by Ginibre in 1965 for random matrices with Gaussian entries. Many authors then proved it for more and more general distributions, culminating in the Theorem, which establishes it in full generality.
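
Numerical illustration (not from the paper): the following minimal Python sketch, assuming numpy is available and using arbitrary parameters, samples one large matrix with i.i.d. \pm 1 entries (a real-valued special case of the i.i.d. complex entries allowed by the Theorem) and checks that the eigenvalues of \frac{1}{\sqrt{n}}A_n roughly fill the unit disk uniformly.

import numpy as np

rng = np.random.default_rng(0)
n = 1000

# i.i.d. entries with mean 0 and variance 1 (here real Bernoulli +-1,
# a special case of the i.i.d. complex entries allowed by the Theorem)
A = rng.choice([-1.0, 1.0], size=(n, n))
eig = np.linalg.eigvals(A / np.sqrt(n))

# Almost all eigenvalues should lie in the unit disk ...
print("fraction inside the unit disk:", np.mean(np.abs(eig) <= 1.0))
# ... and a disk of radius 1/2 should contain about (1/2)^2 = 25% of them,
# as predicted by the uniform limiting distribution.
print("fraction inside the disk of radius 1/2:", np.mean(np.abs(eig) <= 0.5))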

Links: A free arxiv version of the original paper is here; the journal version is here.

Go to the list of all theorems

The cost L_n of the mean-field TSP on the complete n-vertex graph converges to 2.04…

You need to know: Graph, complete graph, limits, integration, notation P for probability, random variable, distribution.

Background: In a complete graph on n vertices, let each edge e be assigned an independent random cost C_e drawn from a fixed distribution \mu on the non-negative real numbers such that \lim\limits_{t \to 0^+} \frac{P(C_e<t)}{t} = 1. Let L_n be the minimum sum of the edge costs of a cycle that visits each vertex precisely once. For any x>0, let y=y(x) be the unique non-negative real number such that \left(1+\frac{x}{2}\right)e^{-x} + \left(1+\frac{y}{2}\right)e^{-y}=1. Let L^*=\frac{1}{2}\int_0^\infty y(x)dx = 2.0415....

The Theorem: On 3rd April 2008, Johan Wästlund submitted to Acta Mathematica a paper in which he proved that for any \epsilon>0, \lim\limits_{n\to \infty}P\left[\left|L_n - L^*\right|>\epsilon\right] = 0.

Short context: Computing the minimum sum of the edge costs of a cycle that visits each vertex of a graph is a famous and important problem known as the travelling salesman problem (TSP). A model in which the edge costs are random, chosen independently from a fixed distribution, is known as the mean-field model of distance. The Theorem finds the limit of the optimal cost of the mean-field TSP for a general class of distributions, which contains, for example, the uniform distribution on [0, 1] and the exponential distribution with mean 1.
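
Numerical illustration (not from the paper): the constant L^* can be computed directly from its definition. The Python sketch below, assuming scipy is available, solves the equation defining y(x) with a standard root-finding routine (Brent's method) and integrates numerically; it should print a value close to 2.0415.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def g(u):
    # g(u) = (1 + u/2) e^{-u} is strictly decreasing from 1 to 0 on [0, infinity)
    return (1.0 + u / 2.0) * np.exp(-u)

def y_of_x(x):
    # y(x) is the unique non-negative solution of g(x) + g(y) = 1
    target = 1.0 - g(x)
    hi = 10.0
    while g(hi) >= target:      # enlarge the bracket until a sign change is guaranteed
        hi *= 2.0
    return brentq(lambda u: g(u) - target, 0.0, hi)

# start slightly above 0, where y(x) diverges logarithmically
# (the resulting truncation error is negligible)
L_star = 0.5 * quad(y_of_x, 1e-9, np.inf, limit=200)[0]
print(L_star)                   # approximately 2.0415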

Links: The original paper is available here.

Go to the list of all theorems

First moment concentration of log-concave measures implies exponential concentration

You need to know: Euclidean space {\mathbb R}^n, norm \|x\| of x\in{\mathbb R}^n, convex set \Omega\subset {\mathbb R}^n, convex function on \Omega, probability measure \mu on {\mathbb R}^n, integration with respect to \mu.

Background: A probability measure \mu on {\mathbb R}^n is called absolutely continuous if there exists a function g:{\mathbb R}^n \to {\mathbb R} such that \mu(A)=\int_A g(x)dx for every measurable A\subset {\mathbb R}^n. If the set \Omega=\{x\in{\mathbb R}^n\,|\,g(x)>0\} is convex, and -\log(g(x)) is a convex function on \Omega, then \mu is called log-concave. Given a function f:{\mathbb R}^n \to {\mathbb R}, we write E_\mu(f)=\int_{{\mathbb R}^n}f d\mu, and ||f||_{L^1(\mu)} = E_\mu(|f|). A function f is called 1-Lipschitz if |f(x)-f(y)| \leq \|x-y\| for all x,y \in {\mathbb R}^n. We say that \mu satisfies first-moment concentration with constant D>0 if for every 1-Lipschitz f we have ||f-E_\mu(f)||_{L^1(\mu)} \leq \frac{1}{D}.

The Theorem: On 26th December 2007, Emanuel Milman submitted to arxiv a paper in which he proved that for every absolutely continuous log-concave probability measure \mu on {\mathbb R}^n satisfying first-moment concentration with constant D, and every 1-Lipschitz f:{\mathbb R}^n \to {\mathbb R}, the inequality \mu(|f-E_\mu(f)| \geq t) \leq c \exp(-CDt) holds for all t>0, where c and C are universal constants.

Short context: Let x\in{\mathbb R}^n be selected at random with respect to the measure \mu, and let f be some useful function we want to compute. In general, because x is random, f(x) is also random and unpredictable. However, if the conclusion of the theorem (known as exponential concentration) holds, then f(x) \approx E_\mu(f) with very high probability. This property looks much stronger than “just” first-moment concentration. The Theorem states, however, that these concentrations are in fact equivalent! In fact, in the same paper the author studied several other notions of concentration, some of which look very weak and some very strong, and proved that they are all equivalent for log-concave \mu.
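
Numerical illustration (not from the paper): the Python sketch below, assuming numpy is available and with arbitrary parameters, takes one concrete log-concave measure (the standard Gaussian on {\mathbb R}^n) and one 1-Lipschitz function f(x)=\|x\|, estimates the first-moment constant 1/D = E_\mu|f-E_\mu(f)| by Monte Carlo, and compares the empirical tails with an exponential bound. It is a sanity check for this one measure, not a proof.

import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 100_000

# The standard Gaussian on R^n is log-concave, and f(x) = ||x|| is 1-Lipschitz.
X = rng.standard_normal((N, n))
f = np.linalg.norm(X, axis=1)

dev = np.abs(f - f.mean())
D = 1.0 / dev.mean()                  # first-moment concentration: E|f - Ef| = 1/D
print("1/D = E|f - Ef| =", dev.mean())

# Exponential concentration predicts P(|f - Ef| >= t) <= c*exp(-C*D*t);
# here we simply compare the empirical tail with exp(-D*t).
for t in [0.5, 1.0, 1.5, 2.0]:
    print(t, np.mean(dev >= t), np.exp(-D * t))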

Links: A free arxiv version of the original paper is here; the journal version is here.

Go to the list of all theorems

The norm of the inverse of n by n matrix with i.i.d. subgaussian entries is at most n^0.5/e with probability 1-Ce-c^n

You need to know: Probability (denoted by P), mean, variance, fourth moment, abbreviation i.i.d. for independent identically distributed random variables, Euclidean space {\mathbb R}^n, notation |x| for length of x \in {\mathbb R}^n, matrix, matrix multiplication, norm ||A||=\sup\limits_{x \in {\mathbb R}^n}\frac{|Ax|}{|x|} of n \times n matrix A, inverse matrix A^{-1}.

Background: For integer n>0, let A_{n,X} be an n \times n matrix whose entries a_{ij} are i.i.d. with the same distribution as X. A random variable X is called subgaussian if there exists a constant B such that P(|X|>t) \leq 2e^{-t^2/B^2} for all t>0. The minimal B for which this holds is called the subgaussian moment of X. We adopt the convention that \|A_{n,X}^{-1}\| = \infty if A_{n,X}^{-1} does not exist.

The Theorem: On 16th March 2007, Mark Rudelson and Roman Vershynin submitted to arxiv a paper in which they proved that for any B>0 there exist constants C=C(B)>0 and c=c(B)\in(0,1) such that for any subgaussian random variable X with mean 0, variance 1, and subgaussian moment at most B, any integer n>0, and any \epsilon \geq 0, the probability that \|A_{n,X}^{-1}\| \geq  \sqrt{n}/\epsilon is at most C\epsilon+c^n.

Short context: In 1963, von Neumann predicted that, with high probability, \|A_{n,X}^{-1}\| is of order \sqrt{n} for random matrices. Before 2007, this was open even when X takes values \pm 1 with equal chances (in which case A_{n,X} is called a Bernoulli matrix). For Bernoulli matrices, Spielman and Teng conjectured in 2002 the stronger bound P(\|A_{n,X}^{-1}\| \geq  \sqrt{n}/\epsilon) \leq C\epsilon+c^n. The Theorem proves this conjecture for all matrices with i.i.d. subgaussian entries. Earlier, it was only known that P(\|A_{n,X}^{-1}\| \geq  n^{3/2}/\epsilon) is small. Also, taking \epsilon=n^{-C} in the Theorem recovers this previous result of Tao and Vu (with exponent B=C+\frac{1}{2} there). In addition, Rudelson and Vershynin confirmed the von Neumann prediction assuming only that the fourth moment of X is bounded.
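
Numerical illustration (not from the paper): since \|A^{-1}\| = 1/s_{\min}(A), where s_{\min} is the smallest singular value, the Theorem says that P(s_{\min}(A_{n,X}) \leq \epsilon/\sqrt{n}) \leq C\epsilon + c^n. The Python sketch below, assuming numpy is available and with arbitrary parameters, estimates this probability for Bernoulli (\pm 1) entries by Monte Carlo; up to the (hopefully small) c^n term, the estimates should grow roughly linearly in \epsilon.

import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 2000

# smallest singular value of i.i.d. +-1 (subgaussian, mean 0, variance 1) matrices
s_min = np.empty(trials)
for k in range(trials):
    A = rng.choice([-1.0, 1.0], size=(n, n))
    s_min[k] = np.linalg.svd(A, compute_uv=False)[-1]

# ||A^{-1}|| >= sqrt(n)/eps is the same event as s_min(A) <= eps/sqrt(n)
for eps in [0.05, 0.1, 0.2, 0.5, 1.0]:
    print(eps, np.mean(s_min <= eps / np.sqrt(n)))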

Links: A free arxiv version of the original paper is here; the journal version is here.

Go to the list of all theorems

The variance of the current across the characteristic in ASEP is of order t^(2/3)

You need to know: Set {\mathbb Z} of integers, probability, independent random variables, exponential distribution, expectation, variance. For x\in{\mathbb R}, denote by [x] the largest integer in [0, x] if x \geq 0, and the smallest integer in [x, 0] if x\leq 0.

Background: At time t=0, place a particle at each site x\in {\mathbb Z} independently with the same probability \sigma. Each particle then waits for a random time t_z drawn from the exponential distribution with parameter 1 (independently of everything else) and attempts to jump to the adjacent site on the right or on the left, with probabilities p and q=1-p, respectively; the jump is performed only if the target site is unoccupied (otherwise the particle stays). This step is then repeated, in parallel and independently, for all particles. The resulting process is called the asymmetric simple exclusion process (ASEP). For v\in{\mathbb R}, let J^{(v)}(t) = J^{(v)}_+(t) - J^{(v)}_-(t), where J^{(v)}_+(t) is the number of particles that began in (-\infty,0] at time 0 but lie in [[vt]+1,\infty) at time t, and J^{(v)}_-(t) is the number of particles that began in [1, \infty) at time 0 but lie in (-\infty,[vt]] at time t.

The Theorem: On 15th August 2006, Márton Balázs and Timo Seppäläinen submitted to arxiv a paper in which they proved that, for v=(p-q)(1-2\sigma), there exist positive constants t_0 and C such that C^{-1}t^{2/3} \leq \text{Var}(J^{(v)}(t)) \leq Ct^{2/3} for all t\geq t_0.

Short context: ASEP is one of the simplest models for diffusion processes – see here for its 2-dimensional version. The random variable J^{(v)}(t) represents the total net particle current seen by an observer moving at speed v during the time interval [0,t]. In 1994, Ferrari and Fontes proved that \lim\limits_{t \to \infty} \frac{\text{Var}(J^{(v)}(t))}{t} = \sigma(1-\sigma) |(p-q)(1-2\sigma)-v|, which implies that the variance grows linearly in time, except when v=(p-q)(1-2\sigma), which is called the characteristic speed. For this v, J^{(v)}(t) is called the current across the characteristic, and the Theorem proves that its variance grows as t^{2/3}.
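
Numerical illustration (not from the paper): the Python sketch below, assuming numpy is available, simulates ASEP on the finite window [-L, L] (a stand-in for {\mathbb Z}; the boundary blocks jumps out of the window, which does not affect the current near the origin as long as L is much larger than t) and estimates \text{Var}(J^{(v)}(t)) at the characteristic speed for a few values of t. The parameters are arbitrary, the run takes a little while, and the variance estimates are noisy, but their growth should be visibly slower than linear and roughly compatible with t^{2/3}.

import numpy as np

rng = np.random.default_rng(0)

def current(t, sigma=0.3, p=0.8, L=200):
    # One realisation of J^{(v)}(t) at the characteristic speed v = (p-q)(1-2*sigma).
    q = 1.0 - p
    v = (p - q) * (1.0 - 2.0 * sigma)
    sites = np.arange(-L, L + 1)
    pos = sites[rng.random(2 * L + 1) < sigma]   # initial particle positions
    start = pos.copy()                           # remember where each particle began
    occ = np.zeros(2 * L + 1, dtype=bool)
    occ[pos + L] = True
    time, n_part = 0.0, len(pos)
    while True:
        time += rng.exponential(1.0 / n_part)    # each particle rings at rate 1
        if time > t:
            break
        i = rng.integers(n_part)                 # the ringing particle
        target = pos[i] + (1 if rng.random() < p else -1)
        if -L <= target <= L and not occ[target + L]:
            occ[pos[i] + L] = False
            occ[target + L] = True
            pos[i] = target
    cut = int(v * t)                             # [vt], truncated towards 0
    J_plus = np.sum((start <= 0) & (pos >= cut + 1))
    J_minus = np.sum((start >= 1) & (pos <= cut))
    return J_plus - J_minus

for t in [8.0, 16.0, 32.0]:
    J = np.array([current(t) for _ in range(200)])
    print(t, J.var(), J.var() / t ** (2.0 / 3.0))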

Links: A free arxiv version of the original paper is here; the journal version is here. See also Section 10.5 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The set of n < d exp(cd) i.i.d. Gaussian points in R^d is linearly separable

You need to know: Basic probability theory, random vector in {\mathbb R}^d, independent and identically distributed (i.i.d.) random vectors, Gaussian distribution in {\mathbb R}^d, covariance matrix, non-singular matrix, overwhelming probability, hyperplane in {\mathbb R}^d.

Background: Let X be a random point cloud consisting of n points x_i, i= 1,\dots,n, sampled independently and identically from a Gaussian distribution in {\mathbb R}^d with a non-singular covariance matrix. The set X is called linearly separable (or 1-convex) if every point in it can be separated from the other points by a hyperplane.

The Theorem: On 5th July 2006, David Donoho and Jared Tanner submitted to the Journal of the AMS a paper in which they proved, among other results, that for any \epsilon>0, with overwhelming probability for large d (with d<n), each subset of X of size at most \frac{d}{2e \log(n/d)}(1-\epsilon) can be separated from the other points of X by a hyperplane.

Short context: If you select n random points on the plane, then, if n is not too small, you are likely to find a point that lies in the convex hull of the other points, and therefore cannot be separated from them by a line. The Theorem states that (at least for the Gaussian distribution) the situation in high dimensions is dramatically different: with very high probability, every group of \frac{d}{2e \log(n/d)}(1-\epsilon) points can be separated from the others by a hyperplane. In particular, the set of n points is linearly separable if \frac{d}{2e \log(n/d)}(1-\epsilon)>1, that is, if n < d \exp(cd) for some c>0. The Theorem has many applications, for example to efficient error-correction mechanisms, sparse signal recovery, etc.
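
Numerical illustration (not from the paper): a point can be separated from the others by a hyperplane exactly when it lies outside their convex hull, and this can be tested by a small linear-programming feasibility problem. The Python sketch below, assuming scipy is available and with arbitrary parameters d=30, n=300 (for which \frac{d}{2e \log(n/d)}\approx 2.4>1), checks that, typically, every point of a Gaussian cloud is separable.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, n = 30, 300

X = rng.standard_normal((n, d))        # i.i.d. standard Gaussian points in R^d

def separable(j):
    # Point j is separable by a hyperplane iff it is NOT a convex combination
    # of the other points, i.e. iff the following LP is infeasible:
    #   lambda >= 0,  sum(lambda) = 1,  sum_i lambda_i x_i = x_j.
    others = np.delete(X, j, axis=0)
    A_eq = np.vstack([others.T, np.ones((1, n - 1))])
    b_eq = np.concatenate([X[j], [1.0]])
    res = linprog(np.zeros(n - 1), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n - 1), method="highs")
    return not res.success

print("every point separable:", all(separable(j) for j in range(n)))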

Links: A free arxiv version of the original paper is here; the journal version is here.

Go to the list of all theorems


A version of the central limit theorem holds for convex sets

You need to know: Probability, random vector, uniform distribution, Euclidean space {\mathbb R}^n, inner product (x,y)=\sum\limits_{i=1}^n x_i y_i in {\mathbb R}^n, unit vector in {\mathbb R}^n, integration in {\mathbb R}^n, measurable sets, convex set in {\mathbb R}^n, compact set in {\mathbb R}^n, interior of a set, supremum.

Background: A convex body in {\mathbb R}^n is a compact, convex set K\subset{\mathbb R}^n with a non-empty interior.

The Theorem: On 29th April 2006, Bo'az Klartag submitted to arxiv a paper in which he proved the existence of a sequence \epsilon_n converging to 0 for which the following holds: For any convex body K\subset{\mathbb R}^n, there exist a unit vector \theta in {\mathbb R}^n, t_0\in {\mathbb R}, and \sigma>0 such that \sup\limits_{A\subset {\mathbb R}}\left|\text{Prob}\{(X,\theta)\in A\}-\frac{1}{\sqrt{2\pi}\,\sigma}\int_A e^{-\frac{(t-t_0)^2}{2\sigma^2}}dt\right|\leq \epsilon_n, where X is a random vector uniformly distributed in K, and the supremum runs over all measurable sets A\subset {\mathbb R}.

Short context: Let X be a random vector that is distributed uniformly in a cube in {\mathbb R}^n centred at the origin. Then the components X_1, X_2, \dots, X_n of X are independent and identically distributed, and, by the classical central limit theorem, the distribution of \frac{1}{\sqrt{n}}\sum\limits_{i=1}^n X_i is close to normal if n is large. Moreover, the same is true for (X,\theta), under some mild conditions on the unit vector \theta. The Theorem states that a similar result holds if X is uniformly distributed in any convex body in {\mathbb R}^n. This is surprising because in this case the components X_i may be far from being independent.
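
Numerical illustration (not from the paper): the Python sketch below, assuming scipy is available and with arbitrary parameters, samples uniformly from one concrete convex body that is far from a product, the solid simplex \{x_i\geq 0,\, x_1+\dots+x_n\leq 1\} in {\mathbb R}^n (if Y is Dirichlet(1,\dots,1) with n+1 parameters, its first n coordinates are uniform in this simplex), projects onto a random unit direction \theta, and checks that the standardised marginal (X,\theta) looks Gaussian.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, N = 100, 50_000

# uniform samples from the solid simplex {x >= 0, sum(x) <= 1} in R^n
X = rng.dirichlet(np.ones(n + 1), size=N)[:, :n]

theta = rng.standard_normal(n)
theta /= np.linalg.norm(theta)                 # a random unit direction
z = X @ theta
z = (z - z.mean()) / z.std()                   # standardised marginal (X, theta)

print("skewness:", stats.skew(z))              # ~ 0 for a Gaussian
print("excess kurtosis:", stats.kurtosis(z))   # ~ 0 for a Gaussian
zs = np.sort(z)
cdf = stats.norm.cdf(zs)
ks = max(np.max(np.arange(1, N + 1) / N - cdf), np.max(cdf - np.arange(N) / N))
print("Kolmogorov distance to N(0,1):", ks)    # small, and shrinking as n grows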

Links: A free arxiv version of the original paper is here; the journal version is here.

Go to the list of all theorems

The norm of the inverse of n by n matrix with i.i.d. Bernoulli entries is polynomially bounded with probability 1-1/n^C, for any C>0

You need to know: Probability (denoted by P), abbreviation i.i.d. for independent identically distributed random variables, Euclidean space {\mathbb R}^n, notation |x| for length of x \in {\mathbb R}^n, matrix multiplication, inverse matrix, norm ||A||=\sup\limits_{x \in {\mathbb R}^n}\frac{|Ax|}{|x|} of n \times n matrix A.

Background: For integer n>0, let A_n be an n \times n matrix whose entries a_{ij} are i.i.d. Bernoulli random variables (taking values +1 and -1 with probability 1/2 each).

The Theorem: On 8th November 2005, Terence Tao and Van Vu submitted to arxiv a paper in which they proved that for any constant C>0, there are constants B and N such that for any n\geq N the inequality \|A_n^{-1}\| \leq n^B holds with probability at least 1-n^{-C}.

Short context: It is known that a random Bernoulli matrix A_n is invertible with probability exponentially close to 1. The Theorem states that the norm \|A_n^{-1}\| of its inverse matrix is polynomially bounded with probability at least 1-n^{-C} for any C>0. Previously, this was known only for C\leq 1/2.
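
Numerical illustration (not from the paper): the Python sketch below, assuming numpy is available and with arbitrary sizes and numbers of trials, computes \|A_n^{-1}\| = 1/s_{\min}(A_n) for random Bernoulli matrices of several sizes and reports the largest value seen together with the corresponding empirical exponent \log\|A_n^{-1}\|/\log n, which stays bounded, in agreement with the polynomial bound of the Theorem.

import numpy as np

rng = np.random.default_rng(0)

for n in [100, 200, 400]:
    norms = []
    for _ in range(20):
        A = rng.choice([-1.0, 1.0], size=(n, n))
        # ||A_n^{-1}|| is the reciprocal of the smallest singular value
        norms.append(1.0 / np.linalg.svd(A, compute_uv=False)[-1])
    worst = max(norms)
    print(n, worst, np.log(worst) / np.log(n))   # empirical exponent B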

Links: A free arxiv version of the original paper is here; the journal version is here. See also Section 9.4 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The norm of the inverse of n by n matrix with i.i.d. subgaussian entries is at most Cn^1.5/e with probability 1-e, if e>c/n^0.5

You need to know: Probability (denoted by P), mean, variance, normal distribution, abbreviation i.i.d. for independent identically distributed random variables, Euclidean space {\mathbb R}^n, notation |x| for length of x \in {\mathbb R}^n, matrix, matrix multiplication, inverse matrix, norm ||A||=\sup\limits_{x \in {\mathbb R}^n}\frac{|Ax|}{|x|} of n \times n matrix A.

Background: A random variable X is called subgaussian if there exist constants B and b such that P(|X|>t) \leq Be^{-bt^2} for all t>0. For integer n>0, let A_{n,X} be an n \times n matrix whose entries a_{ij} are i.i.d. with the same distribution as X.

The Theorem: On 30th June 2005, Mark Rudelson submitted to the Annals of Mathematics a paper in which he proved the following result. Let X be a subgaussian random variable with mean 0 and variance 1. Then there are positive constants C_1, C_2, and C_3 such that for any integer n>0, and any \epsilon > C_1/\sqrt{n}, the inequality \|A_{n,X}^{-1}\| \leq C_2\cdot n^{3/2}/\epsilon holds with probability greater than 1-\epsilon/2-4e^{-C_3n}.

Short context: As a special case, the Theorem applies to matrices whose i.i.d. entries are equal to \pm 1 with equal chances (Bernoulli matrices). It is known that such a random matrix A is invertible with probability exponentially close to 1. The Theorem provides a bound for the norm \|A^{-1}\| of the inverse matrix. The bound is polynomial in n and holds with probability greater than 1-\epsilon if n is large enough. Previously, a similar bound was known only when the i.i.d. entries a_{ij} of A follow the normal distribution.

Links: A free arxiv version of the original paper is here; the journal version is here. See also Section 8.11 of this book for an accessible description of the Theorem.

Go to the list of all theorems

The Dantzig selector recovers sparse vectors in R^m from n<<m noisy measurements

You need to know: Euclidean space {\mathbb R}^m with scalar product (x,y) = \sum\limits_{i=1}^m x_iy_i, norms ||x||_2 = \sqrt{(x,x)}, ||x||_1 = \sum\limits_{i=1}^m |x_i|, and ||x||_\infty = \max\limits_{1\leq i \leq m}|x_i| in {\mathbb R}^m, support of vector x \in {\mathbb R}^m (subset T\subset \{1,\dots,m\} such that x_i=0 for all i \not\in T), n \times m matrix, matrix multiplication, notation A^* for the transpose of matrix A, big O notation, linear programming, basic probability, mean, variance, independent identically distributed (i.i.d.) random variables, normal distribution N(\mu, \sigma^2) with mean \mu and variance \sigma^2.

Background: A vector x_0 \in {\mathbb R}^m is called S-sparse if it is supported on a set T_0 with |T_0|\leq S. For x_0 \in {\mathbb R}^m, let y=Ax_0+e, where A is an n \times m matrix (n<m) and e\in {\mathbb R}^n is a random error (noise) whose components e_i are i.i.d. and follow N(0, \sigma^2). The Dantzig selector x^* is a solution to the problem \min\limits_{x \in {\mathbb R}^m} ||x||_1 subject to ||A^*(y-Ax)||_{\infty} \leq \lambda_m \cdot \sigma, where \lambda_m>0 is a constant.

For T\subset \{1,\dots,m\}, let A_T be the n \times |T| submatrix of A formed from the columns of A corresponding to the indices in T. For S>0, let \delta_S be the smallest number such that the inequalities (1-\delta_S)||c||_2^2 \leq ||A_Tc||_2^2\leq (1+\delta_S)||c||_2^2 hold for all T with |T|\leq S and all c\in{\mathbb R}^{|T|}. Also, for S,S'>0 such that S+S'\leq m, let \theta_{S,S'} be the smallest number such that the inequality |(A_Tc, A_{T'}c')|\leq \theta_{S,S'}||c||_2 ||c'||_2 holds for all disjoint sets T,T'\subset \{1,\dots,m\} with |T|\leq S and |T'|\leq S' and all c\in{\mathbb R}^{|T|}, c'\in{\mathbb R}^{|T'|}.

The Theorem: On 5th June 2005, Emmanuel Candes and Terence Tao submitted to arxiv a paper in which they proved the following result. Let A be an n \times m matrix (n<m), let S be such that \delta_{2S}+\theta_{S,2S} < 1, let x_0 \in {\mathbb R}^m be S-sparse, and let x^*\in {\mathbb R}^m be the Dantzig selector with \lambda_m=\sqrt{2(1+a)\log m} for some a>0. Then the inequality ||x^*-x_0||_2 \leq C \lambda_m\sqrt{S}\sigma, where C=4/(1-\delta_{2S}-\theta_{S,2S}), holds with probability exceeding 1-\left(\sqrt{\pi\log m}\cdot m^a\right)^{-1}.

Short context: In the previous paper, Candes, Romberg, and Tao developed an efficient procedure which uses a small number of measurements (n=O(S\log m) random measurements suffice) with arbitrary (random or non-random) error e to recover an S-sparse vector x_0 \in {\mathbb R}^m with error O(||e||_2). With arbitrary e, this result is optimal. The Theorem, however, states that with random measurement error e (which is often the case in applications) we can recover x_0 with even higher accuracy O(\sqrt{S\log m}\,\sigma) with high probability, using the Dantzig selector x^*, which can be computed using linear programming. The error in the Theorem is optimal up to the constant and \sqrt{\log m} factors.
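
Numerical illustration (not from the paper): the Dantzig selector is itself a linear program, obtained by writing |x_i|\leq u_i and minimising \sum_i u_i. The Python sketch below, assuming scipy is available, builds one synthetic instance (Gaussian measurement matrix, S-sparse signal, Gaussian noise; all parameter choices are arbitrary), solves the LP, and compares the recovery error with the scale \lambda_m\sqrt{S}\sigma appearing in the Theorem.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, S, sigma = 200, 80, 5, 0.05

A = rng.standard_normal((n, m)) / np.sqrt(n)   # measurement matrix
x0 = np.zeros(m)                               # S-sparse signal
x0[rng.choice(m, size=S, replace=False)] = rng.standard_normal(S)
y = A @ x0 + sigma * rng.standard_normal(n)    # noisy measurements

lam = np.sqrt(2 * 1.1 * np.log(m))             # lambda_m with a = 0.1

# LP in variables (x, u): minimise sum(u)
# subject to -u <= x <= u and |A^T (y - A x)| <= lam*sigma (componentwise)
G = A.T @ A
I, Z = np.eye(m), np.zeros((m, m))
A_ub = np.block([[I, -I], [-I, -I], [G, Z], [-G, Z]])
b_ub = np.concatenate([np.zeros(2 * m), A.T @ y + lam * sigma, lam * sigma - A.T @ y])
c = np.concatenate([np.zeros(m), np.ones(m)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * m + [(0, None)] * m, method="highs")
x_star = res.x[:m]

print("recovery error ||x* - x0||_2 :", np.linalg.norm(x_star - x0))
print("Theorem's scale lambda*sqrt(S)*sigma:", lam * np.sqrt(S) * sigma)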

Links: A free arxiv version of the original paper is here; the journal version is here.

Go to the list of all theorems