The norm of the inverse of an n by n matrix with i.i.d. subgaussian entries is at most Cn^1.5/e with probability 1-e, if e>c/n^0.5

You need to know: Probability (denoted by P), mean, variance, normal distribution, abbreviation i.i.d. for independent identically distributed random variables, Euclidean space {\mathbb R}^n, notation |x| for the length of x \in {\mathbb R}^n, matrix, matrix multiplication, inverse matrix, norm ||A||=\sup\limits_{x \in {\mathbb R}^n, x \neq 0}\frac{|Ax|}{|x|} of an n \times n matrix A.

Background: A random variable X is called subgaussian if there exist constants
B and b such that P(|X|>t) \leq Be^{-bt^2} for all t>0. For integer n>0, let A_{n,X} be an n \times n matrix whose entries a_{ij} are i.i.d. with the same distribution as X.

The Theorem: On 30th June 2005, Mark Rudelson submitted to the Annals of Mathematics a paper in which he proved the following result. Let X be a subgaussian random variable with mean 0 and variance 1. Then there are positive constants C_1, C_2, and C_3 such that for any integer n>0, and any \epsilon > C_1/\sqrt{n}, the inequality \|A_{n,X}^{-1}\| \leq C_2\cdot n^{3/2}/\epsilon holds with probability greater than 1-\epsilon/2-4e^{-C_3n}.

Short context: As a special case, the Theorem applies to matrices whose i.i.d. entries are equal to \pm 1 with equal chances (Bernoulli matrices). It is known that such a random matrix A is invertible with probability exponentially close to 1. The Theorem provides a bound for the norm \|A^{-1}\| of the inverse matrix. The bound is polynomial in n and holds with probability greater than 1-\epsilon if n is large enough. Previously, a similar bound was known only when the i.i.d. entries a_{ij} of A follow the normal distribution.
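As a rough numerical illustration of the quantity bounded in the Theorem, the sketch below samples Bernoulli matrices and estimates \|A^{-1}\| using only the Python standard library. The helper names (invert, operator_norm, sample_inverse_norms), the seeds, the iteration count, and the singularity tolerance are all assumptions made for this example. Because the constants C_1, C_2, C_3 are not specified here, the printed numbers can only be compared with n^{3/2} qualitatively.

```python
import math
import random


def invert(a, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting; returns None if a looks singular."""
    n = len(a)
    # Augment a with the identity matrix: [a | I].
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:  # tolerance is an assumption of this sketch
            return None
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def operator_norm(b, iterations=300, seed=0):
    """Estimate ||b|| = sup |bx|/|x| by power iteration on b^T b."""
    n = len(b)
    rng = random.Random(seed)
    x = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iterations):
        y = [sum(b[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = b x
        z = [sum(b[i][j] * y[i] for i in range(n)) for j in range(n)]  # z = b^T y
        length = math.sqrt(sum(v * v for v in z))
        x = [v / length for v in z]
    y = [sum(b[i][j] * x[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(v * v for v in y))  # |b x| for the unit vector x


def sample_inverse_norms(n, trials, seed=0):
    """Norms of A^{-1} for random +-1 matrices A; singular samples are skipped."""
    rng = random.Random(seed)
    norms = []
    for _ in range(trials):
        a = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
        inv = invert(a)
        if inv is not None:
            norms.append(operator_norm(inv))
    return norms


if __name__ == "__main__":
    n = 8
    norms = sorted(sample_inverse_norms(n, trials=50))
    print("invertible samples:", len(norms), "of 50")
    if norms:
        print("median ||A^-1||:", norms[len(norms) // 2], " n^1.5 =", n ** 1.5)
```

Power iteration only approximates the norm from below, so this is a sanity check rather than a verification of the bound.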

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 8.11 of this book for an accessible description of the Theorem.

Go to the list of all theorems

For p, q > 0, L^p embeds uniformly into L^q if and only if p<=q or q<=p<=2

You need to know: Metric space (M,d_M) with distance d_M, integration, L^p space for p>0 (space of measurable functions such that \int|f|^p dx < \infty), injective function, inverse f^{-1} of function f.

Background: Let (M,d_M) and (N,d_N) be metric spaces. A function f:M \to N is called uniformly continuous if for every \epsilon>0 there exists a \delta>0 such that d_N(f(x),f(y))<\epsilon for every x,y \in M with d_M(x,y)<\delta. We say that M embeds uniformly into N if there is an injective function f:M \to N such that both f and f^{-1} are uniformly continuous.

The Theorem: On 10th June 2005, Manor Mendel and Assaf Naor submitted to arxiv a paper in which they proved, among other results, that for p, q > 0, L^p embeds uniformly into L^q if and only if p \leq q or q \leq p \leq 2.

Short context: There is a large body of literature studying which metric spaces embed uniformly into which. However, before 2005, this problem was not solved even for L^p spaces. The Theorem provides a complete answer to this question. In fact, Mendel and Naor developed a deep and general theory of “metric cotype”, and the Theorem is just one of many corollaries of this theory.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 8.9 of this book for an accessible description of the Theorem.

(1,p)-Poincaré inequality implies (1,p-e)-Poincaré inequality for some e>0

You need to know: Euclidean space {\mathbb R}^n, convergence in {\mathbb R}^n with norm ||x||=\sqrt{\sum\limits_{i=1}^n x_i^2}, limit superior \limsup, ball B(x,r)=\{y\in {\mathbb R}^n : ||y-x||<r\} with centre x and radius r>0, integration in {\mathbb R}^n, Lipschitz continuous function f:{\mathbb R}^n \to {\mathbb R}.

Background: For a function w:{\mathbb R}^n \to {\mathbb R} (which we call a weight function) denote by |S|_w = \int_S w(x)dx the “weighted volume” of a set S\subset {\mathbb R}^n. For a function f:{\mathbb R}^n \to {\mathbb R} denote by f_{S,w}=\frac{1}{|S|_w}\int_S f(x)w(x)dx its weighted average on S, and by \Delta(f,S,w)=\frac{1}{|S|_w}\int_S|f(x)-f_{S,w}|w(x)dx its weighted deviation from the average. We say that a weight function w:{\mathbb R}^n \to {\mathbb R} is p-admissible, p \geq 1, if (i) there is a constant C \geq 1 such that |B(x,2r)|_w \leq C |B(x,r)|_w holds for all x \in {\mathbb R}^n and r>0, (ii) 0 < |B|_w < \infty for every ball B, and (iii) there are constants C' \geq 1 and 0 < t \leq 1 such that the inequality \Delta(f,B(x_0,tr),w) \leq 2rC' \left(\frac{1}{|B|_w}\int_B (\text{Lip} f(x))^p w(x) dx \right)^{1/p} holds for all balls B=B(x_0,r), and for every Lipschitz continuous function f:{\mathbb R}^n \to {\mathbb R}, where \text{Lip}f(x) = \limsup\limits_{y\to x}\frac{|f(x)-f(y)|}{||x-y||} (this is called the (1, p)-Poincaré inequality).

The Theorem: On 8th June 2005, Stephen Keith and Xiao Zhong submitted to the Annals of Mathematics a paper in which they proved that if w:{\mathbb R}^n \to {\mathbb R} is a p-admissible weight for some p>1, then there exists an \epsilon > 0 such that w is q-admissible for every q > p-\epsilon.

Short context: The (1, p)-Poincaré inequality controls the size of weighted deviations of Lipschitz continuous functions f in terms of \text{Lip}f(x), and the smaller p is, the better the control. The Theorem, however, implies that there is no smallest p^*>1 for which it holds: the inequality either holds for all p\geq 1, or for values of p from some open interval.

Links: The original paper is available here. See also Section 8.3 of this book for an accessible description of the Theorem.

The Nevanlinna characteristic T(r,f(z + z0)) grows like T(r,f(z)) for finite order meromorphic functions f

You need to know: Complex numbers, notation |z| for the absolute value of a complex number z, function in a complex variable, meromorphic function f, poles of f and their multiplicity, integration, notation \log^+ x = \max(\log x, 0), \limsup notation, big O notation, notation f(r) \sim g(r) if \lim\limits_{r \to \infty}\frac{f(r)}{g(r)}=1.

Background: For a meromorphic function f and real r \geq 0, let n(r,f) be the number of poles z_i of f, counting multiplicity, such that |z_i|\leq r. The Nevanlinna characteristic T(r,f) of f is T(r,f)=m(r,f)+N(r,f), where m(r,f) = \frac{1}{2\pi}\int_0^{2\pi}\log^+|f(re^{i\theta})|d\theta and N(r,f)=\int_0^r(n(t,f)-n(0,f))\frac{dt}{t}+n(0,f)\log r. The order of a meromorphic function f is \sigma(f)=\limsup\limits_{r\to\infty}\frac{\log^+ T(r,f)}{\log r}.

The Theorem: On 6th May 2005, Yik-Man Chiang and Shao-Ji Feng submitted to The Ramanujan Journal a paper in which they proved that for every meromorphic function f of order \sigma(f) < \infty, any fixed complex number \eta\neq 0, and any \epsilon>0, we have T(r,f(z + \eta)) = T(r,f) + O(r^{\sigma(f)-1+\epsilon}) + O(\log r).

Short context: The Nevanlinna characteristic T(r,f) measures the rate of growth of a meromorphic function f, and can be used to describe the asymptotic distribution of solutions of the equation f(z)=a as a varies. The Theorem implies that T(r,f(z + \eta)) \sim T(r,f) for finite order meromorphic functions. The authors demonstrate various applications of this result to, for example, difference equations.
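To make the definition of T(r,f) concrete, here is a minimal sketch that estimates m(r,f) by a midpoint rule for an entire function. An entire function has no poles, so N(r,f)=0 and T(r,f)=m(r,f). The function names proximity and shifted, the sample count, and the choice f(z)=e^z with \eta = 1+2i are assumptions made for illustration. For f(z)=e^z a direct computation gives T(r,f)=r/\pi, which the code can be checked against.

```python
import cmath
import math


def proximity(f, r, samples=20000):
    """Midpoint-rule estimate of m(r,f) = (1/2pi) * integral over [0, 2pi] of log^+|f(r e^{i theta})|."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        value = abs(f(r * cmath.exp(1j * theta)))
        if value > 1.0:  # log^+ x = max(log x, 0)
            total += math.log(value)
    # (1/2pi) * (2pi/samples) * sum = sum / samples
    return total / samples


def shifted(f, eta):
    """Return the function z -> f(z + eta)."""
    return lambda z: f(z + eta)


if __name__ == "__main__":
    eta = 1.0 + 2.0j  # hypothetical shift chosen for the example
    for r in (5.0, 10.0, 20.0):
        t_plain = proximity(cmath.exp, r)                # T(r, e^z), since N = 0
        t_shift = proximity(shifted(cmath.exp, eta), r)  # T(r, e^{z + eta})
        print(r, t_plain, r / math.pi, t_shift - t_plain)
```

In this example the difference between the two characteristics stays bounded while T(r,f) itself grows linearly, which is consistent with the error terms in the Theorem for a function of order 1.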

Links: Free arxiv version of the original paper is here, journal version is here.

The sequence of perfect squares is L1-universally bad

You need to know: Probability space X with measure \mu and set of measurable subsets {\cal X}, the notion of “almost all” x\in X, notation T^n(x) for n-fold composition T(T(\dots T(x)\dots)) of map T: X \to X, notation T^{-1}(A):=\{x \in X\,|\, T(x) \in A\} for the preimage of A \subset X, integration on X, notation L^p(X) for set of functions f:X \to {\mathbb R} such that \int_X|f(x)|^pd\mu < \infty, notation \{n_k\}_{k=1}^\infty for sequence n_1, n_2, \dots, n_k, \dots.

Background: An ergodic dynamical system (X,T) is a probability space X, together with a map T: X \to X such that (i) \mu (T^{-1}(A)) = \mu(A) for all A \in {\cal X}, and (ii) for any A \in {\cal X} with T^{-1}(A)=A, either \mu(A)=0 or \mu(A)=1. A sequence \{n_k\}_{k=1}^\infty is called L1-universally bad if for all ergodic dynamical systems (X,T) there is some f \in L^1(X) and A \in {\cal X} with \mu(A)>0, such that the limit \lim\limits_{N\to\infty}\frac{1}{N}\sum\limits_{k=1}^N f(T^{n_k}(x)) fails to exist for all x\in A.

The Theorem: On 5th April 2005, Zoltán Buczolich and Daniel Mauldin submitted to arxiv and The Annals of Mathematics a paper in which they proved that the sequence \{k^2\}_{k=1}^\infty is L1-universally bad.

Short context: The famous Birkhoff Ergodic Theorem states that, for any ergodic dynamical system (X,T) and any f \in L^1(X), the limit \lim\limits_{N\to\infty}\frac{1}{N}\sum\limits_{k=1}^N f(T^k(x)) exists for almost all x\in X (in fact, this limit is equal to \int_X f(x) d\mu). An important research direction is to understand for which sequences \{n_k\}_{k=1}^\infty the corresponding limit \lim\limits_{N\to\infty}\frac{1}{N}\sum\limits_{k=1}^N f(T^{n_k}(x)) exists for almost all x\in X. In 1988, Bourgain proved this for the sequence \{k^2\}_{k=1}^\infty, provided that f \in L^p(X) for some p>1, and asked if the same is true for all f \in L^1(X). The Theorem answers this question negatively.
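The two kinds of averages discussed above can be computed side by side for a simple system. The sketch below assumes, purely for illustration, the rotation T(x) = x + \alpha mod 1 on [0,1) with \alpha the golden-ratio conjugate, and the bounded function f(x)=x, whose integral is 1/2. The name ergodic_average is hypothetical. A bounded f lies in every L^p, so this example falls under Bourgain's positive result; it does not and cannot exhibit the bad L^1 function whose existence the Theorem proves.

```python
import math


def ergodic_average(f, x, alpha, times):
    """Average of f(T^n(x)) over n in `times`, for the rotation T(x) = x + alpha mod 1."""
    total = 0.0
    count = 0
    for n in times:
        total += f((x + n * alpha) % 1.0)  # T^n(x) = x + n * alpha mod 1
        count += 1
    return total / count


if __name__ == "__main__":
    alpha = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation number (assumed example)
    f = lambda x: x                       # bounded test function with integral 1/2
    for big_n in (100, 1000, 10000):
        linear = ergodic_average(f, 0.1, alpha, range(1, big_n + 1))
        squares = ergodic_average(f, 0.1, alpha, (k * k for k in range(1, big_n + 1)))
        print(big_n, linear, squares)
```

The first column of averages uses the times n_k = k, as in Birkhoff's theorem, and the second uses n_k = k^2.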

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 10.6 of this book for an accessible description of the Theorem.

The spectrum of the almost Mathieu operator is a Cantor set for all irrational frequencies

You need to know: Set {\mathbb Z} of integers, infinite sum \sum\limits_{n \in {\mathbb Z}} x_n, Cantor set.

Background: Let l^2(\mathbb Z) be the set of all infinite sequences x=(\dots, x_{-1}, x_0, x_1, \dots) such that \sum\limits_{n \in {\mathbb Z}} x_n^2 < \infty. A map T:l^2(\mathbb Z) \to l^2(\mathbb Z) is called invertible if for every y \in l^2(\mathbb Z) there exists a unique x \in l^2(\mathbb Z) such that T(x)=y. The almost Mathieu operator is the map H:l^2(\mathbb Z) \to l^2(\mathbb Z) mapping each x \in l^2(\mathbb Z) into (Hx)_n = x_{n+1} + x_{n-1} + 2 \lambda \cos 2\pi (\theta + n \alpha) x_n, \, n\in{\mathbb Z}, where \lambda\neq 0, \alpha, and \theta are real parameters, called coupling, frequency, and phase, respectively. The set of all t\in{\mathbb R} for which the map H_t:l^2(\mathbb Z) \to l^2(\mathbb Z) given by (H_tx)_n = t x_n - (Hx)_n, n\in{\mathbb Z}, is not invertible is called the spectrum of H.
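The action of H is easy to write down for finitely supported sequences. In the sketch below a sequence is stored as a dictionary {n: x_n} with all missing entries equal to zero; this representation and the names apply_almost_mathieu and inner are assumptions made for the example. It only applies the operator and does not attempt to compute the spectrum.

```python
import math


def apply_almost_mathieu(x, coupling, frequency, phase):
    """Apply H to a finitely supported sequence x given as a dict {n: x_n}."""
    result = {}
    for n, value in x.items():
        # x_n appears in (Hx)_{n-1} as the term x_{(n-1)+1} and in (Hx)_{n+1} as x_{(n+1)-1}.
        result[n - 1] = result.get(n - 1, 0.0) + value
        result[n + 1] = result.get(n + 1, 0.0) + value
        # Diagonal term 2 * lambda * cos(2 pi (theta + n alpha)) * x_n.
        potential = 2.0 * coupling * math.cos(2.0 * math.pi * (phase + n * frequency))
        result[n] = result.get(n, 0.0) + potential * value
    return result


def inner(x, y):
    """Inner product of two finitely supported real sequences."""
    return sum(value * y.get(n, 0.0) for n, value in x.items())


if __name__ == "__main__":
    alpha = math.sqrt(2.0)  # an irrational frequency (assumed example)
    delta0 = {0: 1.0}       # the sequence with x_0 = 1 and all other entries 0
    print(apply_almost_mathieu(delta0, coupling=1.5, frequency=alpha, phase=0.0))
```

Applied to the sequence with a single nonzero entry x_0 = 1, the output has entries 1 at positions -1 and 1, and 2\lambda\cos(2\pi\theta) at position 0.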

The Theorem: On 17th March 2005, Artur Avila and Svetlana Jitomirskaya submitted to arxiv a paper in which they proved that the spectrum of the almost Mathieu operator is a Cantor set for all irrational \alpha and for all \theta and all \lambda \neq 0.

Short context: The almost Mathieu operator and its spectrum arise from applications in physics. The Theorem confirms the conjecture proposed by Azbel in 1964. In 1981, Mark Kac offered ten martinis to anyone who could prove or disprove it, and since then the problem has been known as “The Ten Martini Problem”.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 9.6 of this book for an accessible description of the Theorem.

Any real polynomial can be approximated by hyperbolic real polynomials of the same degree

You need to know: Limit, derivative, polynomial, degree of a polynomial.

Background: The n-th functional power of a function f:{\mathbb R}\to{\mathbb R} is a function f^n(x)=f(f(\dots f(x)\dots)), where f is repeated n times. A point x_0 is called periodic for f if f^k(x_0)=x_0 for some k\geq 1. The smallest k for which this holds is called the period of x_0. A periodic point x_0 with period k is called hyperbolic if |(f^k)'(x_0)| \neq 1, and it is called hyperbolic attracting if |(f^k)'(x_0)|<1. For a polynomial f:{\mathbb R}\to{\mathbb R}, and x_0 \in {\mathbb R}, either (i) \lim\limits_{n \to \infty} |f^n(x_0) - f^n(x^*)|=0 for some hyperbolic attracting periodic point x^*, or (ii) \lim\limits_{n \to \infty} |f^n(x_0)|=\infty, or (iii) neither (i) nor (ii) happens. Let S_f be the set of all x_0 \in {\mathbb R} for which case (iii) holds. A polynomial f is called hyperbolic if there exist constants C>0 and \lambda>1 such that |(f^n)'(x)|>C \lambda^n, \forall n, \forall x \in S_f.
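The definitions of hyperbolic and hyperbolic attracting periodic points can be checked on small examples. The sketch below computes (f^k)'(x_0) by the chain rule, as the product of f'(f^j(x_0)) for j=0,\dots,k-1. The function names, the coefficient-list representation of f, and the tolerance are assumptions made for illustration, and the caller is expected to pass the period k. Deciding whether a whole polynomial is hyperbolic in the sense above is far beyond this example.

```python
def evaluate(coeffs, x):
    """Horner evaluation of f(x) = sum of coeffs[i] * x**i."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def derivative(coeffs, x):
    """Horner evaluation of f'(x) = sum of i * coeffs[i] * x**(i-1)."""
    result = 0.0
    for i in range(len(coeffs) - 1, 0, -1):
        result = result * x + i * coeffs[i]
    return result


def multiplier(coeffs, x0, k):
    """(f^k)'(x0) via the chain rule along the orbit x0, f(x0), ..., f^{k-1}(x0)."""
    product = 1.0
    x = x0
    for _ in range(k):
        product *= derivative(coeffs, x)
        x = evaluate(coeffs, x)
    return product


def classify_periodic_point(coeffs, x0, k, tol=1e-9):
    """Classify x0, assumed by the caller to have period k under f."""
    x = x0
    for _ in range(k):
        x = evaluate(coeffs, x)
    if abs(x - x0) > tol:
        return "f^k(x0) != x0"
    size = abs(multiplier(coeffs, x0, k))
    if size < 1.0:
        return "hyperbolic attracting"
    if size > 1.0:
        return "hyperbolic"
    return "not hyperbolic"  # exact float comparison: only reliable for simple inputs


if __name__ == "__main__":
    print(classify_periodic_point([-1.0, 0.0, 1.0], 0.0, 2))       # f(x) = x^2 - 1, orbit 0 -> -1 -> 0
    print(classify_periodic_point([-2.0, 0.0, 1.0], 2.0, 1))       # f(x) = x^2 - 2, fixed point 2
    print(classify_periodic_point([0.0, 1.0, 0.0, -1.0], 0.0, 1))  # f(x) = x - x^3, fixed point 0
```

For f(x)=x^2-1 the point 0 has period 2 and (f^2)'(0)=f'(0)f'(-1)=0, so it is hyperbolic attracting; for f(x)=x-x^3 the fixed point 0 has f'(0)=1 and is not hyperbolic.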

The Theorem: On 6th August 2004, Oleg Kozlovski, Weixiao Shen, and Sebastian van Strien submitted to the Annals of Mathematics a paper in which they proved that for every real polynomial f(x)=\sum\limits_{i=0}^d a_ix^i and any \epsilon>0, there exists a hyperbolic real polynomial h(x)=\sum\limits_{i=0}^d b_i x^i such that |a_i-b_i|<\epsilon, i=0,1,\dots,d.

Short context: Hyperbolic polynomials are central objects of study in dynamical systems, and the Theorem solves one of the central problems in this area. Because every “sufficiently smooth” function g can be approximated by polynomials, the Theorem implies that g can be approximated by hyperbolic polynomials, resolving the second part of problem 11 from Smale’s list of problems for the 21st century.

Links: The original paper is available here. See also Section 7.6 of this book for an accessible description of the Theorem.

Every positive regular solution of the integral equation posed by Lieb is radially symmetric and monotone

You need to know: Euclidean space {\mathbb R}^n, integration over {\mathbb R}^n, space of functions L^p({\mathbb R}^n) (functions u:{\mathbb R}^n\to{\mathbb R} such that \int_{{\mathbb R}^n}|u|^p < \infty).

Background: For positive integer n and 0 < \alpha < n, consider the integral equation u(x) = \int_{{\mathbb R}^n}\frac{1}{|x-y|^{n-\alpha}}u(y)^{\frac{n+\alpha}{n-\alpha}}dy. We call its solution u regular if u\in L^{\frac{2n}{n-\alpha}}({\mathbb R}^n).

The Theorem: In August 2004, Wenxiong Chen, Congming Li, and Biao Ou submitted to the Communications on Pure and Applied Mathematics a paper in which they proved that every positive regular solution of the integral equation above has the form u(x) = c\left(\frac{t}{t^2+|x-x_0|^2}\right)^{\frac{n-\alpha}{2}}, with some constant c = c(n, \alpha) and some t > 0 and x_0 \in {\mathbb R}^n.

Short context: The integral equation above arose in a 1983 paper of Lieb on the best possible constant in the so-called Hardy-Littlewood-Sobolev inequality. It also has a connection with a well-known family of semilinear partial differential equations. Lieb posed the classification of all the solutions of this integral equation as an open problem. This problem was open for over 20 years, until it was fully solved by the Theorem.

Links: The original paper is available here.

A typical interval exchange transformation is either a rotation or weakly mixing

You need to know: Permutation of \{1,2, \dots, d\}, notation {\mathbb R}_+^d for vectors in {\mathbb R}^d with non-negative components, notation f^k(x)=f(f(\dots f(x) \dots)) for k-fold function composition, notation f^{-k}(A) for set \{x: f^k(x) \in A\}, notation a \equiv b (\text{mod } d) if a-b is divisible by d, Lebesgue measure, measurable sets, Lebesgue almost every.

Background: Let d \geq 2 be an integer. A permutation \pi of \{1,2, \dots, d\} is called irreducible if \pi (\{1,\dots,k\}) \neq \{1,\dots,k\} for all 1 \leq k<d. Given such \pi and \lambda = (\lambda_1, \dots, \lambda_d)\in {\mathbb R}_+^d, an interval exchange transformation f = f(\lambda, \pi) is a map f:I \to I, which divides I = \left[0,\sum\limits_{i=1}^d \lambda_i\right) into sub-intervals I_i = \left[\sum\limits_{j<i} \lambda_j,\sum\limits_{j \leq i} \lambda_j\right), \, i=1,2,\dots,d and rearranges the I_i according to \pi (it maps every x \in I_i into x + \sum\limits_{\pi(j)<\pi(i)} \lambda_j - \sum\limits_{j<i} \lambda_j). The map f is called weakly mixing if for every pair of measurable sets A, B \subset I, \lim\limits_{n\to\infty} \frac{1}{n}\sum\limits_{k=1}^{n-1} \left|m(f^{-k}(A) \cap B) -m(A)m(B)\right| = 0, where m denotes the Lebesgue measure. A permutation \pi of \{1,2, \dots, d\} is called a rotation if \pi(i + 1) \equiv \pi(i) + 1 (\text{mod } d), for all i \in \{1,2, \dots, d\}.
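The definitions above translate almost directly into code. The following sketch builds the map f(\lambda, \pi) and tests a permutation for irreducibility and for being a rotation. A permutation is stored as a tuple with pi[i-1] = \pi(i), and exact fractions are used for the lengths; these representation choices and all function names are assumptions made for the example. The rotation test checks the congruence only for consecutive indices i=1,\dots,d-1, which already forces \pi(1) \equiv \pi(d)+1 (mod d).

```python
from fractions import Fraction


def is_irreducible(pi):
    """True if pi({1..k}) != {1..k} for all 1 <= k < d, where pi[i-1] = pi(i)."""
    d = len(pi)
    return all(set(pi[:k]) != set(range(1, k + 1)) for k in range(1, d))


def is_rotation(pi):
    """True if pi(i+1) = pi(i) + 1 (mod d) for consecutive indices i."""
    d = len(pi)
    return all((pi[i + 1] - pi[i] - 1) % d == 0 for i in range(d - 1))


def interval_exchange(lengths, pi):
    """Return the map f(lambda, pi) on I = [0, sum(lengths))."""
    d = len(lengths)
    # left[i] is the left endpoint of the (i+1)-th sub-interval; left[d] is the length of I.
    left = [sum(lengths[:i], Fraction(0)) for i in range(d + 1)]

    def f(x):
        for i in range(d):
            if left[i] <= x < left[i + 1]:
                # Total length of the intervals placed before I_{i+1} after the rearrangement.
                shift = sum((lengths[j] for j in range(d) if pi[j] < pi[i]), Fraction(0))
                return x + shift - left[i]
        raise ValueError("x lies outside I")

    return f


if __name__ == "__main__":
    lengths = (Fraction(1, 3), Fraction(2, 3))
    f = interval_exchange(lengths, (2, 1))  # swapping two intervals is a circle rotation
    print(f(Fraction(0)), f(Fraction(1, 2)))
    print(is_irreducible((3, 2, 1)), is_rotation((3, 2, 1)))
```

For d=2 and \pi=(2,1) the map is x \mapsto x + \lambda_2 modulo the total length, so \pi is a rotation, while \pi=(3,2,1) is irreducible and not a rotation, which is the situation covered by the Theorem.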

The Theorem: On 16th June 2004, Artur Avila and Giovanni Forni submitted to arxiv a paper in which they proved that for every irreducible permutation \pi of \{1,2, \dots, d\} which is not a rotation, and Lebesgue almost every \lambda\in {\mathbb R}_+^d, f(\lambda, \pi) is weakly mixing.

Short context: Interval exchange transformations (IETs for short) are basic examples of measure-preserving transformations f:I \to I (that is, such that m(f^{-1}(A))=m(A) for all measurable A \subset I), which are central objects of study in dynamical systems. f is called mixing if \lim\limits_{n\to\infty} m(f^{-n}(A) \cap B) = m(A)m(B) for all measurable A, B \subset I, and ergodic if f^{-1}(A)=A implies that m(A)=0 or m(A)=m(I). It is known that every mixing f is weakly mixing, and every weakly mixing f is ergodic. It was known that almost every IET is ergodic, and the Theorem proves the stronger result that almost every non-rotation IET is weakly mixing. Because, by a 1980 theorem of Katok, IETs are not mixing, the weak mixing property is the strongest we could hope for.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 7.4 of this book for an accessible description of the Theorem.

The metric entropy is equivalent to the combinatorial dimension under minimal regularity

You need to know: Logarithm, supremum.

Background: Let \Omega be a set, {\cal A} be the set of all functions f:\Omega \to {\mathbb R}, and let A be any subset of {\cal A}. For k points x_1, x_2, \dots, x_k \in \Omega and two functions f and g in {\cal A}, define d_{x_1,\dots x_k}(f,g)=\sqrt{\frac{1}{k}\sum\limits_{i=1}^k(f(x_i)-g(x_i))^2}. For any t>0, let N_{x_1,\dots x_k}(A,t) be the maximal n for which there exist functions f_1,f_2,\dots f_n \in A such that d_{x_1,\dots x_k}(f_i,f_j) \geq t for all i \neq j. The quantity D(A,t) := \log\left( \sup\limits_{k} \sup\limits_{x_1,\dots x_k} N_{x_1,\dots x_k}(A,t)\right) is called the Koltchinskii–Pollard entropy, or just metric entropy of A.

We say that a subset \sigma of \Omega is t-shattered by A if there exists a function h on \sigma such that, given any decomposition \sigma=\sigma_1 \cup \sigma_2 with \sigma_1 \cap \sigma_2 = \emptyset, one can find a function f \in A with f(x) \leq h(x) if x \in \sigma_1 but f(x) \geq h(x) + t if x \in \sigma_2. The combinatorial dimension v(A,t) of A is the maximal cardinality of a set t-shattered by A.
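For a finite class of functions on a finite set \Omega, the combinatorial dimension can be found by brute force, which may help in reading the definition of t-shattering. In the sketch below a function is a tuple of its values on \Omega=\{0,\dots,m-1\}, and the names is_t_shattered and combinatorial_dimension are hypothetical. One assumption is used to make the search finite: if some level function h works, then replacing h(x) by the largest value f(x) \leq h(x) attained by a function in A also works, so it is enough to try levels h(x) taken from the values of the class at x.

```python
from itertools import combinations, product


def is_t_shattered(sigma, funcs, t):
    """Check whether the tuple of points sigma is t-shattered by the finite class funcs."""
    # Candidate levels h(x): values attained at x by functions in the class.
    levels = [sorted({f[x] for f in funcs}) for x in sigma]
    for h in product(*levels):
        shattered = True
        # pattern[i] is True when sigma[i] belongs to sigma_2, False when it belongs to sigma_1.
        for pattern in product((False, True), repeat=len(sigma)):
            realised = any(
                all((f[x] >= h[i] + t) if pattern[i] else (f[x] <= h[i])
                    for i, x in enumerate(sigma))
                for f in funcs
            )
            if not realised:
                shattered = False
                break
        if shattered:
            return True
    return False


def combinatorial_dimension(funcs, t):
    """Maximal cardinality of a subset of Omega that is t-shattered by funcs."""
    m = len(funcs[0])
    best = 0
    for size in range(1, m + 1):
        if any(is_t_shattered(sigma, funcs, t) for sigma in combinations(range(m), size)):
            best = size
    return best


if __name__ == "__main__":
    cube = list(product((0, 1), repeat=3))   # all 0/1-valued functions on three points
    print(combinatorial_dimension(cube, 1))
    print(combinatorial_dimension(cube, 1.5))
    print(combinatorial_dimension([(0, 0), (1, 1)], 1))
```

The search is exponential in the size of \Omega, so it is only meant for toy classes such as the ones in the usage example.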

The Theorem: On 26th January 2004, Mark Rudelson and Roman Vershynin submitted to the Annals of Mathematics a paper in which they proved that if there exists b>1 such that v(A,bt) \leq \frac{1}{2}v(A,t), \, \forall t>0, then inequalities c \cdot v(A, 2t) \leq D(A, t) \leq C \cdot v(A, ct) hold for all t > 0, where c > 0 is an absolute constant, and C depends only on b.

Short context: The condition v(A,bt) \leq \frac{1}{2}v(A,t), \, \forall t>0 in the Theorem is known as the minimal regularity condition, and it is known that the conclusion of the Theorem does not hold without it. The Theorem shows that, under this condition, two ways of measuring “how large the set of functions is” are equivalent, up to constant factors.

Links: Free arxiv version of the original paper is here, journal version is here. See also Section 6.6 of this book for an accessible description of the Theorem.
