Harmonic series variant solution

Last year I asked if the series $\sum_{n=1}^\infty\cos n/n$ converges or diverges. Although the series looks rather simple, the standard convergence tests learned in Calculus 1 do not apply directly. However, there is a trick which allows one to rewrite the series into a form where the standard tests can be used.

The trick is to use summation by parts, the discrete analog of integration by parts. Although less well-known, summation by parts does have its uses; for example, it was the method of proving Abel’s summation formula which has come up before.

To simplify the exposition, it is convenient to define a function for the partial sums of the series’ numerators:

\[ C(m) := \sum_{n=1}^m \cos n \]

Now, using summation by parts the series may be rewritten so that

\[ \sum_{n=1}^m\frac{\cos n}{n} = \frac{C(m)}{m} + \sum_{n=1}^{m-1} C(n)\Bigl(\frac{1}{n}-\frac{1}{n+1}\Bigr) . \tag{1} \]

The key observation to make now is that $C(m)$ is bounded. This can be seen by using the complex exponential expression of the cosine, and summing a geometric series to find a closed-form expression for $C(m)$, as follows:

\begin{align*}
C(m) &= \sum_{n=1}^m\frac{e^{in}+e^{-in}}{2} \\
&= \frac{e^i}{2}\Bigl(\frac{e^{im}-1}{e^i-1}\Bigr)+\frac{e^{-i}}{2}\Bigl(\frac{e^{-im}-1}{e^{-i}-1}\Bigr) \\
&= \frac{(e^{im}-1)(1-e^i)+(e^{-im}-1)(1-e^{-i})}{2(e^i-1)(e^{-i}-1)} \\
&= \frac{(e^{im}+e^{-im})-(e^{i(m+1)}+e^{-i(m+1)})}{2(2-(e^i+e^{-i}))}-\frac{1}{2} \\
&= \frac{\cos m-\cos(m+1)}{2(1-\cos1)}-\frac{1}{2}
\end{align*}

By the triangle inequality, this has an absolute upper bound of

\[ \lvert C(m)\rvert \leq \frac{1}{1-\cos1}+\frac{1}{2}<3 . \]

From this it follows that $C(m)/m\to0$ as $m\to\infty$, and after taking the limit (1) becomes

\[ \sum_{n=1}^\infty\frac{\cos n}{n} = \sum_{n=1}^\infty{}C(n)\Bigl(\frac{1}{n}-\frac{1}{n+1}\Bigr) , \tag{2} \]

and we can use the absolute comparison test on the right sum. Using the fact that

\[ \frac{1}{n}-\frac{1}{n+1} = \frac{1}{n^2+n} \leq \frac{1}{n^2} \]

we have $\bigl\lvert C(n)(\frac{1}{n}-\frac{1}{n+1})\bigr\rvert<3/n^2$, and $\sum_{n=1}^\infty 3/n^2=3\zeta(2)$ is absolutely convergent, so by the comparison test the right sum in (2) is absolutely convergent as well. Thus the left side of (2) must converge, so $\sum_{n=1}^\infty\cos n/n$ converges.

Minkowski’s Theorem

Minkowski’s theorem says that if $S$ is a convex set in $\newcommand{\R}{\mathbb{R}}\R^n$ which is symmetric about the origin and has volume $V(S)>2^n$ (or if $S$ is compact and $V(S)\geq2^n$) then $S$ contains some nonzero point of $\newcommand{\x}{\mathbf{x}}\newcommand{\Z}{\mathbb{Z}}\Z^n$.

Minkowski’s theorem can be seen as a simple consequence of Blichfeldt’s theorem. In particular, consider applying its statement to the set $S/2:=\{\,s/2:s\in S\,\}$. Since \[V(S/2)=V(S)/2^n>1\] (or if $S$ is compact, $V(S/2)\geq1$) the theorem says that there exist distinct $\x_1$, $\x_2\in S/2$ with $\x_1-\x_2\in\Z^n$. Say that these have the form $\newcommand{\s}{\mathbf{s}}\x_1=\s_1/2$ and $\x_2=\s_2/2$ where $\s_1$, $\s_2\in S$.

Since $S$ is symmetric about the origin, it follows that $-\s_2\in S$. Additionally, since $S$ is convex, it follows that the midpoint of $\s_1$ and $-\s_2$ is also in $S$. But this midpoint $(\s_1-\s_2)/2=\x_1-\x_2$ is also in $\Z^n$. Since $\x_1\neq\x_2$ this point is nonzero, as required.

Using the form of Blichfeldt’s theorem applied to general lattices $L$ of dimension $n$, one finds that if $S$ is a convex and symmetric set in $\R^n$ with volume $V(S)>2^n\det(L)$ (or if $S$ is compact and $V(S)\geq2^n\det(L)$) then $S$ contains a nonzero point of $L$.

Blichfeldt’s Theorem

The following theorem was proven by Blichfeldt1 in 1914:

Let $P$ be a set of points in $\newcommand{\R}{\mathbb{R}}\R^n$ which are invariant under translation by vectors in $\newcommand{\Z}{\mathbb{Z}}\Z^n$, and say that $[0,1)^n$ contains $N$ points of $P$. If $S$ is a set in $\R^n$ with volume $V(S)$ then there is some translation $\newcommand{\x}{\mathbf{x}}S+\x$ which contains at least $N\cdot V(S)$ points of $P$. Additionally, if $S$ is compact then there is some translation of $S$ which contains more than $N\cdot V(S)$ points of $P$.

To show this, let $\newcommand{\p}{\mathbf{p}}\newcommand{\c}{{,}~}\p_1\c\dotsc\c\p_N$ be the points of $P$ contained in $[0,1)^n$. Since $P$ is invariant under translation by $\Z^n$, every $\p\in P$ is equivalent to some $\p_i$ modulo $\Z^n$. Let $\nu_i(S)$ denote the number of points in $P\cap S$ congruent to $\p_i$, so

\[ \newcommand{\y}{\mathbf{y}}\nu_i(S) = \sum_{\y\in\Z^n}\chi_S(\p_i+\y) \]

where $\chi_S$ is the characteristic function of $S$. Then

\begin{align*}\newcommand{\d}{\,\mathrm{d}}
\int_{[0,1)^n}\nu_i(S+\x)\d\x &= \int_{[0,1)^n}\sum_{\y\in\Z^n}\chi_{S+\x}(\p_i+\y)\d\x \\
&= \int_{[0,1)^n}\sum_{\y\in\Z^n}\chi_S(\p_i+\y-\x)\d\x \\
&= \int_{(-1,0]^n}\sum_{\y\in\Z^n}\chi_S(\p_i+\y+\x)\d\x \\
&= \sum_{\y\in\Z^n}\int_{(-1,0]^n+\p_i+\y}\chi_S(\x)\d\x \\
&= \int_{\R^n}\chi_S(\x)\d\x \\
&= V(S)
\end{align*}

where the penultimate step follows since taking the set $(-1,0]^n+\p_i+\y$ for each $\y\in\Z^n$ will produce a tiling of all of $\R^n$ with no overlap.

Letting $\nu(S):=\lvert P\cap S\rvert=\sum_{i=1}^N\nu_i(S)$ and summing the above over $1\leq i\leq N$, we find that

\[ \int_{[0,1)^n}\nu(S+\x)\d\x = N\cdot V(S) . \]

By an averaging argument, there must be some $\x\in[0,1)^n$ such that $\nu(S+\x)/V([0,1)^n)\geq N\cdot V(S)$, i.e., the translation $S+\x$ contains at least $N\cdot V(S)$ points of $P$, as required.

Next, we consider the case where $S$ is compact (i.e., closed and bounded). Since $\nu(S+\x)$ is an integer, if $N\cdot V(S)$ is not an integer then it immediately follows that $\nu(S+\x)>N\cdot V(S)$. So suppose that $h:=N\cdot V(S)$ is an integer. Let $S_k:=(1+\frac{1}{k})S$, so that applying the above result on $S_k$ for each $k\geq1$ we have that there exists some $\x_k\in[0,1)^n$ such that $\nu(S_k+\x_k)\geq N\cdot V(S_k)>h$.

Since the $\x_k$ lie in $[0,1]^n$ which is compact, there must be some subsequence $\x_{k_j}$ which converges to a point $\x\in[0,1]^n$. By construction, each $S_{k_j}+\x_{k_j}$ contains at least $h+1$ points of $P$. Since $\bigcup_j S_{k_j}+\x_{k_j}$ is bounded, it contains a finite number of points of $P$. In particular, there are a finite number of ways of choosing $h+1$ points of $P$ from $\bigcup_j S_{k_j}+\x_{k_j}$. By the pigeonhole principle, there must be some choice $\p_1\c\dotsc\c\p_{h+1}\in P$ which appear infinitely often in the sets $S_{k_j}+\x_{k_j}$. Let $k’_j$ index a subsequence of these sets so that $\p_i\in S_{k’_j}+\x_{k’_j}$ for all $i$ and $j$.

The points $\p_i$ must in fact appear in $S+\x$; otherwise there would be some $\p_i$ with positive distance to $S+\x$, but the distance of the farthest point of $S_{k’_j}+\x_{k’_j}$ to $S+\x$ goes to $0$ as $j\to\infty$, contradicting the supposed positive distance that $\p_i\in S_{k’_j}+\x_{k’_j}$ is away from $S+\x$. As required, we have found some translation $S+\x$ which contains more than $h=N\cdot V(S)$ points of $P$.

Alternate statement

Sometimes Blichfeldt’s theorem is stated in the following form:

Let $S$ be a set in $\R^n$ with volume $V(S)>m$ for some $m\in\Z$ (if $S$ is compact we only require $V(S)\geq m$). Then $S$ contains $\x_1\c\dotsc\c\x_{m+1}$ distinct points whose differences $\x_i-\x_j$ all lie in $\Z^n$.

This is a simple corollary of what we’ve already shown. Take $P:=\Z^n$, so that $N=1$. The theorem tells us there is some $\x$ such that $S+\x$ contains at least $V(S)\geq m+1$ (or if $S$ is compact, more than $V(S)\geq m$) points of $P$. Thus we have there exist $\p_i\in P$ and $\x_i\in S$ with $\p_i=\x_i+\x$ for $1\leq i\leq m+1$. Then

\[ \x_i-\x_j = \p_i-\p_j \in \Z^n , \]

as required.

Generalization

Finally, Blichfeldt’s theorem can be generalized to apply to arbitrary lattice points, instead of just $\Z^n$. One possible way of stating it would be:

Let $L$ be a lattice in $\R^n$ with volume $\det(L)$, and $S$ be a set in $\R^n$ with volume $V(S)>m\det(L)$ for some $m\in\Z$ (if $S$ is compact we only require $V(S)\geq m\det(L)$). Then $S$ contains $\x_1\c\dotsc\c\x_{m+1}$ distinct points whose differences $\x_i-\x_j$ all lie in $L$.

The same argumentation as above suffices to show this, except that we replace $[0,1)^n$ (a fundamental region for $\Z^n$) in the argument with a fundamental region for $L$.

Two characterizations of a lattice

A lattice is a discrete additive subgroup of $\newcommand{\R}{\mathbb{R}}\R^n$; for the purposes of this post we’ll restrict ourselves to full rank lattices, i.e., those which span the entire vector space $\R^n$. An alternate way of defining a lattice is to consider it as the integer span of a collection of linearly independent vectors $\newcommand{\b}{\mathbf{b}}\newcommand{\c}{{,}~}\b_1\c\dotsc\c\b_n\in\R^n$, known as a basis of $L$. That is, $L$ consists of the collection of points

\[ \newcommand{\norm}[1]{\lVert#1\rVert}\newcommand{\x}{\mathbf{x}}\newcommand{\Z}{\mathbb{Z}} \biggl\{\,\sum_{i=1}^n x_i\b_i : x_i\in\Z \,\biggr\} . \]

The goal of this post is to show that these two ways of defining a lattice are equivalent. It is more or less immediate that a lattice $L$ in the second sense is an additive subgroup which spans $\R^n$; it is also discrete since one can show that1

\[ \newcommand{\0}{\mathbf{0}} \norm{\x} \geq \min_{1\leq i\leq n}\norm{\b_i^*} \]

for all $\x\in L$ and $\x\neq\0$, where $\b_i^*$ is the Gram–Schmidt orthogonalization of $\b_i$. Thus $\0$ is an isolated point of $L$, and it follows that every point of $L$ must be isolated; a nonisolated point would imply (using a suitable translation) that $\0$ is nonisolated.

The harder direction is to show that a lattice in the first sense is also a lattice in the second sense; the remainder of the post is devoted to this. Let $L$ be a discrete additive subgroup of $\R^n$ containing $n$ linearly independent vectors.

In the case $n:=1$, since $L$ spans $\R$ it must contain $\pm a\neq0$. Since it also contains $0$ and is discrete, there must be a minimal $b>0$ with $b\in L$. Then as required we have $L= \{\, xb : x\in\Z \,\}$; if there was any point $c=xb\in L$ for $x\notin\Z$ then $c-\lfloor x\rfloor b$ would lie in $L$ and $(0,b)$, a contradiction to the minimality of $b$.

Now suppose the result holds for all lattices in $\R^{n-1}$; we will show the result holds for $L$ and appeal to induction. We know that $L$ contains $n$ linearly independent vectors; select any $n-1$ of them and let $S$ be the subspace they generate. Let $S’$ be the rotation of $S$ such that every vector in $S’$ has a $0$ in its final coordinate. Crucially, applying the rotation on $L$ to form $L’$ preserves the discrete additive subgroup structure, so we can apply the induction hypothesis to the discrete additive subgroup formed by only considering the first $n-1$ coordinates of $S’\cap L’$. Let $\b_1\c\dotsc\c\b_{n-1}$ be a basis of this lattice (which exists by hypothesis), except extended with an extra $0$ coordinate so as to live in $\R^n$ instead of $\R^{n-1}$.

Since $L’$ generates $\R^n$, it must contain a vector not in $S’$, i.e., with nonzero final coordinate. Furthermore, it must contain a vector with minimal positive final coordinate; otherwise there would exist a sequence $\newcommand{\N}{\mathbb{N}}\{\x_i\in L’\}_{i\in\N}$ with $(x_i)_n\to0$ as $i\to\infty$. By translating the $\x_i$ by suitable multiples of $\b_1\c\dotsc\c\b_{n-1}$ we can ensure that they lie in the compact set

\[ \biggl\{\, \sum_{i=1}^{n-1} \alpha_i \b_i + (0,\dotsc,0,\alpha) : \alpha_i\in[0,1]\c\lvert \alpha\rvert\leq\max_{i\in\N}\lvert(x_i)_n\rvert \,\biggr\} , \]

and therefore some subsequence of the $\x_i$ converges to some point in the set. Taking successive differences of this subsequence we get a sequence of points in $L’$ which converge to $\0\in L’$, a contradiction to the discreteness of $L’$.

Let $\b_n\in L’$ be a vector with minimal nonzero final coordinate. We claim that $\b_1\c\dotsc\c\b_n$ is a basis for $L’$. Let $\x\in L’$ be arbitrary, and consider

\[ \x’ := \x – \biggl\lfloor \frac{x_n}{(b_n)_n} \biggr\rfloor \b_n \in L’ . \]

Its final coordinate is $x_n-\lfloor x_n/(b_n)_n\rfloor(b_n)_n\in[0,(b_n)_n)$ and therefore by minimality of $(b_n)_n$ it must be $0$, so $\x’\in S’\cap L’$ and therefore can be written as an integer combination of $\b_1\c\dotsc\c\b_{n-1}$. Thus $\x=\x’+\lfloor x_n/(b_n)_n\rfloor\b_n$ can be written as an integer linear combination of $\b_1\c\dotsc\c\b_n$, and so these vectors form a basis of $L’$. Applying the reverse rotation of $L\mapsto L’$ to the basis $\b_1\c\dotsc\c\b_n$ gives us a basis of $L$, as required.

  1. To see this, write $\x = \sum_{i=1}^n x_i \b_i = \sum_{i=1}^n x_i^* \b_i^*$. Let $k$ be the largest $i$ such that $x_i\neq0$, so $\DeclareMathOperator{\sp}{span} \x \in x_k \b_k^* + \sp(\b_1,\dotsc,\b_{k-1})$ which shows that $x_k^*=x_k\in\Z$. Then

    \begin{equation}\norm{\x}^2 = \sum_{i=1}^k (x_i^*)^2 \norm{\b_i^*}^2 \geq (x_k^*)^2 \norm{\b_k^*}^2 \geq \min_{1\leq i\leq n} \norm{\b_i^*}^2 , \end{equation}

    as required.

A curious hypersphere property

Last time when I derived the formula for the volume of a hypersphere in $n$ dimensions I forgot to point out a curious consequence of the formula, namely that the volume tends to zero as $n$ tends to infinity.

When I was an undergraduate I remember a professor of mine pointing this out and then declaring “That doesn’t make sense!”. At the time it didn’t seem too surprising to me, since I could see that the unit circle in $\newcommand{\R}{\mathbb{R}}\R^2$ took up more of the surrounding square $[-1,1]^2$ than the unit sphere in $\R^3$ took up of $[-1,1]^3$. Consquently, I thought it likely that the ratio of the volume of the unit sphere in $\R^n$ to the volume of $[-1,1]^n$ should go to zero as $n\to\infty$.

However, I misunderstood the claim being made: not only does the above ratio of hypersphere-to-hypercube volume go to zero, the volume of the hypersphere itself goes to zero. This was something I hadn’t even considered: since as $n\to\infty$ the hypersphere is “growing”, I presumably took for granted that its volume should go to infinity, not zero!

Of course, one can consider the unit sphere in $\R^{n-1}$ as a subset of the unit sphere in $\R^n$, since for example the unit sphere in $\R^3$ contains the unit circle as a “slice”. In this way as $n\to\infty$ the hypersphere is growing. However, though the “slice” has volume in $\R^{n-1}$, it has no volume in $\R^n$; as the dimension increases it becomes “harder” to make volume in a sense. This allows the hypersphere to “grow” as $n\to\infty$ while still shrink in volume.

Algebraically, as we’ve seen, the volume of the unit sphere in $\R^n$ is given by

\[ V_n = \frac{\pi^{n/2}}{(n/2)!} . \]

If one knows Stirling’s approximation

\[ n! \sim \sqrt{2\pi n}\Bigl(\frac{n}{e}\Bigr)^n \]

then it isn’t too hard to see that the denominator of $V_n$ grows asymptotically faster than the numerator, and therefore $V_n$ tends to $0$. Explicitly, we have

\[ \lim_{n\to\infty} V_n = \lim_{n\to\infty}\frac{\pi^{n/2}}{\sqrt{\pi n}(\frac{n}{2e})^{n/2}} = \lim_{n\to\infty}\frac{1}{\sqrt{\pi n}}\cdot\lim_{n\to\infty}\Bigl(\frac{2\pi e}{n}\Bigr)^{n/2} = 0 \]

since $\lim_{n\to\infty}1/\sqrt{\pi n}=0$ and

\[ \lim_{n\to\infty}\Bigl(\frac{2\pi e}{n}\Bigr)^{n/2} = \lim_{n\to\infty}\exp\Bigl(\frac{n}{2}\ln\Bigl(\frac{2\pi e}{n}\Bigr)\Bigr) = \lim_{m\to-\infty}\exp(m) = 0 . \]

Volume of a hypersphere

The volume of a hypersphere with radius $R$ in $n$ dimensions is given by the expression1

\[ V_n(R) = \frac{\pi^{n/2}}{(n/2)!} R^n . \]

We will show this by induction on $n$. The base cases can be checked directly, where we make use of polar coordinates in two dimensions:

\begin{gather*}
V_1(R) = \int_{-R}^R\newcommand{\d}{\,\mathrm{d}}\d x = 2R = \frac{\pi^{1/2}}{(1/2)!} R \\
V_2(R) = \iint\limits_{x_1^2+x_2^2\leq R^2}\d x_2\d x_1 = \int_0^{2\pi}\int_0^R r\d r\d\theta = 2\pi\biggl[\frac{r^2}{2}\biggr]_0^R = \pi R^2
\end{gather*}

Suppose the formula holds in dimension $n-2$. Using this, we will show that the formula holds in dimension $n$:

\begin{align*}
V_n(R) &= \int\limits_{x_1^2\leq R^2}\;\int\limits_{x_1^2+x_2^2\leq R^2}\;\int\limits_{x_1^2+x_2^2+x_3^2\leq R^2}\dotsi\int\limits_{x_1^2+\dotsb+x_n^2\leq R^2}\d x_n\dotsm\d x_1 \\
&= \int\limits_{x_1^2\leq R^2}\;\int\limits_{x_1^2+x_2^2\leq R^2}\;\int\limits_{x_3^2\leq R^2-x_1^2-x_2^2}\dotsi\int\limits_{x_3^2+\dotsb+x_n^2\leq R^2-x_1^2-x_2^2}\d x_n\dotsm\d x_1 \\
&= \int\limits_{x_1^2\leq R^2}\;\int\limits_{x_1^2+x_2^2\leq R^2}V_{n-2}\Bigl(\sqrt{R^2-x_1^2-x_2^2}\Bigr)\d x_2\d x_1 \\
&= \frac{\pi^{n/2-1}}{(n/2-1)!}\iint\limits_{x_1^2+x_2^2\leq R^2}\sqrt{R^2-x_1^2-x_2^2}^{n-2}\d x_2\d x_1 \\
&= \frac{\pi^{n/2-1}}{(n/2-1)!}\int_0^{2\pi}\int_0^R\sqrt{R^2-r^2}^{n-2}r\d r\d\theta \\
&= \frac{2\pi^{n/2}}{(n/2-1)!}\biggl[-\frac{1}{n}\sqrt{R^2-r^2}^n\biggr]_0^R \\
&= \frac{\pi^{n/2}}{(n/2)!} R^n
\end{align*}

By induction, the formula holds for all positive integers $n$.

  1. As one might expect, the factorial with a noninteger argument is simply notation for the gamma function, i.e., $n!:=\Gamma(n+1)$.

Double logarithm summation over primes

It is well known1 that

\[ \sum_{p\leq x}\ln p = x + O\biggl(\frac{x}{e^{c\sqrt{\ln x}}}\biggr) \]

where $c>0$ is a constant and the summation runs over the primes. In fact, under the Riemann hypothesis, one even has

\[ \sum_{p\leq x}\ln p = x + O(x^{1/2+\epsilon}) \]

for any $\epsilon>0$. Since $e^{c\sqrt{\ln x}}$ grows slower than any power of $x$, the second statement gives a better approximation.

A related question, but one I wasn’t familiar with, is to give a similar asymptotic result for the summation with $\ln p$ replaced by $\ln\ln p$. In other words, to estimate the quantity

\[ \sum_{p\leq x}\ln\ln p . \]

To do this, we may employ Abel’s summation formula with

\[ a_n := \begin{cases}
1 & \text{if $n$ is prime} \\
0 & \text{otherwise}
\end{cases} \]

and $\phi(n):=\ln\ln n$. Then we have

\[ \sum_{p\leq x}\ln\ln p = \pi(x)\ln\ln x-\int_2^x\frac{\pi(t)}{t\ln t}\mathrm{d}t . \]

By the prime number theorem we have $\pi(t)=t/\ln t+O(t/\ln(t)^2)$, so

\[ \int_2^x\frac{\pi(t)}{t\ln t}\mathrm{d}t = \int_2^x\frac{\mathrm{d}t}{\ln(t)^2}+O\biggl(\int_2^x\frac{\mathrm{d}t}{\ln(t)^3}\biggr) . \]

By Wolfram Alpha we have that

\[ \int_2^x\frac{\mathrm{d}t}{\ln(t)^2} = \DeclareMathOperator{\li}{li}\li(x)-\frac{x}{\ln x} + O(1) = \frac{x}{\ln(x)^2} + O\biggl(\frac{x}{\ln(x)^3}\biggr) , \]

with the latter equality following from the asymptotic expansion of the logarithmic integral.

It remains to estimate the integral $\int_2^x\ln(t)^{-3}\,\mathrm{d}t$. Actually, this is not entirely straightforward, but a trick is to split the integral into two (around $\sqrt{x}$) and then estimate each, as follows:

\[ \int_2^{\sqrt{x}}\frac{\mathrm{d}t}{\ln(t)^3} + \int_{\sqrt{x}}^x\frac{\mathrm{d}t}{\ln(t)^3} \leq \frac{\sqrt{x}-2}{\ln(2)^3} + \frac{x-\sqrt{x}}{\ln(\sqrt{x})^3} = O\biggl(\frac{x}{\ln(x)^3}\biggr) \]

Putting everything together, we find the result

\[ \sum_{p\leq x}\ln\ln p =\pi(x)\ln\ln x-\frac{x}{\ln(x)^2} +O\biggl(\frac{x}{\ln(x)^3}\biggr) . \]

  1. For example, see (2.29) in Approximate formulas for some functions of prime numbers by Rosser and Schoenfeld.

A variant $n+1$ primality test

Last time I discussed the $n-1$ and $n+1$ primality tests. Recall that the $n-1$ test says that $n$ is prime if there exists an $\newcommand{\Z}{\mathbb{Z}}a\in\Z^*_n$ such that

\begin{align}
a^{n-1} &\equiv 1 \pmod{n} \\
a^{(n-1)/p} &\not\equiv 1 \pmod{n}
\end{align}

for all primes $p$ which divide $n-1$.

The $n+1$ can be stated in a similar form, and says that $n$ is prime if it is odd and there exists an $a\in(\Z[\sqrt{d}])^*$ with $(\frac{d}{n})=-1$ such that

\begin{align}
a^{n+1} &\equiv 1 \pmod{n} \\
a^{(n+1)/p} &\not\equiv 1 \pmod{n}
\end{align}

for all primes $p$ which divide $n+1$.

I state it in this form to make the connection with the $n-1$ test, but I’ve done a little sleight-of-hand in the presentation. In the first test $a$ is a unit of $\Z_n$, while in the second test $a$ is a unit of $\Z[\sqrt{d}]$ (not $\Z_n[\sqrt{d}]$). That is, the norm of $a$ in $\mathbb{Q}(\sqrt{d})$ is $1$; this is a rather restrictive condition. In fact, when $d<-3$ the only units of $\Z[\sqrt{d}]$ are $\pm1$, and both of these will fail the second condition since $(n+1)/p$ will be even for some $p$.

When $d$ is positive and squarefree the situation is a little better in that there are an infinite number of units in $\Z[\sqrt{d}]$. However, these units are all of the form $\pm\epsilon^k$ for some fundamental unit $\epsilon:=x+y\sqrt{d}$ (this may be found by solving Pell’s equation $x^2-dy^2=1$). If the fundamental unit doesn’t satisfy the conditions then any power of it will also necessarily fail, so for any given value of $d$ there is essentially only one possible choice of $a$ which could work. On the upside, one could simply look up this choice in a table when $d$ is small; e.g., for $d:=3$ one should use $a:=2+\sqrt{3}$.

So that’s an unfortunate condition which isn’t present in the $n-1$ test, but it’s necessary to be able to use Fermat’s theorem in $\Z[\sqrt{d}]$, which implies that if $p$ is prime and $a$ has norm $1$ then

\[ a^{p-(d/p)} \equiv 1 \pmod{p} , \]

and more generally

\[ a^{p^{e-1}(p-(d/p))} \equiv 1 \pmod{p^e} . \]

Now we’re ready to prove that the primality test works as stated. Let $\newcommand{\ord}{\mathop{\mathrm{ord}}\nolimits}\ord_{n,d}(a)$ denote the order of $a$ in $(\Z_n[\sqrt{d}])^*$, so the two conditions of the primality test tell us that $\ord_{n,d}(a)=n+1$.

Say $n$ has prime factorization $\prod_{i=1}^k p_i^{e_i}$. By the Chinese remainder theorem, we have

\[ \Z_n[\sqrt{d}] \cong \prod_{i=1}^k \Z_{p_i^{e_i}}[\sqrt{d}] , \]

so

\[ n+1 = \ord_{n,d}(a) = \DeclareMathOperator{\lcm}{lcm}\lcm(\ord_{p_1^{e_1},d}(a),\dotsc,\ord_{p_k^{e_k},d}(a)) \]

and by Fermat’s theorem this divides

\[ \lcm(p_1^{e_1-1}(p_1-(d/p_1)),\dotsc,p_k^{e_k-1}(p_k-(d/p_k))) . \]

Since each $p_i$ is odd, this equals

\begin{align}
&\mathrel{\phantom{=}}2\lcm\Bigl(p_1^{e_1-1}\frac{p_1-(d/p_1)}{2},\dotsc,p_k^{e_k-1}\frac{p_k-(d/p_k)}{2}\Bigr) \\
&\leq 2\prod_{i=1}^k p_i^{e_i-1}\frac{p_i-(d/p_i)}{2} \\
&\leq 2n\prod_{i=1}^k\frac{p_i+1}{2p_i} .
\end{align}

Now, if $n$ has at least two distinct prime factors then this is at most

\[ 2n\cdot\frac{3+1}{2\cdot3}\cdot\frac{5+1}{2\cdot5} = \frac{4}{5}n . \]

Thus we conclude that $n+1\leq4n/5$, a contradiction since $n$ is positive. Thus $n$ must have just one prime factor; say $n:=p^e$. Using Fermat’s theorem and the fact that $(\frac{d}{p})=-1$ (otherwise $(\frac{d}{n})\neq-1$), we have

\[ n+1 = \ord_{n,d}(a) \mid p^{e-1}(p+1) = n+p^{e-1} . \]

It follows that $n+1\mid p^{e-1}-1\leq n/3$, again a contradiction, unless $p^{e-1}=1$, i.e., $e=1$ and $n=p$ is prime.

I found this kind of argument in Prime Numbers and Computer Methods for Factorization (page 116). In that book it is applied to Lucas sequences, which simplifies some things, although can also obscure the group $\mathbb{F}_{n^2}^*$ working in the background.

The $n-1$ and $n+1$ primality tests

Determining if a number $n$ is a prime number or not is an important problem in computational number theory. Two simple ways of proving primality rely on the prime factorizations of $n-1$ and $n+1$. In general finding these factorizations is probably a harder problem than testing the primality of $n$, so the methods are only applicable in special cases, but are they are interesting nonetheless.

At a high level, the $n-1$ method works by showing that a subgroup of $\newcommand{\Z}{\mathbb{Z}}\Z^*_n$ is so large that $n$ must be prime. Specifically, a subgroup of order $n-1$ is demonstrated, which implies that $\Z_n^*$ is as large as possible, namely the full set of nonzero residues $\{1,2,\dotsc,n-1\}$; if even one element was missing then there would be less than $n-1$ elements in $\Z_n^*$. But $\Z_n^*=\{1,2,\dotsc,n-1\}$ means that every positive integer strictly less than $n$ is coprime to $n$, and so $n$ is prime.

To show that $\Z_n^*$ contains a subgroup of size $n-1$, we find an element $a\in\Z_n^*$ whose order is $n-1$ (such an element is called a primitive root). To do this, we show that

\[ a^{n-1} \equiv 1 \pmod{n} \tag{1} \]

and

\[ a^{(n-1)/p} \not\equiv 1 \pmod{n} \tag{2} \]

for all primes $p$ which divide $n-1$. This is enough to show that the order of $a$ is $n-1$; if the true order was $r<n-1$ then using Bézout’s identity one would be able to derive $a^{\gcd(r,n-1)}\equiv1\pmod{n}$ and this would contradict (2) for some $p$.

When $n$ is prime, there isn’t any known efficient algorithm which is guaranteed to find a primitive root $a$ satisfying (1) and (2), but in practice this isn’t a concern. In fact, there are $\varphi(n-1)=\Theta(n/\ln\ln n)$ primitive roots (where $\varphi$ denotes Euler’s totient function), so if one simply tests if random $a$ satisfies (1) and (2) then one should quickly find one which works, and thereby prove that $n$ is prime. As mentioned, the real problem with applying this method in practice is finding the primes $p$ which divide $n-1$.

Incidentally, when $n$ isn’t prime, this is usually easy to show since condition (1) will often fail to hold, and all primes satisfy (1) for all $a\in\Z_n^*$. However, some composite numbers still satisfy (1) for all $a\in\Z_n^*$. In such a case a more stringent form of (1) can be used to prove compositeness, and this method can be employed in practice since it only requires the “evenness factorization” $n-1=2^r\cdot m$ with $m$ odd, which is simple to compute.

The $n+1$ method is similar to the $n-1$ method, except it works in the group $(\Z_n[\sqrt{d}])^*$, where $d$ satisfies $(\frac{d}{n})=-1$, i.e., $d$ is not a quadratic residue mod $n$. We assume that $n$ is odd (otherwise the problem is trivial), so that in practice $d$ can be found by computing the Jacobi symbol $(\frac{d}{n})$ for multiple values of $d$ until one works. Note that when $n$ is prime we have $\newcommand{\F}{\mathbb{F}}\Z_n[\sqrt{d}]\cong\F_{n^2}$, the finite field of size $n^2$, so there still exist primitive roots $a$ which generate $(\Z_n[\sqrt{d}])^*$. We’ll denote the conjugate of $a$ by $\bar{a}$, so if $a:=b+c\sqrt{d}$ with $b$, $c\in\Z_n$ then $\bar{a}=b-c\sqrt{d}$.

As you might expect in relation to the above, the $n+1$ method finds a subgroup of $(\Z_n[\sqrt{d}])^*$ of size $n+1$ by finding an $a\in(\Z_n[\sqrt{d}])^*$ which has order $n+1$. Actually, we need something a bit stronger than this; we need to show

\[ a^{n+1} \equiv 1 \pmod{n} \tag{3} \]

and

\[ \gcd((a^{(n+1)/p}-1)(\bar{a}^{(n+1)/p}-1),n)=1 \tag{4} \]

for all primes $p$ which divide $n+1$. Condition (4) not only implies that $a^{(n+1)/p} \not\equiv 1 \pmod{n}$, but also that $a^{(n+1)/p} \not\equiv 1 \pmod{q}$ where $q$ is any prime divisor of $n$. To see that this, we show the contrapositive. Suppose that $q$ is a prime divisor of $n$ and that $q\mid a^{(n+1)/p}-1$.1 It follows that $q$ also divides $(a^{(n+1)/p}-1)(\bar{a}^{(n+1)/p}-1)\in\Z$, and thus divides the gcd in (4), so (4) fails to hold.

Since condition (3) implies that $a^{n+1}\equiv1\pmod{q}$ and condition (4) implies that $a^{(n+1)/p}\not\equiv1\pmod{q}$, just like before one knows that $a\pmod{q}$ has order $n+1$. In particular, there must be at least $n+1$ elements in $(\Z_q[\sqrt{d}])^*$.

Toward a contradiction, suppose that (3) and (4) hold and that $n$ is not prime. Let $q$ be the smallest prime divisor of $n$, so that $q^2\leq n$. As we just saw, $a\pmod{q}$ has order $n+1$, so we get the lower bound

\[ n+1 \leq \lvert(\Z_q[\sqrt{d}])^*\rvert . \]

However, $\Z_q[\sqrt{d}]$ has $q^2$ elements, and so

\[ \lvert(\Z_q[\sqrt{d}])^*\rvert \leq q^2-1 \leq n-1 , \]

which contradicts the lower bound. Thus $n$ must in fact be prime.

When $n$ is prime, an $a$ which satisfies (3) and (4) can be found by taking a primitive root of $\F_{n^2}$ and raising it to the power $n-1$, since in this case $(a^{n-1})^{n+1}\equiv1\pmod{n}$ and $n$ will not divide $(a^{n-1})^{(n+1)/p}-1$ or its conjugate. As mentioned, there isn’t a guaranteed procedure to find a primitive root, but since there are $\varphi(n^2-1)=\Theta(n^2/\ln\ln n)$ of them, in practice it shouldn’t be too hard to find one; the sticking point is finding the factorization of $n+1$.

Incidentally, the $n+1$ test is often presented using Lucas sequences rather than $\F_{n^2}$. But that’s a topic for another post.

  1. This notation means that there is some algebraic integer $k$ such that $qk=a^{(n+1)/p}-1$. However, it is actually only necessary to consider $k\in\Z\lbrack\sqrt{d}\rbrack$, since we can take $a\in\Z\lbrack\sqrt{d}\rbrack$ and $q$ is odd.

The conditional prime number theorem

Previously I discussed the prime number theorem and proved Chebyshev’s weak form of it, namely that $\pi(n)=\Theta(n/\ln n)$. Chebyshev was also able to show the conditional result that if

\[ \lim_{n\to\infty}\frac{\pi(n)}{n/\ln n} \]

exists, then its value is $1$. While this might seem to be quite close to the prime number theorem, there is no especially good reason why the limit should exist. In fact, in a sense this result makes the limit less likely to exist; after our previous proof all one knows is that the limit lies between $0.34$ and $4.16$ (supposing it exists), but now it can have only one possible value!

Some new functions

Before proving this result, it is useful to introduce the von Mangoldt function defined by

\[ \Lambda(n) := \begin{cases}
\ln p & \text{if $n\geq2$ is a perfect power of the prime $p$} \\
0 & \text{otherwise}
\end{cases} \]

and the Chebyshev function defined as the partial sums of the von Mangoldt function,

\[ \psi(n) := \sum_{m\leq n}\Lambda(m) . \]

For each prime $p$ there are $\lfloor\log_p n\rfloor$ perfect powers of $p$ less than or equal to $n$ and strictly greater than $1$, so an alternate way of writing this function is

\[ \psi(n) = \sum_{p\leq n}\lfloor\log_p n\rfloor\ln p . \]

Upper bound on $\psi(n)$

By removing the floor and using the logarithm change-of-base formula one gets an upper bound on $\psi(n)$:

\[ \psi(n) \leq \sum_{p\leq n}(\log_p n)\ln p = \sum_{p\leq n}\ln n = \pi(n)\ln n \]

In particular, using the previously established $\pi(n)=O(n/\ln n)$ this implies

\[ \psi(n) = O(n) . \tag{1} \]

Lower bound on $\psi(n)$

By removing every term in the sum defining $\psi(n)$ except when $m$ is a prime larger than $n’$, and using $\pi(n’)=O(n’/\ln n’)$, one gets a similar lower bound on $\psi(n)$:

\begin{align*}
\psi(n) &\geq \sum_{n'<p\leq n}\ln p \\
&\geq \sum_{n'<p\leq n}\ln n’ \\
&= \ln n’\bigl(\pi(n)-\pi(n’)\bigr) \\
&= \pi(n)\ln n’ + O(n’)
\end{align*}

Now, this holds for all $n'<n$, so take $n’:=n/\ln n$ (it is actually not necessary to ensure that $n’$ is an integer). Then the lower bound becomes

\[ \psi(n) \geq \pi(n)(\ln n-\ln\ln n) + O\Bigl(\frac{n}{\ln n}\Bigr) = \pi(n)\ln(n) + O\Bigl(\frac{n\ln\ln n}{\ln n}\Bigr) . \]

Combining the bounds on $\psi(n)$

Since the error term here is $o(n)$, combining with the upper bound $\psi(n)\leq\pi(n)\ln n$ we get the nice relation

\[ \psi(n) = \pi(n)\ln n + o(n) . \]

Dividing by $n$, we find that

\[ \frac{\psi(n)}{n} = \frac{\pi(n)}{n/\ln n} + o(1) . \tag{2} \]

This is a very strong asymptotic relation between $\psi(n)/n$ and $\pi(n)/(n/\ln n)$; it says that the difference between these functions goes to zero as $n$ increases. So as $n$ gets large, the functions essentially become identical.

The first estimation of $\ln(n!)$

The proof now proceeds by estimating the quantity $\ln(n!)$ in two different ways. The first way is a weak form of Stirling’s approximation:

\[ \ln(n!) = n \ln n + O(n) \tag{3} \]

A simple proof of this proceeds by estimating $\newcommand{\d}{\,\mathrm{d}}\int_1^n\ln t\d t$ by rectangles of width $1$:

diagram

Note that $\ln(n!)=\sum_{m=1}^\infty \ln m$, so the area formed by the rectangles is exactly the quantity to estimate in (3).

As shown in the diagram, this quantity is closely under-estimated by $\int_1^n\ln t\d t$ (the dark green area); the amount of under-estimation is given by the light green area. Although this is difficult to evaluate exactly, by sliding the light green triangle-like shapes to the right they all fit inside the final rectangle, which has area $\ln n$. Thus the dark green area under-estimates the area of the rectangles by at most $\ln n$, and we have

\begin{align*}
\ln(n!) &= \int_1^n\ln t\d t + O(\ln n) \\
&= \bigl[t\ln n – t\bigr]_1^n + O(\ln n) \\
&= n\ln n \mathbin- n + O(\ln n)
\end{align*}

which is a stronger form of (3).

The second estimation of $\ln(n!)$

The second method of estimation relies on the prime factorization of $n!$. Recall we had already determined the exponent of $p$ in the prime factorization (also known as the $p$-adic order) of $n!$ to be $\nu_p(n!) := \sum_{k=1}^\infty\lfloor n/p^k\rfloor$. Using this, we have:

\begin{align*}
\ln(n!) &= \ln\Bigl(\prod_{p\leq n}p^{\nu_p(n!)}\Bigr) \\
&= \sum_{p\leq n}\nu_p(n!)\ln p \\
&= \sum_{p^k\leq n}\biggl\lfloor\frac{n}{p^k}\biggr\rfloor\ln p \\
&= \sum_{m\leq n}\Bigl\lfloor\frac{n}{m}\Bigr\rfloor\Lambda(m) \\
&= \sum_{m\leq n}\Bigl(\frac{n}{m} + O(1)\Bigl)\Lambda(m) \\
&= n\sum_{m\leq n}\frac{\Lambda(m)}{m} + O\bigl(\psi(n)\bigr)
\end{align*}

We can evaluate $\sum_{m\leq n}\Lambda(m)/m$ using a trick known as Abel’s summation formula:

\begin{align*}
\sum_{m\leq n}\frac{\Lambda(m)}{m} &= \sum_{m\leq n}\frac{\psi(m)-\psi(m-1)}{m} \\
&= \sum_{m\leq n}\frac{\psi(m)}{m} \mathbin- \sum_{m\leq n-1}\frac{\psi(m)}{m+1} \\
&= \sum_{m\leq n-1}\psi(m)\Bigl(\frac{1}{m}-\frac{1}{m+1}\Bigr) + \frac{\psi(n)}{n} \\
&= \sum_{m\leq n-1}\psi(m)\Bigl[-\frac{1}{t}\Bigr]_m^{m+1} + \frac{\psi(n)}{n} \\
%&= \sum_{m\leq n-1}\psi(m)\int_m^{m+1}\frac{\!\d t}{t^2} + \frac{\psi(n)}{n} \\
&= \sum_{m\leq n-1}\int_m^{m+1}\frac{\psi(t)}{t^2}\!\d t + \frac{\psi(n)}{n} \\
&= \int_1^n\frac{\psi(t)}{t^2}\!\d t + \frac{\psi(n)}{n}
\end{align*}

Most of this is just algebra; the fact that $\psi(m)=\psi(t)$ for $t\in[m,m+1)$ justifies pulling $\psi(m)$ into the integral. Putting these together and using (1), we get our second estimate on $\ln(n!)$:

\[ \ln(n!) = n\int_1^n\frac{\psi(t)}{t^2}\!\d t + O(n) \tag{4} \]

Putting the estimates together

Combining (3) and (4) and dividing by $n$ we find that

\[ \int_1^n\frac{\psi(t)}{t^2}\!\d t = \ln n + O(1) . \tag{5} \]

Using (5) to derive the result

Suppose that the limit in question exists and equals $1+\epsilon$ for some $\epsilon>0$. Then, taking the limit as $n\to\infty$ in (2), we have

\[ \lim_{n\to\infty}\frac{\psi(n)}{n} = \lim_{n\to\infty}\frac{\pi(n)}{n/\ln n} = 1 + \epsilon . \]

It follows that there is some $N$ such that $\psi(t)/t\geq 1+\epsilon/2$ for all $t\geq N$. Now, $N$ only depends on the constant $\epsilon$, so taking $N$ fixed as $n\to\infty$ we have
\[ \int_1^n\frac{\psi(t)}{t^2}\d t \geq \int_N^n\frac{\psi(t)}{t^2}\d t \geq \int_N^n\frac{1+\epsilon/2}{t}\d t = \Bigl(1+\frac{\epsilon}{2}\Bigr)\ln n + O(1) . \]

This contradicts (5), since the coefficient on the leading term $\ln n$ is larger than $1$; for example, this implies that $\int_1^n\psi(t)/t^2\d t-\ln n=\Omega(\ln n)$, which is not $O(1)$ as given by (5).

Similarly, if the limit in question exists and equals $1-\epsilon$ for some $\epsilon>0$ then we have $\lim_{n\to\infty}\psi(n)/n = 1-\epsilon$, so there exists some $N$ such that $\psi(t)/t\leq1-\epsilon/2$ for all $t\geq N$. Then as $n\to\infty$ we have

\[ \int_1^n\frac{\psi(t)}{t^2}\d t \leq O(1) + \int_N^n\frac{1-\epsilon/2}{t}\d t = \Bigl(1-\frac{\epsilon}{2}\Bigr)\ln n + O(1) , \]

which contradicts (5), since this says that the coefficient on the leading term $\ln n$ is less than $1$.

Therefore, we conclude that if $\lim_{n\to\infty}\pi(n)/(n/\ln n)$ exists it cannot be strictly larger than or strictly less than $1$, so it must be exactly $1$.