# A variant $n+1$ primality test

Last time I discussed the $n-1$ and $n+1$ primality tests. Recall that the $n-1$ test says that $n$ is prime if there exists an $\newcommand{\Z}{\mathbb{Z}}a\in\Z^*_n$ such that

\begin{align}
a^{n-1} &\equiv 1 \pmod{n} \\
a^{(n-1)/p} &\not\equiv 1 \pmod{n}
\end{align}

for all primes $p$ which divide $n-1$.

The $n+1$ can be stated in a similar form, and says that $n$ is prime if it is odd and there exists an $a\in(\Z[\sqrt{d}])^*$ with $(\frac{d}{n})=-1$ such that

\begin{align}
a^{n+1} &\equiv 1 \pmod{n} \\
a^{(n+1)/p} &\not\equiv 1 \pmod{n}
\end{align}

for all primes $p$ which divide $n+1$.

I state it in this form to make the connection with the $n-1$ test, but I’ve done a little sleight-of-hand in the presentation. In the first test $a$ is a unit of $\Z_n$, while in the second test $a$ is a unit of $\Z[\sqrt{d}]$ (not $\Z_n[\sqrt{d}]$). That is, the norm of $a$ in $\mathbb{Q}(\sqrt{d})$ is $1$; this is a rather restrictive condition. In fact, when $d<-3$ the only units of $\Z[\sqrt{d}]$ are $\pm1$, and both of these will fail the second condition since $(n+1)/p$ will be even for some $p$.

When $d$ is positive and squarefree the situation is a little better in that there are an infinite number of units in $\Z[\sqrt{d}]$. However, these units are all of the form $\pm\epsilon^k$ for some fundamental unit $\epsilon:=x+y\sqrt{d}$ (this may be found by solving Pell’s equation $x^2-dy^2=1$). If the fundamental unit doesn’t satisfy the conditions then any power of it will also necessarily fail, so for any given value of $d$ there is essentially only one possible choice of $a$ which could work. On the upside, one could simply look up this choice in a table when $d$ is small; e.g., for $d:=3$ one should use $a:=2+\sqrt{3}$.

So that’s an unfortunate condition which isn’t present in the $n-1$ test, but it’s necessary to be able to use Fermat’s theorem in $\Z[\sqrt{d}]$, which implies that if $p$ is prime and $a$ has norm $1$ then

$a^{p-(d/p)} \equiv 1 \pmod{p} ,$

and more generally

$a^{p^{e-1}(p-(d/p))} \equiv 1 \pmod{p^e} .$

Now we’re ready to prove that the primality test works as stated. Let $\newcommand{\ord}{\mathop{\mathrm{ord}}\nolimits}\ord_{n,d}(a)$ denote the order of $a$ in $(\Z_n[\sqrt{d}])^*$, so the two conditions of the primality test tell us that $\ord_{n,d}(a)=n+1$.

Say $n$ has prime factorization $\prod_{i=1}^k p_i^{e_i}$. By the Chinese remainder theorem, we have

$\Z_n[\sqrt{d}] \cong \prod_{i=1}^k \Z_{p_i^{e_i}}[\sqrt{d}] ,$

so

$n+1 = \ord_{n,d}(a) = \DeclareMathOperator{\lcm}{lcm}\lcm(\ord_{p_1^{e_1},d}(a),\dotsc,\ord_{p_k^{e_k},d}(a))$

and by Fermat’s theorem this divides

$\lcm(p_1^{e_1-1}(p_1-(d/p_1)),\dotsc,p_k^{e_k-1}(p_k-(d/p_k))) .$

Since each $p_i$ is odd, this equals

\begin{align}
&\mathrel{\phantom{=}}2\lcm\Bigl(p_1^{e_1-1}\frac{p_1-(d/p_1)}{2},\dotsc,p_k^{e_k-1}\frac{p_k-(d/p_k)}{2}\Bigr) \\
&\leq 2\prod_{i=1}^k p_i^{e_i-1}\frac{p_i-(d/p_i)}{2} \\
&\leq 2n\prod_{i=1}^k\frac{p_i+1}{2p_i} .
\end{align}

Now, if $n$ has at least two distinct prime factors then this is at most

$2n\cdot\frac{3+1}{2\cdot3}\cdot\frac{5+1}{2\cdot5} = \frac{4}{5}n .$

Thus we conclude that $n+1\leq4n/5$, a contradiction since $n$ is positive. Thus $n$ must have just one prime factor; say $n:=p^e$. Using Fermat’s theorem and the fact that $(\frac{d}{p})=-1$ (otherwise $(\frac{d}{n})\neq-1$), we have

$n+1 = \ord_{n,d}(a) \mid p^{e-1}(p+1) = n+p^{e-1} .$

It follows that $n+1\mid p^{e-1}-1\leq n/3$, again a contradiction, unless $p^{e-1}=1$, i.e., $e=1$ and $n=p$ is prime.

I found this kind of argument in Prime Numbers and Computer Methods for Factorization (page 116). In that book it is applied to Lucas sequences, which simplifies some things, although can also obscure the group $\mathbb{F}_{n^2}^*$ working in the background.

# The $n-1$ and $n+1$ primality tests

Determining if a number $n$ is a prime number or not is an important problem in computational number theory. Two simple ways of proving primality rely on the prime factorizations of $n-1$ and $n+1$. In general finding these factorizations is probably a harder problem than testing the primality of $n$, so the methods are only applicable in special cases, but are they are interesting nonetheless.

At a high level, the $n-1$ method works by showing that a subgroup of $\newcommand{\Z}{\mathbb{Z}}\Z^*_n$ is so large that $n$ must be prime. Specifically, a subgroup of order $n-1$ is demonstrated, which implies that $\Z_n^*$ is as large as possible, namely the full set of nonzero residues $\{1,2,\dotsc,n-1\}$; if even one element was missing then there would be less than $n-1$ elements in $\Z_n^*$. But $\Z_n^*=\{1,2,\dotsc,n-1\}$ means that every positive integer strictly less than $n$ is coprime to $n$, and so $n$ is prime.

To show that $\Z_n^*$ contains a subgroup of size $n-1$, we find an element $a\in\Z_n^*$ whose order is $n-1$ (such an element is called a primitive root). To do this, we show that

$a^{n-1} \equiv 1 \pmod{n} \tag{1}$

and

$a^{(n-1)/p} \not\equiv 1 \pmod{n} \tag{2}$

for all primes $p$ which divide $n-1$. This is enough to show that the order of $a$ is $n-1$; if the true order was $r<n-1$ then using Bézout’s identity one would be able to derive $a^{\gcd(r,n-1)}\equiv1\pmod{n}$ and this would contradict (2) for some $p$.

When $n$ is prime, there isn’t any known efficient algorithm which is guaranteed to find a primitive root $a$ satisfying (1) and (2), but in practice this isn’t a concern. In fact, there are $\varphi(n-1)=\Theta(n/\ln\ln n)$ primitive roots (where $\varphi$ denotes Euler’s totient function), so if one simply tests if random $a$ satisfies (1) and (2) then one should quickly find one which works, and thereby prove that $n$ is prime. As mentioned, the real problem with applying this method in practice is finding the primes $p$ which divide $n-1$.

Incidentally, when $n$ isn’t prime, this is usually easy to show since condition (1) will often fail to hold, and all primes satisfy (1) for all $a\in\Z_n^*$. However, some composite numbers still satisfy (1) for all $a\in\Z_n^*$. In such a case a more stringent form of (1) can be used to prove compositeness, and this method can be employed in practice since it only requires the “evenness factorization” $n-1=2^r\cdot m$ with $m$ odd, which is simple to compute.

The $n+1$ method is similar to the $n-1$ method, except it works in the group $(\Z_n[\sqrt{d}])^*$, where $d$ satisfies $(\frac{d}{n})=-1$, i.e., $d$ is not a quadratic residue mod $n$. We assume that $n$ is odd (otherwise the problem is trivial), so that in practice $d$ can be found by computing the Jacobi symbol $(\frac{d}{n})$ for multiple values of $d$ until one works. Note that when $n$ is prime we have $\newcommand{\F}{\mathbb{F}}\Z_n[\sqrt{d}]\cong\F_{n^2}$, the finite field of size $n^2$, so there still exist primitive roots $a$ which generate $(\Z_n[\sqrt{d}])^*$. We’ll denote the conjugate of $a$ by $\bar{a}$, so if $a:=b+c\sqrt{d}$ with $b$, $c\in\Z_n$ then $\bar{a}=b-c\sqrt{d}$.

As you might expect in relation to the above, the $n+1$ method finds a subgroup of $(\Z_n[\sqrt{d}])^*$ of size $n+1$ by finding an $a\in(\Z_n[\sqrt{d}])^*$ which has order $n+1$. Actually, we need something a bit stronger than this; we need to show

$a^{n+1} \equiv 1 \pmod{n} \tag{3}$

and

$\gcd((a^{(n+1)/p}-1)(\bar{a}^{(n+1)/p}-1),n)=1 \tag{4}$

for all primes $p$ which divide $n+1$. Condition (4) not only implies that $a^{(n+1)/p} \not\equiv 1 \pmod{n}$, but also that $a^{(n+1)/p} \not\equiv 1 \pmod{q}$ where $q$ is any prime divisor of $n$. To see that this, we show the contrapositive. Suppose that $q$ is a prime divisor of $n$ and that $q\mid a^{(n+1)/p}-1$.1 It follows that $q$ also divides $(a^{(n+1)/p}-1)(\bar{a}^{(n+1)/p}-1)\in\Z$, and thus divides the gcd in (4), so (4) fails to hold.

Since condition (3) implies that $a^{n+1}\equiv1\pmod{q}$ and condition (4) implies that $a^{(n+1)/p}\not\equiv1\pmod{q}$, just like before one knows that $a\pmod{q}$ has order $n+1$. In particular, there must be at least $n+1$ elements in $(\Z_q[\sqrt{d}])^*$.

Toward a contradiction, suppose that (3) and (4) hold and that $n$ is not prime. Let $q$ be the smallest prime divisor of $n$, so that $q^2\leq n$. As we just saw, $a\pmod{q}$ has order $n+1$, so we get the lower bound

$n+1 \leq \lvert(\Z_q[\sqrt{d}])^*\rvert .$

However, $\Z_q[\sqrt{d}]$ has $q^2$ elements, and so

$\lvert(\Z_q[\sqrt{d}])^*\rvert \leq q^2-1 \leq n-1 ,$

which contradicts the lower bound. Thus $n$ must in fact be prime.

When $n$ is prime, an $a$ which satisfies (3) and (4) can be found by taking a primitive root of $\F_{n^2}$ and raising it to the power $n-1$, since in this case $(a^{n-1})^{n+1}\equiv1\pmod{n}$ and $n$ will not divide $(a^{n-1})^{(n+1)/p}-1$ or its conjugate. As mentioned, there isn’t a guaranteed procedure to find a primitive root, but since there are $\varphi(n^2-1)=\Theta(n^2/\ln\ln n)$ of them, in practice it shouldn’t be too hard to find one; the sticking point is finding the factorization of $n+1$.

Incidentally, the $n+1$ test is often presented using Lucas sequences rather than $\F_{n^2}$. But that’s a topic for another post.

1. This notation means that there is some algebraic integer $k$ such that $qk=a^{(n+1)/p}-1$. However, it is actually only necessary to consider $k\in\Z\lbrack\sqrt{d}\rbrack$, since we can take $a\in\Z\lbrack\sqrt{d}\rbrack$ and $q$ is odd.

# Fun running

Today marks the fourth annual AHS fun run I’ve participated in. I’ve been running routinely for just over 3 years, and in that time have completed about 160 laps of the University of Waterloo’s Ring Road (which is about 2.6 km).

I keep track of all my run times, and it is fascinating watching myself improve. So far, every year during the fun run I beat my previous best time by a comfortable margin, and I don’t match my fun run time until the following summer. These are my best times for each fun run I’ve done:

 Beforehand During 2010 13:53 13:06 2011 12:54 12:29 2012 12:10 11:40 2013 11:24 10:23

I’m a bit amazed I was able to beat my best time by a full minute today; this of course raises the question of why I am able to improve so much during the fun run. Now, there are some physical differences from a typical ring run of mine which might make a difference:

• The start location is different
• The lap direction is opposite (for the last 3 years)
• Instead of the sidewalk, the route is on the road (which is slightly longer, but also flatter)
• There is a ~10 minute warmup beforehand
• The run takes place in the morning, rather than the afternoon or evening

I wish I knew how much each of these factors contributed to the difference, but I suspect the main difference is actually psychological: namely, when I’m running with other people I have more motivation to push myself to run harder. Indeed, currently my legs feel noticeably more tired than usual, and I had a similar story last year.

Since running superficially seems like a purely physical activity, the thought that my psychological state has such a large impact on my performance comes as a surprise to me. It also raises the question: what is the optimal psychological state for running, and what can one do to help promote it?

# Chess problem solution

Today I’ll go through the solution to the chess problem I posted two weeks ago. If you’re still stuck, another hint is that the solution involves multiple lines of play to consider. You can find my solution after the break.

# Announcing Simple Go

As I’ve mentioned briefly, I am an amateur Go player. A big part of the appeal of the game to me is the elegance of the rules. The rules are so simple that as a programmer I almost felt obligated to translate the rules into actual code at some point. In fact, I’ve worked on-and-off on a Go implementation for some months now. The result:

I actually posted this on my website a month ago, but I just added the ability to save games and updated the compiled downloads, so I figure it’s time to officially announce it here.

As far as Go implementations go, it is rather basic; one unique feature that it has is the ability to play random games, i.e., when both players place their stones on the board randomly. I was curious the kinds of patterns that would arise in such games, and how such games would end, assuming that players don’t pass unless absolutely necessary and that no board position can ever repeat (superko). This was another impetus for writing Simple Go, since I couldn’t find any other program which allowed me to try this.

Simple Go was written in C++ using the cross-platform library wxWidgets. The source code is available on GitHub, but pre-compiled binaries are also available on its webpage.

Enjoy – I quite enjoyed writing it.

# Chess problem

Although I prefer playing Go these days, I still enjoy working through a good chess problem. The following is a rather nice one; regrettably, I don’t remember where I found it.

White to move and mate in two.

If you’re stuck, highlight the following for a small hint: A mate in two is no longer possible if Black can pass.

# The conditional prime number theorem

Previously I discussed the prime number theorem and proved Chebyshev’s weak form of it, namely that $\pi(n)=\Theta(n/\ln n)$. Chebyshev was also able to show the conditional result that if

$\lim_{n\to\infty}\frac{\pi(n)}{n/\ln n}$

exists, then its value is $1$. While this might seem to be quite close to the prime number theorem, there is no especially good reason why the limit should exist. In fact, in a sense this result makes the limit less likely to exist; after our previous proof all one knows is that the limit lies between $0.34$ and $4.16$ (supposing it exists), but now it can have only one possible value!

# Some new functions

Before proving this result, it is useful to introduce the von Mangoldt function defined by

$\Lambda(n) := \begin{cases} \ln p & \text{if n\geq2 is a perfect power of the prime p} \\ 0 & \text{otherwise} \end{cases}$

and the Chebyshev function defined as the partial sums of the von Mangoldt function,

$\psi(n) := \sum_{m\leq n}\Lambda(m) .$

For each prime $p$ there are $\lfloor\log_p n\rfloor$ perfect powers of $p$ less than or equal to $n$ and strictly greater than $1$, so an alternate way of writing this function is

$\psi(n) = \sum_{p\leq n}\lfloor\log_p n\rfloor\ln p .$

## Upper bound on $\psi(n)$

By removing the floor and using the logarithm change-of-base formula one gets an upper bound on $\psi(n)$:

$\psi(n) \leq \sum_{p\leq n}(\log_p n)\ln p = \sum_{p\leq n}\ln n = \pi(n)\ln n$

In particular, using the previously established $\pi(n)=O(n/\ln n)$ this implies

$\psi(n) = O(n) . \tag{1}$

## Lower bound on $\psi(n)$

By removing every term in the sum defining $\psi(n)$ except when $m$ is a prime larger than $n’$, and using $\pi(n’)=O(n’/\ln n’)$, one gets a similar lower bound on $\psi(n)$:

\begin{align*}
\psi(n) &\geq \sum_{n'<p\leq n}\ln p \\
&\geq \sum_{n'<p\leq n}\ln n’ \\
&= \ln n’\bigl(\pi(n)-\pi(n’)\bigr) \\
&= \pi(n)\ln n’ + O(n’)
\end{align*}

Now, this holds for all $n'<n$, so take $n’:=n/\ln n$ (it is actually not necessary to ensure that $n’$ is an integer). Then the lower bound becomes

$\psi(n) \geq \pi(n)(\ln n-\ln\ln n) + O\Bigl(\frac{n}{\ln n}\Bigr) = \pi(n)\ln(n) + O\Bigl(\frac{n\ln\ln n}{\ln n}\Bigr) .$

## Combining the bounds on $\psi(n)$

Since the error term here is $o(n)$, combining with the upper bound $\psi(n)\leq\pi(n)\ln n$ we get the nice relation

$\psi(n) = \pi(n)\ln n + o(n) .$

Dividing by $n$, we find that

$\frac{\psi(n)}{n} = \frac{\pi(n)}{n/\ln n} + o(1) . \tag{2}$

This is a very strong asymptotic relation between $\psi(n)/n$ and $\pi(n)/(n/\ln n)$; it says that the difference between these functions goes to zero as $n$ increases. So as $n$ gets large, the functions essentially become identical.

# The first estimation of $\ln(n!)$

The proof now proceeds by estimating the quantity $\ln(n!)$ in two different ways. The first way is a weak form of Stirling’s approximation:

$\ln(n!) = n \ln n + O(n) \tag{3}$

A simple proof of this proceeds by estimating $\newcommand{\d}{\,\mathrm{d}}\int_1^n\ln t\d t$ by rectangles of width $1$:

Note that $\ln(n!)=\sum_{m=1}^\infty \ln m$, so the area formed by the rectangles is exactly the quantity to estimate in (3).

As shown in the diagram, this quantity is closely under-estimated by $\int_1^n\ln t\d t$ (the dark green area); the amount of under-estimation is given by the light green area. Although this is difficult to evaluate exactly, by sliding the light green triangle-like shapes to the right they all fit inside the final rectangle, which has area $\ln n$. Thus the dark green area under-estimates the area of the rectangles by at most $\ln n$, and we have

\begin{align*}
\ln(n!) &= \int_1^n\ln t\d t + O(\ln n) \\
&= \bigl[t\ln n – t\bigr]_1^n + O(\ln n) \\
&= n\ln n \mathbin- n + O(\ln n)
\end{align*}

which is a stronger form of (3).

# The second estimation of $\ln(n!)$

The second method of estimation relies on the prime factorization of $n!$. Recall we had already determined the exponent of $p$ in the prime factorization (also known as the $p$-adic order) of $n!$ to be $\nu_p(n!) := \sum_{k=1}^\infty\lfloor n/p^k\rfloor$. Using this, we have:

\begin{align*}
\ln(n!) &= \ln\Bigl(\prod_{p\leq n}p^{\nu_p(n!)}\Bigr) \\
&= \sum_{p\leq n}\nu_p(n!)\ln p \\
&= \sum_{p^k\leq n}\biggl\lfloor\frac{n}{p^k}\biggr\rfloor\ln p \\
&= \sum_{m\leq n}\Bigl\lfloor\frac{n}{m}\Bigr\rfloor\Lambda(m) \\
&= \sum_{m\leq n}\Bigl(\frac{n}{m} + O(1)\Bigl)\Lambda(m) \\
&= n\sum_{m\leq n}\frac{\Lambda(m)}{m} + O\bigl(\psi(n)\bigr)
\end{align*}

We can evaluate $\sum_{m\leq n}\Lambda(m)/m$ using a trick known as Abel’s summation formula:

\begin{align*}
\sum_{m\leq n}\frac{\Lambda(m)}{m} &= \sum_{m\leq n}\frac{\psi(m)-\psi(m-1)}{m} \\
&= \sum_{m\leq n}\frac{\psi(m)}{m} \mathbin- \sum_{m\leq n-1}\frac{\psi(m)}{m+1} \\
&= \sum_{m\leq n-1}\psi(m)\Bigl(\frac{1}{m}-\frac{1}{m+1}\Bigr) + \frac{\psi(n)}{n} \\
&= \sum_{m\leq n-1}\psi(m)\Bigl[-\frac{1}{t}\Bigr]_m^{m+1} + \frac{\psi(n)}{n} \\
%&= \sum_{m\leq n-1}\psi(m)\int_m^{m+1}\frac{\!\d t}{t^2} + \frac{\psi(n)}{n} \\
&= \sum_{m\leq n-1}\int_m^{m+1}\frac{\psi(t)}{t^2}\!\d t + \frac{\psi(n)}{n} \\
&= \int_1^n\frac{\psi(t)}{t^2}\!\d t + \frac{\psi(n)}{n}
\end{align*}

Most of this is just algebra; the fact that $\psi(m)=\psi(t)$ for $t\in[m,m+1)$ justifies pulling $\psi(m)$ into the integral. Putting these together and using (1), we get our second estimate on $\ln(n!)$:

$\ln(n!) = n\int_1^n\frac{\psi(t)}{t^2}\!\d t + O(n) \tag{4}$

# Putting the estimates together

Combining (3) and (4) and dividing by $n$ we find that

$\int_1^n\frac{\psi(t)}{t^2}\!\d t = \ln n + O(1) . \tag{5}$

# Using (5) to derive the result

Suppose that the limit in question exists and equals $1+\epsilon$ for some $\epsilon>0$. Then, taking the limit as $n\to\infty$ in (2), we have

$\lim_{n\to\infty}\frac{\psi(n)}{n} = \lim_{n\to\infty}\frac{\pi(n)}{n/\ln n} = 1 + \epsilon .$

It follows that there is some $N$ such that $\psi(t)/t\geq 1+\epsilon/2$ for all $t\geq N$. Now, $N$ only depends on the constant $\epsilon$, so taking $N$ fixed as $n\to\infty$ we have
$\int_1^n\frac{\psi(t)}{t^2}\d t \geq \int_N^n\frac{\psi(t)}{t^2}\d t \geq \int_N^n\frac{1+\epsilon/2}{t}\d t = \Bigl(1+\frac{\epsilon}{2}\Bigr)\ln n + O(1) .$

This contradicts (5), since the coefficient on the leading term $\ln n$ is larger than $1$; for example, this implies that $\int_1^n\psi(t)/t^2\d t-\ln n=\Omega(\ln n)$, which is not $O(1)$ as given by (5).

Similarly, if the limit in question exists and equals $1-\epsilon$ for some $\epsilon>0$ then we have $\lim_{n\to\infty}\psi(n)/n = 1-\epsilon$, so there exists some $N$ such that $\psi(t)/t\leq1-\epsilon/2$ for all $t\geq N$. Then as $n\to\infty$ we have

$\int_1^n\frac{\psi(t)}{t^2}\d t \leq O(1) + \int_N^n\frac{1-\epsilon/2}{t}\d t = \Bigl(1-\frac{\epsilon}{2}\Bigr)\ln n + O(1) ,$

which contradicts (5), since this says that the coefficient on the leading term $\ln n$ is less than $1$.

Therefore, we conclude that if $\lim_{n\to\infty}\pi(n)/(n/\ln n)$ exists it cannot be strictly larger than or strictly less than $1$, so it must be exactly $1$.

# A harmonic series variant

Here’s another interesting question regarding the convergence of a series. Does

$\sum_{n=1}^\infty\frac{\cos n}{n}$

converge or diverge?

# Initiating

This is just a quick note to record a notable change in perspective I’ve had recently about initiating conversations. Previously, this is something I would not usually do, even around people I already knew. This lead to a lot of situations with awkward silence, for example if I was standing around with someone waiting for something.

My reasoning process was something along the lines of, “they must not want to talk, or otherwise they would say something”. Since I didn’t want to annoy them, I didn’t say anything. Recently, however, I have been going out of my way to start conversations in such situations. To my surprise, most people do not seem annoyed at all, in fact they seem genuinely pleased.

This stark contrast with my expectation and the reality makes me curious what was mistaken about my previous viewpoint, and how I came to adopt such a perspective in the first place. In retrospect, the obvious objection to my reasoning is that the other person could well be thinking the same thing as me. Perhaps they would even prefer to talk, but dislike the act of initiating; if one accepts this perspective then the initiator is even doing a favour by saying something—exactly the reverse of my previous view.

What could have caused me to get things backwards? It’s not as if there is a lot of conventional wisdom saying you shouldn’t initiate; the closest thing I can think of is feminists complaining about men overzealously initiating women (typically in a dating context). I have to wonder if this was a case where I didn’t want to do something so I rationalized a reason against doing it.

# The intended cute solution

The question I meant to ask on Sunday was whether $\sum_{m,n=1}^\infty1/(m^2+n^2)^2$ converges or diverges, so I’ll give my solution to that problem now. In fact, the methodology closely resembles the divergence proof I gave for the alternate sum, even though this sum converges.

Since all terms in the sum are positive, the sum either converges absolutely or diverges (the sum cannot be conditionally convergent). In either case the terms of the sum may be rearranged without affecting the convergence. For suppose the sum diverges but converges for some rearrangement: since the terms of the rearranged sum are still positive, the rearranged sum would have to converge absolutely, and therefore converge for all rearrangements, contradicting the supposed diverging arrangement.

Therefore we can rearrange the terms as we please; in particular, we can sort the terms in decreasing order, which has the effect of grouping together all terms of the form $1/k^2$ where $k$ is a sum of two squares. The term $1/k^2$ will appear in the sum once for each solution of $m^2+n^2=k$ in positive integers $m$, $n$.1

For our purposes it is sufficient to note there are at most $\sqrt{k}$ solutions to $m^2+n^2=k$. This follows since we know that $1\leq m\leq\sqrt{k}$ and for each value of $m$ there is at most one solution; the only possible value of $n$ which could work is $\sqrt{k-m^2}$.

The argument proceeds as follows:

\newcommand{\N}{\mathbb{N}}\begin{align*} \sum_{m,n\in\N}\frac{1}{(m^2+n^2)^2} &= \sum_{k=2}^\infty\sum_{\substack{m,n\in\N\\m^2+n^2=k}}\frac{1}{k^2} \\ &\leq \sum_{k=2}^\infty\frac{\sqrt{k}}{k^2} \\ &= \sum_{k=2}^\infty\frac{1}{k^{3/2}} \\ &= \zeta(3/2)-1 \end{align*}

The final sum converges since it is a $p$-series (truncated). By the comparison test, the sum in question also converges.

1. The exact number of solutions to $m^2+n^2=k$ is essentially the sum of squares function $r_2(k)$, although since we are exclusively working with solutions in positive numbers the actual number of solutions will be $r_2(k)/4$ if $k$ is not a perfect square and $(r_2(k)-4)/4$ if $k$ is a perfect square.

Via MathWorld, we see that if $r_2(k)$ is nonzero then it is equal to $4B(k)$, where $B(k)$ is the number of divisors of $k$ solely comprised of primes congruent to $1$ mod $4$. In any case, this shows that the maximum number of times $1/k^2$ can appear in the summation is $d(k)$, the number of divisors of $k$. In Apostol (page 296) it is shown that $d(k)=o(k^\epsilon)$ for any $\epsilon>0$, so this is a fairly slowly growing function.