Infinitude of Primes - Euclid’s Proof | Formal Proofs from THE BOOK

Let us jump right in by considering Euclid’s proof of the infinitude of primes as written down by Aigner and Ziegler.

Proof. For any finite set \(\{ p_1, \ldots, p_r \}\) of primes, consider the number \(n = p_1 p_2 \cdots p_r + 1\). This \(n\) has a prime divisor \(p\). But \(p\) is not one of the \(p_i\): otherwise \(p\) would be a divisor of \(n\) and of the product \(p_1 p_2 \cdots p_r\), and thus also of the difference \(n − p_1 p_2 \cdots p_r = 1\), which is impossible. So a finite set \(\{p_1, \ldots , p_r \}\) cannot be the collection of all prime numbers.

Aigner, M., & Ziegler, G. M. (5th ed.). Proofs from THE BOOK, p. 3.

Before going through the proof, it is helpful to think about what exactly this particular version of the proof is establishing. The final conclusion that “a finite set \(\{p_1, \ldots , p_r \}\) cannot be the collection of all prime numbers” suggests that the statement that most directly follows from the given proof is the following.

1 2	theorem infinitude_primes (S : Finset ℕ) (hS : ∀ s ∈ S, Nat.Prime s) : ∃ p, Nat.Prime p ∧ p ∉ S := by

Read literally, this states that given any finite set of integers \(S\) (S : Finset ℕ) s.t. all elements \(s \in S\) are prime (hS : ∀ s ∈ S, Nat.Prime s) there exists some integer \(p\) (∃ p) that is prime (Nat.Prime p) and not in \(S\) (p ∉ S). Based on this goal, let us go through the actual steps of the proof one by one. “Consider the number \(n = p_1 p_2 \cdots p_r + 1\)” is taken care of easily:

1	let n := (∏ i in S, i) + 1

“This \(n\) has a prime divisor \(p\)” is a seemingly innocent statement: if we investigate Nat.exists_prime_and_dvd in mathlib, it does in fact tell us that \(n\) as a natural number has a prime divisor … as long as \(n \neq 1\). So let us establish that this is in fact true in the form of the claim n_ne_one. Here we first invoke Nat.succ_ne_succ, so that instead of \(n \neq 1\) we have to show that \(p_1 p_2 \cdots p_r \neq 0\), then Finset.prod_ne_zero_iff, which says that this is true if and only if none of the \(p_i\) are \(0\). That last part is of course the case as the \(p_i\) are prime (Nat.Prime.ne_zero).

1
2
3

    have n_ne_one : n ≠ 1 := by
      rw [Nat.succ_ne_succ, Finset.prod_ne_zero_iff]
      exact fun s s_in_S ↦ Nat.Prime.ne_zero (hS s s_in_S)

Now we can use n_ne_one as the necessary ingredient to obtain some natural number \(p\) that is prime (p_prime) and divides \(n\) (p_dvd_n) by invoking Nat.exists_prime_and_dvd.

1	obtain ⟨p, p_prime, p_dvd_n⟩ := Nat.exists_prime_and_dvd n_ne_one

“But \(p\) is not one of the \(p_i\)” is the final piece of the puzzle. We will again make this its own little claim p_not_in_S. The proof continues to argue that “otherwise (…) which is impossible”, indicating a proof by contradiction (by_contra). What is the actual contradiction? Well “\(p\) would be a divisor of \(n\)”, which we already know in the form of p_dvd_n, “and of the product \(p_1 p_2 \cdots p_r\)”, which we prove in the form of p_dvd_ProdS, “and thus also of the difference \(n − p_1 p_2 \cdots p_r = 1\)”, which we prove in the form of p_dvd_one. But the final piece of the argument is missing: \(p\) is prime and therefore cannot divide \(1\) (Nat.Prime.not_dvd_one)! We could close the proof in one line by explicitly constructing the desired goal of False through exact Nat.Prime.not_dvd_one p_prime p_dvd_one, but it seems more idiomatic and satisfying to give p_not_dvd_one its own claim and let the contradiction tactic finish up the proof of the claim that \(p\) is not an element of \(\{p_1, \ldots , p_r\}\).

    have p_not_in_S : p ∉ S := by
      by_contra p_in_S
      have p_dvd_ProdS   :   p ∣ (∏ i in S, i)  := Finset.dvd_prod_of_mem (fun i ↦ i) p_in_S
      have p_dvd_one     :   p ∣ 1              := (Nat.dvd_add_right p_dvd_ProdS).mp p_dvd_n
      have p_not_dvd_one :  ¬p ∣ 1              := Nat.Prime.not_dvd_one p_prime
      contradiction

Finally, we are done! We have a natural number \(p\) that is prime (p_prime) and not an element of \(\{p_1, \ldots , p_r \}\) (p_not_in_S). Explicitly giving Lean these ingredients, finishes the proof. 🎉

1	exact ⟨p, p_prime, p_not_in_S⟩

It is interesting to note that mathlib nominally implements the same proof that we just presented here. The major difference is that the chosen variant of the statement of the infinitude of primes, namely that for any natural integer there exists a prime that is larger than that integer, results in some differences that overall make for a more compact proof.

1	theorem exists_infinite_primes (n : ℕ) : ∃ p, n ≤ p ∧ Prime p := ...

In particular, instead of defining (∏ i in S, i) and showing that it is ≠ 0 with Finset.prod_ne_zero_iff (which requires us to argue that Nat.Prime.ne_zero), it makes due with the factorial n ! and simply invoking factorial_pos. Both proofs proceed by explicitly obtaining some prime \(p\), but while we invoked Nat.exists_prime_and_dvd so that we may use p_dvd_n in order to argue that p_not_in_S, the mathlib proof only needs a prime that is greater than \(n\) and therefore chooses minFac_prime and applies dvd_factorial in the proof that \(p \geq n\). This means the proof in mathlib effectively is the same when applying the above proof to \(S = \{1, \ldots, n\}\), which requires us to drop the assumption hS that \(S\) only contain primes. That is of course not an issue: the only time we made use of the fact that the elements of \(S\) are prime is when invoking Nat.Prime.ne_zero in order to establish that \(p_1 \cdots p_r \neq 0\). If we allow \(S\) to contain arbitrary integers, which for Lean includes \(0\), we would first need to filter the set by removing all zeros.