Getting used to types
Acknowledgements. Theorem Proving in Lean 4 by Jeremy Avigad, Leonardo de Moura, Soonho Kong and Sebastian Ullrich gives an excellent formal introduction to dependent type theory and how it’s used in Lean. This post mostly covers some of the topics from sections 2 and 3. I would also like to thank Aldo Kiem for making me aware of the example using the category of sets.
Having just seen our first real proofs, it’s perhaps a good moment to go through some of the fundamentals of Lean, at least on a very superficial level. This is inevitable, even if our goal is to focus on the math and stay as close as possible in language to the ‘human’ version of the proof. Perhaps the most important aspect of Lean, that is also the most counterintuitive for set theoretically educated mathematicians, is its reliance on (dependent) type theory as its underlying language principle.
A programmer’s perspective
From a programming perspective, statements like theorem infinitude_primes (n : ℕ) : ∃ p, Nat.Prime p ∧ p > n := ...
very much work like a method or function: for a given input n
, infinitude_primes
gives us ∃ p, Nat.Prime p ∧ p > n
as an output.1 Viewed like this, specifying some type (in this case ℕ
) for the variable n
is not surprising: any strongly typed programming language will expect you to specify what type to expect from an input variable (and will throw an error if a function gets the wrong input). For example, a basic Java method counting the number of occurrences of a character in a string
1 2 3 |
public int countChar(String input, char charToCount) { return (int) input.chars().filter(ch -> ch == charToCount).count(); } |
explicitly specifies that input
is an instance of type String
2 and charToCount
of type char
while also specifying that the output will be of type int
. Even the dynamically typed Python started supporting type hints with version 3.5, so the equivalent Python method
1 2 |
def count_char(input_str: str, char_to_count: str) -> int: return input_str.count(char_to_count) |
suggests (mostly to type checkers, IDEs, and linters) that input_str
and char_to_count
should both be of type str
and the output of type int
. We can in fact define exactly the same function in Lean and note that it looks remarkably similar, albeit with syntax that is more tailored to mathematical expressions.
1 2 |
def count_char (input : String) (char_to_count : Char) : ℕ := input.foldl (fun acc c => if c == char_to_count then acc + 1 else acc) 0 |
Running #eval count_char "banana" 'a'
and opening the message pane in the Lean Infoview tells us that the answer is in fact 3
.3
A mathematician’s perspective
Another perspective of type theory views it as a way around some of the common pitfalls that come with too heavily relying on set-theoretic notation. A mathematician may view the statements “For any natural number \(n\) …” and “For any \(n \in \mathbb{N}\) …” as absolutely interchangeable, but they are not: the former states that you should assume that you are given some (for lack of a better word) object \(n\) along with certain properties that we summarize under the term ‘natural number’. The latter relies on having previously defined \(\mathbb{N}\) as the set of all objects that are natural numbers, having made sure to only use the axioms of ZF(C), and are now given an arbitrary member of that set. The outcome is the same, but “For any \(n \in \mathbb{N}\) …” took some extra steps to get there.
It is perhaps unsurprising that Lean would much rather you give it a variable with a specified (and well-defined) type instead of a claim of membership in a countably infinite set, which is why we see n : ℕ
and not n ∈ ℕ
. Note that ℕ
actually denotes the type (and not the set) of natural numbers in Lean, which we can easily verify by having #check ℕ
tells us that ℕ : Type
. If we wanted to have the actual set of natural numbers, we would need to somewhat clumsily define it through def NatSet : Set ℕ := Set.univ
. The fact that 2
is of type ℕ
is tautological in Lean, the fact that 2 ∈ NatSet
requires a proof in the form of Set.mem_univ 2
, which simply falls back to checking that 2
is in fact of type ℕ
.
If this seems slightly academic at this point, note that we very rapidly approach some Russell’s paradox level problems if we too carelessly use set theoretic ideas. We may for example be tempted to construct the universal set when defining the objects of the category of sets, contradicting the axiom of regularity and axiom of pairing. Category theorists seemingly get around this by talking about a collection of objects4, Lean simply uses types: if you give it anything of the correct type, it will happily accept it as an object of that category, without having to reflect on its opinion on set theoretic axioms.
What are proofs?
Let us take a step back and revisit our claim that theorem infinitude_primes (n : ℕ) : ∃ p, Nat.Prime p ∧ p > n := ...
is just a function where, as we would in any other programming language, we are specifying types for the input and output. Where did we specify the type of the output? Where is the actual implementation of the function that returns something of the correct type? How does the proof fit into this? The answer is that ∃ p, Nat.Prime p ∧ p > n
specifies the type that the output has to have, just like ℕ
did for our little count_char
example, and the proof that is provided after :=
is the implementation of infinitude_primes
that returns an output of type ∃ p, Nat.Prime p ∧ p > n
.
This is somewhat wild to think about from a programming perspective: there are many functions that will return something of a given type and we would hardly ever consider two functions that both return an int
from a string
input as “basically the same up to some minor coding details”. The implementation is clearly where the actually relevant stuff happens and it matters if the function counts the length of a string or the number of occurrences of some character in it. In Lean, however, the type of the output is everything, stating exactly what the point of a lemma or theorem is. While there are certainly many different implementations (proofs) returning an output of the specified type, and we will in fact explore several different proofs of the infinitude of primes, they all ultimately serve the same purpose of defining an arbitrary instance of the specified output type to let us know that the proposition holds and can be used in other proofs.
Based on what we just learned, you might have just run #check ∃ p, Nat.Prime p ∧ p > n
(after first defining some variable (n : ℕ)
) and expected the same output that we previously saw from #check ℕ
. But instead of being told that ∃ p, Nat.Prime p ∧ p > n : Type
, we are being told ∃ p, Nat.Prime p ∧ p > n : Prop
. Were we lying when we said infinitude_primes
simply specifies its desired output type, the same way that count_char
did? No, we instead just stumbled upon what makes Lean’s dependent type theory unique: types themselves are objects and there is an infinite hierarchy of types with Prop
taking up a special place at the bottom that makes the above observation slightly less head spinning.
Types all the way down
Let’s switch to an example that is slightly easier to manage so we can explore how exactly Prop
fits into this world of types, namely example : Nat.Prime 2 := of_decide_eq_true (Eq.refl true)
. Running #check (Nat.Prime 2)
shows that it is likewise of type Prop
and running #check Prop
reveals that Prop
is of type Type
. What happens when we #check Type
? We reveal that Type
itself is just of type Type 1
and we have discovered the infinite hierarchy of types. Prop
sits at the very bottom of this hierarchy and every proposition p : Prop
falls into one of two camps: its corresponding type is empty and the proposition therefore false, or it is inhabited meaning we can construct a term t : p
and the proposition is true.5
The table below summarizes some of the previously seen types and their respective level in this hierarchy. Moving from column to column in this table is referred to as a change in universe.
– | – | Sort 0 | Sort 1 | Sort 2 |
---|---|---|---|---|
trivial | True | Prop | Type | Type 1 |
of_decide_eq_true (Eq.refl true) | Nat.Prime 2 | |||
Nat.prime_two | Nat.Prime 2 | |||
– | 2 | ℕ | ||
– | (2 : Fin 3) | Fin 3 | ||
– | true | Bool | ||
– | λ x ↦ x + 1 | ℕ → ℕ | ||
– | – | λ n ↦ Fin n | ℕ → Type | |
– | – | Fin | ||
– | – | Finset.{0} | Type → Type |
This hierarchy also reveals that operators like →
and ×
allow us to construct new types from other types like ℕ
, ℤ
, String
, and Char
; →
for example constructs the type of all functions from objects of one type to objects of another and ×
constructs the type of pairs of objects with appropriate type. Note that these can be chained, so ℕ → ℕ → ℕ
refers to ℕ → (ℕ → ℕ)
and ℕ × ℕ → ℕ
to (ℕ → ℕ) → ℕ
. We can also align this table by writing the first entry of each row, called the term, on the left, followed by its type and its sort (the type of its type). Moving from column to column in this table is referred to as a change in degree. Note that in some cases we can even specify the universe u
of the Type from which we are deriving our new type (this is what the .{0}
was doing in Finset.{0}
).
term | type | sort |
---|---|---|
trivial | True | Prop |
of_decide_eq_true (Eq.refl true) | Nat.Prime 2 | |
Nat.prime_two | Nat.Prime 2 | |
2 | ℕ | Type |
(2 : Fin 3) | Fin 3 | |
true | Bool | |
λ x ↦ x + 1 | ℕ → ℕ | |
λ n ↦ Fin n | ℕ → Type | Type 1 |
Finset.{u} | Type u → Type u | Type (u+1) |
I’d recommend experimenting with Lean at this point and exploring the existing types and how they are implemented in mathlib. We’ll cover additional aspects of type theory in future posts, but for now I’ll leave you with the following takeaway from Theorem Proving in Lean 4.
p : Prop
. To prove that assertion, we need to exhibit a term t : p
. Lean's task, as a proof assistant, is to help us to construct such a term, t
, and to verify that it is well-formed and has the correct type.
Theorem Proving in Lean 4 by Jeremy Avigad, Leonardo de Moura, Soonho Kong and Sebastian Ullrich
-
Note that we can interpret everything to the left of
:
as being quantified by a “for all”. In fact, we could also move the statements entirely to the right and quantify them by an explicit∀
:lemma h₁ (n : ℕ) : n + 1 = n.succ := rfl
andlemma h₂ : ∀ n : ℕ, n + 1 = n.succ := λ _ ↦ rfl
are identical to Lean, but the former is cleaner and therefore currently the recommended form. ↩ -
Technically
String
is a class and not a primitive type in Java andinput
therefore an object. ↩ -
Note that
theorem
is just a versiondef
and Lean will not distinguish between them when type checking, but there are differences in behavior andtheorem
is usually the correct choice when the output type is of typeProp
anddef
if it is of typeType
. ↩ -
See for example the definition of a category on page 3 of Emily Riehl’s Category Theory in Context ↩
-
In fact, Lean will not distinguish between
t₁ : p
andt₂ : p
ifp : Prop
, a principle referred to as proof irrelevance. ↩