9.6. Type Inference#
OCaml and Java are statically typed languages, meaning every binding has a
type that is determined at compile time—that is, before any part of the
program is executed. The type-checker is a compile-time procedure that either
accepts or rejects a program. By contrast, JavaScript and Ruby are
dynamically-typed languages; the type of a binding is not determined ahead of
time. Computations like binding 42 to x
and then treating x
as a string
therefore either result in run-time errors, or run-time conversion between
types.
Unlike Java, OCaml is implicitly typed, meaning programmers rarely need to write down the types of bindings. This is often convenient, especially with higher-order functions. (Although some people disagree as to whether it makes code easier or harder to read). But implicit typing in no way changes the fact that OCaml is statically typed. Rather, the type-checker has to be more sophisticated because it must infer what the type annotations “would have been” had the programmers written all of them. In principle, type inference and type checking could be separate procedures (the inferencer could figure out the types then the checker could determine whether the program is well-typed), but in practice they are often merged into a single procedure called type reconstruction.
9.6.1. OCaml Type Reconstruction#
At a very high level, OCaml’s type reconstruction algorithm works as follows:
Determine the types of definitions in order, using the types of earlier definitions to infer the types of later ones. (Which is one reason you may not use a name before it is bound in an OCaml program.)
For each
let
definition, analyze the definition to determine constraints about its type. For example, if the inferencer seesx + 1
, it concludes thatx
must have typeint
. It gathers similar constraints for function applications, pattern matches, etc. Think of these constraints as a system of equations like you might have in algebra.Use that system of equations to solve for the type of the name being defined.
The OCaml type reconstruction algorithm attempts to never reject a program that could type check, if the programmer had written down types. It also attempts never to accept a program that cannot possibly type check. Some more obscure parts of the language can sometimes make type annotations either necessary or at least helpful (see Real World OCaml chapter 22, “Type inference”, for examples). But for most code you write, type annotations really are completely optional.
Since it would be verbose to keep writing “the type reconstruction algorithm used by OCaml and other functional languages,” we’ll call the algorithm HM. That name is used throughout the programming languages literature, because the algorithm was independently invented by Roger Hindley and Robin Milner.
HM has been rediscovered many times by many people. Curry used it informally in the 1950s (perhaps even the 1930s). He wrote it up formally in 1967 (published 1969). Hindley discovered it independently in 1969; Morris in 1968; and Milner in 1978. In the realm of logic, similar ideas go back perhaps as far as Tarski in the 1920s. Commenting on this history, Hindley wrote,
There must be a moral to this story of continual re-discovery; perhaps someone along the line should have learned to read. Or someone else learn to write.
Although we haven’t seen the HM algorithm yet, you probably won’t be surprised to learn that it’s usually very efficient—you’ve probably never had to wait for the toplevel to print the inferred types of your programs. In practice, it runs in approximately linear time. But in theory, there are some very strange programs that can cause its running-time to blow up. (Technically, it’s exponential time.) For fun, try typing the following code in utop:
# let b = true;;
# let f0 = fun x -> x + 1;;
# let f = fun x -> if b then f0 else fun y -> x y;;
# let f = fun x -> if b then f else fun y -> x y;;
# let f = fun x -> if b then f else fun y -> x y;;
(* keep repeating that last line *)
You’ll see the types get longer and longer, and eventually (around 20 repetitions or so) type inference will cause a significant delay.
9.6.2. Constraint-Based Inference#
Let’s build up to the HM type inference algorithm by starting with this little language:
e ::= x | i | b | e1 bop e2
| if e1 then e2 else e3
| fun x -> e
| e1 e2
bop ::= + | * | <=
t ::= int | bool | t1 -> t2
That language is SimPL, plus the lambda calculus, minus let
expressions. It
turns out let
expressions add an extra layer of complication, so we’ll come
back to them later.
Since anonymous functions in this language do not have type annotations, we have
to infer the type of the argument x
. For example,
In
fun x -> x + 1
, argumentx
must have typeint
hence the function has typeint -> int
.In
fun x -> if x then 1 else 0
, argumentx
must have typebool
hence the function has typebool -> int
.The function
fun x -> if x then x else 0
is untypeable, because it would requirex
to have both typeint
andbool
, which isn’t allowed.
A Syntactic Simplification. We can treat e1 bop e2
as syntactic sugar for
( bop ) e1 e2
. That is, we treat infix binary operators as prefix function
application. Let’s introduce a new syntactic class n
for names, which
generalize identifiers and operators. That changes the syntax to:
e ::= n | i | b
| if e1 then e2 else e3
| fun x -> e
| e1 e2
n ::= x | bop
bop ::= ( + ) | ( * ) | ( <= )
t ::= int | bool | t1 -> t2
We already know the types of those built-in operators:
( + ) : int -> int -> int
( * ) : int -> int -> int
( <= ) : int -> int -> bool
Those types are given; we don’t have to infer them. They are part of the initial
static environment. In OCaml those operator names could later be shadowed by
values with different types, but here we don’t have to worry about that because
we don’t yet have let
.
How would you mentally infer the type of fun x -> 1 + x
, or rather,
fun x -> ( + ) 1 x
? It’s automatic by now, but we could break it down into
pieces:
Start with
x
having some unknown typet
.Note that
( + )
is known to have typeint -> (int -> int)
.So its first argument must have type
int
. Which1
does.And its second argument must have type
int
, too. Sot = int
. That is a constraint ont
.Finally, the body of the function must also have type
int
, since that’s the return type of( + )
.Therefore, the type of the entire function must be
t -> int
.Since
t = int
, that type isint -> int
.
The type inference algorithm follows the same idea of generating unknown types, collecting constraints on them, and using the constraints to solve for the type of the expression.
Let’s introduce a new quaternary relation env |- e : t -| C
, which should be
read as follows: “in environment env
, expression e
is inferred to have type
t
and generates constraint set C
.” A constraint is an equation of the form
t1 = t2
for any types t1
and t2
.
If we think of the relation as a type-inference function, the colon in the
middle separates the input from the output. The inputs are env
and e
: we
want to know what the type of e
is in environment env
. The function returns
as output a type t
and constraints C
.
The e : t
in the middle of the relation is approximately what you see in the
toplevel: you enter an expression, and it tells you the type. But around that is
an environment and constraint set env |- ... -| C
that is invisible to you.
So, the turnstiles around the outside show the parts of type inference that the
toplevel does not.
The easiest parts of inference are constants:
env |- i : int -| {}
env |- b : bool -| {}
Any integer constant i
, such as 42
, is known to have type int
, and there
are no constraints generated. Likewise for Boolean constants.
Inferring the type of a name requires looking it up in the environment:
env |- n : env(n) -| {}
No constraints are generated.
If the name is not bound in the environment, the expression cannot be typed. It’s an unbound name error.
The remaining rules are at their core the same as the type-checking rules we saw previously, but they each generate a type variable and possibly some constraints on that type variable.
If.
Here’s the rule for if
expressions:
env |- if e1 then e2 else e3 : 't -| C1, C2, C3, t1 = bool, 't = t2, 't = t3
if fresh 't
and env |- e1 : t1 -| C1
and env |- e2 : t2 -| C2
and env |- e3 : t3 -| C3
To infer the type of an if
, we infer the types t1
, t2
, and t3
of each of
its subexpressions, along with any constraints on them. We have no control over
what those types might be; it depends on what the programmer wrote. But we do
know that the type of the guard must be bool
. So we generate a constraint that
t1 = bool
.
Furthermore, we know that both branches must have the same type—though, we
don’t know in advance what that type might be. So, we invent a fresh type
variable 't
to stand for that type. A type variable is fresh if it has never
been used elsewhere during type inference. So, picking a fresh type variable
just means picking a new name that can’t possibly be confused with any other
names in the program. We return 't
as the type of the if
, and we record two
constraints 't = t2
and 't = t3
to say that both branches must have that
type.
We therefore need to add type variables to the syntax of types:
t ::= 'x | int | bool | t1 -> t2
Some example type variables include 'a
, 'foobar
, and 't
. In the last, t
is an identifier, not a meta-variable.
Here’s an example:
{} |- if true then 1 else 0 : 't -| bool = bool, 't = int
{} |- true : bool -| {}
{} |- 1 : int -| {}
{} |- 0 : int -| {}
The full constraint set generated is
{}, {}, {}, bool = bool, 't = int, 't = int
, but of course that simplifies to
just bool = bool, 't = int
. From that constraint set we can see that the type
of if true then 1 else 0
must be int
.
Anonymous functions.
Since there is no type annotation on x
, its type must be inferred:
env |- fun x -> e : 't1 -> t2 -| C
if fresh 't1
and env, x : 't1 |- e : t2 -| C
We introduce a fresh type variable 't1
to stand for the type of x
, and infer
the type of body e
under the environment in which x : 't1
. Wherever x
is
used in e
, that can cause constraints to be generated involving 't1
. Those
constraints will become part of C
.
Here’s a function where we can immediately see that x : bool
, but let’s work
through the inference:
{} |- fun x -> if x then 1 else 0 : 't1 -> 't -| 't1 = bool, 't = int
{}, x : 't1 |- if x then 1 else 0 : 't -| 't1 = bool, 't = int
{}, x : 't1 |- x : 't1 -| {}
{}, x : 't1 |- 1 : int -| {}
{}, x : 't1 |- 0 : int -| {}
The inferred type of the function is 't1 -> 't
, with constraints 't1 = bool
and 't = int
. Simplifying that, the function’s type is bool -> int
.
Function application.
The type of the entire application must be inferred, because we don’t yet know anything about the types of either subexpression:
env |- e1 e2 : 't -| C1, C2, t1 = t2 -> 't
if fresh 't
and env |- e1 : t1 -| C1
and env |- e2 : t2 -| C2
We introduce a fresh type variable 't
for the type of the application
expression. We use inference to determine the types of the subexpressions and
any constraints they happen to generate. We add one new constraint,
t1 = t2 -> 't
, which expresses that the type of the left-hand side e1
must
be a function that takes in an argument of type t2
and returns a value of type
't
.
Let I
be the initial environment that binds the boolean operators. Let’s
infer the type of a partial application of ( + )
:
I |- ( + ) 1 : 't -| int -> int -> int = int -> 't
I |- ( + ) : int -> int -> int -| {}
I |- 1 : int -| {}
From the resulting constraint, we see that
int -> int -> int
=
int -> 't
Stripping the int ->
off the left-hand side of each of those function types,
we are left with
int -> int
=
't
Hence, the type of ( + ) 1
is int -> int
.
9.6.3. Solving Constraints#
What does it mean to solve a set of constraints? Since constraints are equations on types, it’s much like solving a system of equations in algebra. We want to solve for the values of the variables appearing in those equations. By substituting those values for the variables, we should get equations that are identical on both sides. For example, in algebra we might have:
5x + 2y = 9
x - y = -1
Solving that system, we’d get that x = 1
and y = 2
. If we substitute
1
for x
and 2
for y
, we get:
5(1) + 2(2) = 9
1 - 2 = -1
which reduces to
9 = 9
-1 = -1
In programming languages terminology (though perhaps not high-school algebra),
we say that the substitutions {1 / x}
and {2 / y}
together unify that set
of equations, because they make each equation “unite” such that its left side is
identical to its right side.
Solving systems of equations on types is similar. Just as we found numbers to substitute for variables above, we now want to find types to substitute for type variables, and thereby unify the set of equations.
Much like the substitutions we defined before for the substitution model of
evaluation, we’ll write {t / 'x}
for the type substitution that maps type
variable 'x
to type t
. For example, t1 {t2/'x}
means type t1
with t2
substituted for 'x
.
We can define substitution on types as follows:
int {t / 'x} = int
bool {t / 'x} = bool
'x {t / 'x} = t
'y {t / 'x} = 'y
(t1 -> t2) {t / 'x} = (t1 {t / 'x} ) -> (t2 {t / 'x} )
Given two substitutions S1
and S2
, we write S1; S2
to mean the
substitution that is their sequential composition, which is defined as
follows:
t (S1; S2) = (t S1) S2
The order matters. For example, 'x ({('y -> 'y) / 'x}; {bool / 'y})
is
bool -> bool
, not 'y -> 'y
. We can build up bigger and bigger substitutions
this way.
A substitution S
can be applied to a constraint t = t'
. The result
(t = t') S
is defined to be t S = t' S
. So we just apply the substitution on
both sides of the constraint.
Finally, a substitution can be applied to a set C
of constraints; the result
C S
is the result of applying S
to each of the individual constraints in
C
.
A substitution unifies a constraint t_1 = t_2
if t_1 S
results in the same
type as t_2 S
. For example, substitution S = {int -> int / 'y}; {int / 'x}
unifies constraint 'x -> ('x -> int) = int -> 'y
, because
('x -> ('x -> int)) S
=
int -> (int -> int)
and
(int -> 'y) S
=
int -> (int -> int)
A substitution S
unifies a set C
of constraints if S
unifies every
constraint in C
.
At last, we can precisely say what it means to solve a set of constraints: we must find a substitution that unifies the set. That is, we need to find a sequence of maps from type variables to types, such that the sequence causes each equation in the constraint set to “unite”, meaning that its left-hand side and right-hand side become the same.
To find a substitution that unifies constraint set C
, we use an algorithm
unify
, which is defined as follows:
If
C
is the empty set, thenunify(C)
is the empty substitution.If
C
contains at least one constraintt1 = t2
and possibly some other constraintsC'
, thenunify(C)
is defined as follows:If
t1
andt2
are both the same simple type—i.e., both the same type variable'x
, or bothint
or bothbool
—then returnunify(C')
. In this case, the constraint contained no useful information, so we’re tossing it out and continuing.If
t1
is a type variable'x
and'x
does not occur int2
, then letS = {t2 / 'x}
, and returnS; unify(C' S)
. In this case, we are eliminating the variable'x
from the system of equations, much like Gaussian elimination in solving algebraic equations.If
t2
is a type variable'x
and'x
does not occur int1
, then letS = {t1 / 'x}
, and returnS; unify(C' S)
. This is an elimination like the previous case.If
t1 = i1 -> o1
andt2 = i2 -> o2
, wherei1
,i2
,o1
, ando2
are types, thenunify(i1 = i2, o1 = o2, C')
. In this case, we break one constraint down into two smaller constraints and add those constraints back in to be further unified.Otherwise, fail. There is no possible unifier.
In the second and third subcases, the check that 'x
should not occur in the
type ensures that the algorithm is actually eliminating the variable. Otherwise,
the algorithm could end up re-introducing the variable instead of eliminating
it.
It’s possible to prove that the unification algorithm always terminates, and
that it produces a result if and only if a unifier actually exists—that is,
if and only if the set of constraints has a solution. Moreover, the solution the
algorithm produces is the most general unifier, in the sense that if
S = unify(C)
and S'
also unifies C
, then there must exist some S''
such
that S' = S; S''
. Such an S'
is less general than S
because it contains
the additional substitutions of S''
.
9.6.4. Finishing Type Inference#
Let’s review what we’ve done so far. We started with this language:
e ::= n | i | b
| if e1 then e2 else e3
| fun x -> e
| e1 e2
n ::= x | bop
bop ::= ( + ) | ( * ) | ( <= )
t ::= int | bool | t1 -> t2
We then introduced an algorithm for inferring a type of an expression. That type
came along with a set of constraints. The algorithm was expressed in the form of
a relation env |- e : t -| C
.
Next, we introduced the unification algorithm for solving constraint sets. That
algorithm produces as output a sequence S
of substitutions, or it fails. If it
fails, then e
is not typeable.
To finish type inference and reconstruct the type of e
, we just compute t S
.
That is, we apply the solution to the constraints to the type t
produced by
constraint generation.
Let p
be that type. That is, p = t S
. It’s possible to prove p
is the
principal type for the expression, meaning that if e
also has type t
for
any other t
, then there exists a substitution S
such that t = p S
.
For example, the principal type of the identity function fun x -> x
would be
'a -> 'a
. But you could also give that function the less helpful type
int -> int
. What we’re saying is that HM will produce 'a -> 'a
, not
int -> int
. So in a sense, HM actually infers the most “lenient” type that is
possible for an expression.
A Worked Example. Let’s infer the type of the following expression:
fun f -> fun x -> f (( + ) x 1)
It’s not much code, but this will get quite involved!
We start in the initial environment I
that, among other things, maps ( + )
to int -> int -> int
.
I |- fun f -> fun x -> f (( + ) x 1)
For now we leave off the : t -| C
, because that’s the output of constraint
generation. We haven’t figured out the output yet! Since we have a function, we
use the function rule for inference to proceed by introducing a fresh type
variable for the argument:
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1) <-- Here
Again we have a function, hence a fresh type variable:
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1) <-- Here
Now we have an application expression. Before dealing with it, we need to descend into its subexpressions. The first one is easy. It’s just a variable. So we finally can finish a judgment with the variable’s type from the environment, and an empty constraint set.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {} <-- Here
Next is the second subexpression.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1 <-- Here
That is another application, so we need to handle its subexpressions. Recall
that ( + ) x 1
is parsed as (( + ) x) 1
. So the first subexpression is the
complicated one to handle.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1
I, f : 'a, x : 'b |- ( + ) x <-- Here
Yet another application.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1
I, f : 'a, x : 'b |- ( + ) x
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {} <-- Here
That one was easy, because we just had to look up the name ( + )
in the
environment. The next is also easy, because we just look up x
.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1
I, f : 'a, x : 'b |- ( + ) x
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {} <-- Here
At last, we’re ready to resolve a function application! We introduce a fresh
type variable and add a constraint. The constraint is that the inferred type
int -> int -> int
of the left-hand subexpression must equal the inferred type
'b
of the right-hand subexpression arrow the fresh type variable 'c
, that
is, 'b -> 'c
.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1
I, f : 'a, x : 'b |- ( + ) x : 'c -| int -> int -> int = 'b -> 'c <-- Here
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {}
Now we’re ready for the argument being passed to that function.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1
I, f : 'a, x : 'b |- ( + ) x : 'c -| int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {}
I, f : 'a, x : 'b |- 1 : int -| {} <-- Here
Again we can resolve a function application with a new type variable and constraint.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1)
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1 : 'd -| 'c = int -> 'd, int -> int -> int = 'b -> 'c <-- Here
I, f : 'a, x : 'b |- ( + ) x : 'c -| int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {}
I, f : 'a, x : 'b |- 1 : int -| {}
And once more, a function application, so a new type variable and a new constraint.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1)
I, f : 'a, x : 'b |- f (( + ) x 1) : 'e -| 'a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c <-- Here
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1 : 'd -| 'c = int -> 'd, int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) x : 'c -| int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {}
I, f : 'a, x : 'b |- 1 : int -| {}
Now we finally get to finish off an anonymous function. Its inferred type
is the fresh type variable 'b
of its parameter x
, arrow the inferred
type e
of its body.
I |- fun f -> fun x -> f (( + ) x 1)
I, f : 'a |- fun x -> f (( + ) x 1) : 'b -> 'e -| 'a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c <-- Here
I, f : 'a, x : 'b |- f (( + ) x 1) : 'e -| 'a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1 : 'd -| 'c = int -> 'd, int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) x : 'c -| int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {}
I, f : 'a, x : 'b |- 1 : int -| {}
And the last anonymous function can now be complete in the same way:
I |- fun f -> fun x -> f (( + ) x 1) : 'a -> 'b -> 'e -| 'a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c <-- Here
I, f : 'a |- fun x -> f (( + ) x 1) : 'b -> 'e -| 'a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- f (( + ) x 1) : 'e -| 'a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- f : 'a -| {}
I, f : 'a, x : 'b |- ( + ) x 1 : 'd -| 'c = int -> 'd, int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) x : 'c -| int -> int -> int = 'b -> 'c
I, f : 'a, x : 'b |- ( + ) : int -> int -> int -| {}
I, f : 'a, x : 'b |- x : 'b -| {}
I, f : 'a, x : 'b |- 1 : int -| {}
As a result of constraint generation, we know that the type of the expression is
'a -> 'b -> 'e
, where
'a = 'd -> 'e
'c = int -> 'd
int -> int -> int = 'b -> 'c
To solve that system of equations, we use the unification algorithm:
unify('a = 'd -> 'e, 'c = int -> 'd, int -> int -> int = 'b -> 'c)
The first constraint yields a substitution {('d -> 'e) / 'a}
, which we record
as part of the solution, and also apply it to the remaining constraints:
...
=
{('d -> 'e) / 'a}; unify(('c = int -> 'd, int -> int -> int = 'b -> 'c) {('d -> 'e) / 'a})
=
{('d -> 'e) / 'a}; unify('c = int -> 'd, int -> int -> int = 'b -> 'c)
The second constraint behaves similarly to the first:
...
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; unify((int -> int -> int = 'b -> 'c) {(int -> 'd) / 'c})
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; unify(int -> int -> int = 'b -> int -> 'd)
The function constraint breaks down into two smaller constraints:
...
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; unify(int = 'b, int -> int = int -> 'd)
We get another substitution:
...
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; {int / 'b}; unify((int -> int = int -> 'd) {int / 'b})
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; {int / 'b}; unify(int -> int = int -> 'd)
Then we get to break down another function constraint:
...
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; {int / 'b}; unify(int = int, int = 'd)
The first of the resulting new constraints is trivial and just gets dropped:
...
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; {int / 'b}; unify(int = 'd)
The very last constraint gives us one more substitution:
=
{('d -> 'e) / 'a}; {(int -> 'd) / 'c}; {int / 'b}; {int / 'd}
To finish, we apply the substitution output by unification to the type inferred by constraint generation:
('a -> 'b -> 'e) {('d -> 'e) / 'a}; {(int -> 'd) / 'c}; {int / 'b}; {int / 'd}
=
(('d -> 'e) -> 'b -> 'e) {(int -> 'd) / 'c}; {int / 'b}; {int / 'd}
=
(('d -> 'e) -> 'b -> 'e) {int / 'b}; {int / 'd}
=
(('d -> 'e) -> int -> 'e) {int / 'd}
=
(int -> 'e) -> int -> 'e
And indeed that is the same type that OCaml would infer for the original expression:
# fun f -> fun x -> f (( + ) x 1);;
- : (int -> 'a) -> int -> 'a = <fun>
Except that OCaml uses a different type variable identifier. OCaml is nice to us and “lowers” the type variables down to smaller letters of the alphabet. We could do that too with a little extra work.
Type Errors. In reality there is yet another piece to type inference. If unification fails, the compiler or interpreter needs to produce a helpful error message. That’s an important engineering challenge that we won’t address here. It requires keeping track of more than just constraints: we need to know why a constraint was introduced, and the ramification of its violation. We also need to track the constraint back to the lexical piece of code that produced it, so that programmers can see where the problem occurs. And since it’s possible that constraints can be processed in many different orders, there are many possible error messages that could be produced. Figuring out which one will lead the programmer to the root cause of an error, instead of some downstream consequence of it, is an area of ongoing research.
9.6.5. Let Polymorphism#
Now we’ll add let
expressions to our little language:
e ::= x | i | b | e1 bop e2
| if e1 then e2 else e3
| fun x -> e
| e1 e2
| let x = e1 in e2 (* new *)
It turns out type inference for them is considerably trickier than might be expected. The naive approach would be to add this constraint generation rule:
env |- let x = e1 in e2 : t2 -| C1, C2
if env |- e1 : t1 -| C1
and env, x : t1 |- e2 : t2 -| C2
From the type-checking perspective, that’s the same rule we’ve always used.
And for many let
expressions it works perfectly fine. For example:
{} |- let x = 42 in x : int -| {}
{} |- 42 : int -| {}
x : int |- x : int -| {}
The problem is that when the value being bound is a polymorphic function, that rule generates constraints that are too restrictive. For example, consider the identity function:
let id = fun x -> x in
let a = id 0 in
id true
OCaml has no trouble inferring the type of id
as 'a -> 'a
and permitting it
to be applied both to an int
and a bool
. But the rule above isn’t so
permissive about application to both types. When we use it, we generate the
following types and constraints:
{} |- let id = fun x -> x in (let a = id 0 in id true) : 'c -| 'a -> 'a = int -> 'b, 'a -> 'a = bool -> 'c
{} |- fun x -> x : 'a -| {}
x : 'a |- x : 'a -| {}
id : 'a -> 'a |- let a = id 0 in id true : 'c -| 'a -> 'a = int -> 'b, 'a -> 'a = bool -> 'c <--- POINT 1
id : 'a -> 'a |- id 0 : 'b -| 'a -> 'a = int -> 'b
id : 'a -> 'a |- id : 'a -> 'a -| {}
id : 'a -> 'a |- 0 : int -| {}
id : 'a -> 'a, a : 'b |- id true : 'c -| 'a -> 'a = bool -> 'c <--- POINT 2
id : 'a -> 'a, a : 'b |- id : 'a -> 'a -| {}
id : 'a -> 'a, a : 'b |- true : bool -| {}
Notice that we do infer a type 'a -> 'a
for id
, which you can see in the
environment in later lines of the example. But, at Point 1, we infer a
constraint 'a -> 'a = int -> 'b
, and at Point 2, we infer
'a -> 'a = bool -> 'c
. When the unification algorithm encounters those
constraints, it will break them down into 'a = int
, ‘a = 'b
, 'a = bool
,
and 'a = 'c
. The first and third of those are contradictory, because we can’t
have 'a = int
and 'a = bool
. One or the other will be substituted away
during unification, leaving an unsatisfiable constraint int = bool
. At that
point unification will fail, declaring the program to be ill-typed.
The problem is that the 'a
type variable in the inferred type of id
stands
for an unknown but fixed type. At each application of id
, we want to let
'a
become a different type, instead of forcing it to always be the same
type.
The solution to the problem of polymorphism for let
expressions is not simple.
It requires us to introduce a new kind of type: a type scheme. Type schemes
resemble universal quantification from mathematical logic. For example, in
logic you might write, “for all natural numbers \(x\), it holds that \(0 \cdot x
= 0\)”. The “for all” is the universal quantification: it abstracts away from a
particular \(x\) and states a property that is true of all natural numbers.
A type scheme is written 'a . t
, where 'a
is a type variable and t
is a
type in which 'a
may appear. For example, 'a . 'a -> 'a
is a type scheme. It
is the type of a function that takes in a value of type 'a
and returns a value
of type 'a
, for all 'a
. Thus, it is the type of the polymorphic identity
function.
We can also have many type variables to the left of the dot in a type scheme.
For example, 'a 'b . 'a -> 'b -> 'a
is the type of a function that takes in
two arguments and returns the first. In OCaml, we could write that as
fun x y -> x
. Note that utop infers the type of it as we would expect:
# let f = fun x y -> x;;
val f : 'a -> 'b -> 'a = <fun>
But we could actually manually write down an annotation with a type scheme:
# let f : 'a 'b . 'a -> 'b -> 'a = fun x y -> x;;
val f : 'a -> 'b -> 'a = <fun>
Note that OCaml accepts our manual type annotation but doesn’t include the
'a 'b .
part of it in its output. But it’s implicitly there and always has
been. In general, anytime OCaml has inferred a type t
and that type has had
type variables in it, in reality it’s a type scheme. For example, the type of
List.length
is really a type scheme:
# let mylen : 'a . 'a list -> int = List.length;;
val mylen : 'a list -> int = <fun>
OCaml just doesn’t bother outputting the list of type variables that are to the left of the dot in the type scheme. Really they’d just clutter the output, and many programmers never need to know about them. But now that you’re learning type inference, it’s time for you to know.
Now that we have type schemes, we’ll have static environments that map names to
type schemes. We can think of types as being special cases of type schemes in
which the list of type variables is empty. With type schemes, the let
rule
changes in only one way from the naive rule above, which is the generalize
on
the last line:
env |- let x = e1 in e2 : t2 -| C1, C2
if env |- e1 : t1 -| C1
and generalize(C1, env, x : t1) |- e2 : t2 -| C2
The job of generalize
is to take a type like 'a -> 'a
and generalize it
into a type scheme like 'a . 'a -> 'a
in an environment env
against
constraints C1
. Let’s come back to how it works in a minute. Before that,
there’s one other rule that needs to change, which is the name rule:
env |- n : instantiate(env(n)) -| {}
The only thing that changes there is that use of instantiate
. Its job is to
take a type scheme like 'a . 'a -> 'a
and instantiate it into a new type
(and here we strictly mean a type, not a type scheme) with fresh type variables.
For example, 'a . 'a -> 'a
could be instantiated as 'b -> 'b
, if 'b
isn’t
yet in use anywhere else as a type variable.
Here’s how those two revised rules work together to get our earlier example with the identify function right:
{} |- let id = fun x -> x in (let a = id 0 in id true)
{} |- fun x -> x : 'a -> 'a -| {}
x : 'a |- x : 'a -| {}
id : 'a . 'a -> 'a |- let a = id 0 in id true <--- POINT 1
Let’s pause there at Point 1. When id
is put into the environment by the let
rule, its type is generalized from 'a -> 'a
to 'a . 'a -> 'a
; that is, from
a type to a type scheme. That records the fact that each application of id
should get to use its own value for 'a
. Going on:
{} |- let id = fun x -> x in (let a = id 0 in id true)
{} |- fun x -> x : 'a -> 'a -| {}
x : 'a |- x : 'a -| {}
id : 'a . 'a -> 'a |- let a = id 0 in id true <--- POINT 1
id : 'a . 'a -> 'a |- id 0
id : 'a . 'a -> 'a |- id : 'b -> 'b -| {} <--- POINT 3
Pausing here at Point 3, when id
is applied to 0
, we instantiate its type
variable 'a
with a fresh type variable 'b
. Let’s finish:
{} |- let id = fun x -> x in (let a = id 0 in id true) : 'e -| 'b -> 'b = int -> 'c, 'd -> 'd = bool -> 'e
{} |- fun x -> x : 'a -> 'a -| {}
x : 'a |- x : 'a -| {}
id : 'a . 'a -> 'a |- let a = id 0 in id true : 'e -| 'b -> 'b = int -> 'c, 'd -> 'd = bool -> 'e <--- POINT 1
id : 'a . 'a -> 'a |- id 0 : 'c -| 'b -> 'b = int -> 'c
id : 'a . 'a -> 'a |- id : 'b -> 'b -| {} <--- POINT 3
id : 'a . 'a -> 'a |- 0 : int -| {}
id : 'a . 'a -> 'a, a : 'b |- id true : 'e -| 'd -> 'd = bool -> 'e <--- POINT 2
id : 'a . 'a -> 'a, a : 'b |- id : 'd -> 'd -| {} <--- POINT 4
id : 'a . 'a -> 'a, a : 'b |- true : bool -| {}
At Point 4, when id
is applied to true
, we again instantiate its type
variable 'a
with a fresh type variable, this time 'd
. So the constraints
collected at Points 1 and 2 are no longer contradictory, because they are
talking about different type variables. Those constraints are:
'b -> 'b = int -> 'c
'd -> 'd = bool -> 'e
The unification algorithm will therefore conclude:
'b = int
'c = int
'd = bool
'e = bool
So the entire expression is successfully inferred to have type bool
.
Instantiation and Generalization. We used two new functions, instantiate
and generalize
, to define type inference for let
expressions. We need to
define those functions.
The easy one is instantiate
. Given a type scheme 'a1 'a2 ... 'an . t
,
we instantiate it by:
choosing
n
fresh type variables, andsubstituting each of those for
'a1
through'an
int
.
Substitution is uncomplicated here, compared to how it was for evaluation in the substitution model, because there is nothing in a type that can bind variable names.
But generalize
requires more work. Here’s the let
rule again:
env |- let x = e1 in e2 : t2 -| C1, C2
if env |- e1 : t1 -| C1
and generalize (C1, env, x : t1) |- e2 : t2 -| C2
To generalize t1
, we do the following.
First, we pretend like e1
is all that matters, and that the rest of the let
expression doesn’t exist. If e1
were the entire program, how would we finish
type inference? We’d run the unification algorithm on C1
, get a substitution
S
, and return t1 S
as the inferred type of e1
. So, do that now. Let’s call
that inferred type u1
. Let’s also apply S
to env
to get a new environment
env1
, which now reflects all the type information we’ve gleaned from e1
.
Second, we figure out which type variables in u1
should be generalized. Why
not all of them? Because some type variables could have been introduced by code
that surrounds the let
expression, e.g.,
fun x ->
(let y = e1 in e2) (let z = e3 in e4)
The type variable for x
should not be generalized in inferring the type of
either y
or z
, because x
has to have the same type in all four
subexpressions, e1
through e4
. Generalizing could mistakenly allow x
to
have one type in e1
and e2
, but a different type in e3
and e4
.
So instead we generalize only variables that are in u1
but are not in
env1
. That way we generalize only the type variables from e1
, not variables
that were already in the environment when we started inferring the let
expression’s type. Suppose those variables are 'a1 ... 'an
. The type scheme we
give to x
is then 'a1 ... 'an . u1
.
Putting all that together, we end up with:
generalize(C1, env, x : t1) =
env1, x : 'a1 ... 'an . u1
Returning to our example with the identify function from above, we had
generalize({}, {}, x : 'a -> 'a)
. In that rather simple case, unify
discovers no new equalities from the environment, so u1 = 'a -> 'a
and
env1 = {}
. The only type variable in u1
is 'a
, and it doesn’t appear in
env1
. So 'a
is generalized, yielding 'a . 'a -> 'a
as the type scheme for
id
.
9.6.6. Polymorphism and Mutability#
There is yet one more complication to type inference for let
expressions. It
appears when we add mutable references to the language. Consider this example
code, which does not type check in OCaml:
let succ = fun x -> ( + ) 1 x;;
let id = fun x -> x;;
let r = ref id;;
r := succ;;
!r true;; (* error *)
It’s clear we should infer succ : int -> int
and id : 'a . 'a -> 'a
. But
what should the type of r
be? It’s tempting to say we should infer
r : 'a . ('a -> 'a) ref
. That would let us instantiate the type of r
to be
(int -> int) ref
on line 4 and store succ
in r
. But it also would let us
instantiate the type of r
to be (bool -> bool) ref
on line 5. That’s a
disaster: it causes the application of succ
to true
, which is not type safe.
The solution adopted by OCaml and related languages is called the value restriction: the type system is designed to prevent a polymorphic mutable value from ever holding more than one type. Let’s redo some of that example again, pausing to look at the toplevel output:
# let id = fun x -> x;;
val id : 'a -> 'a = <fun> (* as expected *)
# let r = ref id;;
val r : ('_weak1 -> '_weak1) ref = { ... } (* what is _weak? *)
# r;;
- : ('_weak1 -> '_weak1) ref = { ... } (* it's consistent at least *)
# r := succ;;
- : unit = ()
# r;;
- : (int -> int) ref = { ... } (* did r just change type ?! *)
When the type of r
is inferred, OCaml gives it a type involving a weak type
variable. All such variables have a name starting with '_weak
. A weak type
variable is one that has not been generalized hence cannot be instantiated on
multiple types. Rather, it indicates a single type that is not yet known. Think
of it as type inference for that variable is not yet finished: OCaml is waiting
for more information to pin down precisely what it is. When r := succ
is
executed, that information finally becomes available. OCaml infers that
'_weak1 = int
from the type of succ
. Then OCaml replaces '_weak1
with
int
everywhere. That’s what yields an error on the final line:
# !r true;;
Error: This expression has type bool but an expression was expected of type int
Since r : (int -> int) ref
, we cannot apply !r
to a bool
.
We won’t cover the implementation of weak type variables here.
But, let’s not leave this topic of the interaction between polymorphic types and mutability yet. You might be tempted to think that it’s a phenomenon that affects only OCaml. But indeed, even Java suffers.
Consider the following class hierarchy:
class Animal { }
class Elephant extends Animal { }
class Rabbit extends Animal { }
Now suppose we create an array of animals:
Animal[] a= new Rabbit[2]
Here we are using subtype polymorphism to assign an array of Rabbit
objects
to an Animal[]
reference. That’s not the same as parametric polymorphism as
we’ve been using in OCaml, but it’s nonetheless polymorphism.
What if we try this?
a[0]= new Elephant()
Since a
is typed as an Animal
array, it stands to reason that we could
assign an elephant object into it, just as we could assign a rabbit object. And
indeed that code is fine according to the Java compiler. But Java gives us a
runtime error if we run that code!
Exception java.lang.ArrayStoreException
The problem is that mutating the first array element to be a rabbit would leave
us with a Rabbit
array in which one element is a Elephant
. (Ouch! An
elephant would sit on a rabbit. Poor bun bun.) But in Java, the type of every
object of an array is supposed to be a property of the array as a whole. Every
element of the array created by new Rabbit[2]
therefore must be a Rabbit
. So
Java prevents the assignment above by detecting the error at run time and
raising an exception.
This is really the value restriction in another guise! The type of a value stored in a mutable location may not change, according to the value restriction. With arrays, Java implements that with a run-time check, instead of rejecting the program at compile time. This strikes a balance between soundness (preventing errors from happening) and expressivity (allowing more error-free programs to type check).