# 9.5. Type Checking¶

Earlier, we skipped over the type checking phase. Let’s come back to that now. After lexing and parsing, the next phase of compilation is semantic analysis, and the primary task of semantic analysis is type checking.

A *type system* is a mathematical description of how to determine whether an
expression is *ill typed* or *well typed*, and in the latter case, what the type
of the expression is. A *type checker* is a program that implements a type
system, i.e., that implements the static semantics of the language.

Commonly, a type system is formulated as a ternary relation
\(\mathit{HasType}(\Gamma, e, t)\), which means that expression \(e\) has type \(t\)
in typing context \(\Gamma\). A *typing context*, aka *typing environment*, is a
map from identifiers to types. The context is used to record what variables are
in scope, and what their types are. The use of the Greek letter \(\Gamma\) for
contexts is traditional.

That ternary relation \(\mathit{HasType}\) is typically written with infix notation, though, as \(\Gamma \vdash e : t\). You can read the turnstile symbol \(\vdash\) as “proves” or “shows”, i.e., the context \(\Gamma\) shows that \(e\) has type \(t\).

Let’s make that notation a little friendlier by eliminating the Greek and the
math typesetting. We’ll just write `ctx |- e : t`

to mean that typing context
`ctx`

shows that `e`

has type `t`

. Let’s write `{}`

for the empty context, and
`x:t`

to mean that `x`

is bound to `t`

. So, `{foo:int, bar:bool}`

would be the
context is which `foo`

has type `int`

and `bar`

has type `bool`

. A context may
bind an identifier at most once. We’ll write `ctx[x -> t]`

to mean a context
that contains all the bindings of `ctx`

, and also binds `x`

to `t`

. If `x`

was
already bound in `ctx`

, then that old binding is replaced by the new binding to
`t`

in `ctx[x -> t]`

.

With all that machinery, we can at last define what it means to be well typed:
An expression `e`

is **well typed** in context `ctx`

if there exists a type `t`

for which `ctx |- e : t`

. The goal of a type checker is thus to find such a type
`t`

, starting from some initial context.

It’s convenient to pretend that the initial context is empty. But in practice,
it’s rare that a language truly uses the empty context to determine whether a
program is well typed. In OCaml, for example, there are many built-in
identifiers that are always in scope, such as everything in the `Stdlib`

module.

## 9.5.1. A Type System for SimPL¶

Recall the syntax of SimPL:

```
e ::= x | i | b | e1 bop e2
| if e1 then e2 else e3
| let x = e1 in e2
bop ::= + | * | <=
```

Let’s define a type system `ctx |- e : t`

for SimPL. The only types in SimPL are
integers and booleans:

```
t ::= int | bool
```

To define `|-`

, we’ll invent a set of *typing rules* that specify what the type
of an expression is based on the types of its subexpressions. In other words,
`|-`

is an *inductively-defined relation*, as can be learned about in a discrete
math course. So, it has some base cases, and some inductive cases.

For the base cases, an integer constant has type `int`

in any context
whatsoever, a Boolean constant likewise always has type `bool`

, and a variable
has whatever type the context says it should have. Here are the typing rules
that express those ideas:

```
ctx |- i : int
ctx |- b : bool
{x : t, ...} |- x : t
```

The remaining syntactic forms are inductive cases.

**Let.** As we already know from OCaml, we type check the body of a let
expression using a scope that is extended with a new binding.

```
ctx |- let x = e1 in e2 : t2
if ctx |- e1 : t1
and ctx[x -> t1] |- e2 : t2
```

The rule says that `let x = e1 in e2`

has type `t2`

in context `ctx`

, but only
if certain conditions hold. The first condition is that `e1`

has type `t1`

in
`ctx`

. The second is that `e2`

has type `t2`

in a new context, which is `ctx`

extended to bind `x`

to `t1`

.

**Binary operators.** We’ll need a couple different rules for binary operators.

```
ctx |- e1 bop e2 : int
if bop is + or *
and ctx |- e1 : int
and ctx |- e2 : int
ctx |- e1 <= e2 : bool
if ctx |- e1 : int
and ctx |- e2 : int
```

**If.** Just like OCaml, an if expression must have a Boolean guard, and its two
branches must have the same type.

```
ctx |- if e1 then e2 else e3 : t
if ctx |- e1 : bool
and ctx |- e2 : t
and ctx |- e3 : t
```

## 9.5.2. A Type Checker for SimPL¶

Let’s implement a type checker for SimPL, based on the type system we defined in the previous section. You can download the completed type checker as part of the SimPL interpreter: simpl.zip

We need a variant to represent types:

```
type typ =
| TInt
| TBool
```

The natural name for that variant would of course have been “type” not “typ”,
but the former is already a keyword in OCaml. We have to prefix the constructors
with “T” to disambiguate them from the constructors of the `expr`

type, which
include `Int`

and `Bool`

.

Let’s introduce a small signature for typing contexts, based on the abstractions we’ve introduced so far: the empty context, looking up a variable, and extending a context.

```
module type Context = sig
(** [t] is the type of a context. *)
type t
(** [empty] is the empty context. *)
val empty : t
(** [lookup ctx x] gets the binding of [x] in [ctx].
Raises: [Failure] if [x] is not bound in [ctx]. *)
val lookup : t -> string -> typ
(** [extend ctx x ty] is [ctx] extended with a binding
of [x] to [ty]. *)
val extend : t -> string -> typ -> t
end
```

It’s easy to implement that signature with an association list.

```
module Context : Context = struct
type t = (string * typ) list
let empty = []
let lookup ctx x =
try List.assoc x ctx
with Not_found -> failwith "Unbound variable"
let extend ctx x ty =
(x, ty) :: ctx
end
```

Now we can implement the typing relation `|-`

. We’ll do that by writing a
function `typeof : Context.t -> expr -> typ`

, such that `typeof ctx e = t`

if
and only if `ctx |- e : t`

. Note that the `typeof`

function produces the type as
output, so the function is actually inferring the type! That inference is easy
for SimPL; it would be considerably harder for larger languages.

Let’s start with the base cases:

```
open Context
(** [typeof ctx e] is the type of [e] in context [ctx].
Raises: [Failure] if [e] is not well typed in [ctx]. *)
let rec typeof ctx = function
| Int _ -> TInt
| Bool _ -> TBool
| Var x -> lookup ctx x
...
```

Note how the implementation of `typeof`

so far is based on the rules we
previously defined for `|-`

. In particular:

`typeof`

is a recursive function, just as`|-`

is an inductive relation.The base cases for the recursion of

`typeof`

are the same as the base cases for`|-`

.

Also note how the implementation of `typeof`

differs in one major way from the
definition of `|-`

: error handling. The type system didn’t say what to do about
errors; rather, it just defined what it meant to be well typed. The type
checker, on the other hand, needs to take action and report ill typed programs.
Our `typeof`

function does that by raising exceptions. The `lookup`

function, in
particular, will raise an exception if we attempt to lookup a variable that
hasn’t been bound in the context.

Let’s continue with the recursive cases:

```
...
| Let (x, e1, e2) -> typeof_let ctx x e1 e2
| Binop (bop, e1, e2) -> typeof_bop ctx bop e1 e2
| If (e1, e2, e3) -> typeof_if ctx e1 e2 e3
```

We’re factoring out a helper function for each branch for the sake of keeping
the pattern match readable. Each of the helpers directly encodes the ideas of
the `|-`

rules, with error handling added.

```
and typeof_let ctx x e1 e2 =
let t1 = typeof ctx e1 in
let ctx' = extend ctx x t1 in
typeof ctx' e2
and typeof_bop ctx bop e1 e2 =
let t1, t2 = typeof ctx e1, typeof ctx e2 in
match bop, t1, t2 with
| Add, TInt, TInt
| Mult, TInt, TInt -> TInt
| Leq, TInt, TInt -> TBool
| _ -> failwith "Operator and operand type mismatch"
and typeof_if ctx e1 e2 e3 =
if typeof ctx e1 = TBool
then begin
let t2 = typeof ctx e2 in
if t2 = typeof ctx e3 then t2
else failwith "Branches of if must have same type"
end
else failwith "Guard of if must have type bool"
```

Note how the recursive calls in the implementation of `typeof`

occur exactly in
the same places where the definition of `|-`

is inductive.

Finally, we can implement a function to check whether an expression is well typed:

```
(** [typecheck e] checks whether [e] is well typed in
the empty context. Raises: [Failure] if not. *)
let typecheck e =
ignore (typeof empty e)
```

## 9.5.3. Type Safety¶

What is the purpose of a type system? There might be many, but one of the
primary purposes is to ensure that certain run-time errors don’t occur. Now that
we know how to formalize type systems with the `|-`

relation and evaluation with
the `-->`

relation, we can make that idea precise.

The goals of a language designer usually include ensuring that these two
properties, which establish a relationship between `|-`

and `-->`

, both hold:

**Progress:**If an expression is well typed, then either it is already a value, or it can take at least one step. We can formalize that as, “for all`e`

, if there exists a`t`

such that`{} |- e : t`

, then`e`

is a value, or there exists an`e'`

such that`e --> e'`

.”**Preservation:**If an expression is well typed, then if the expression steps, the new expression has the same type as the old expression. Formally, “for all`e`

and`t`

such that`{} |- e : t`

, if there exists an`e'`

such that`e --> e'`

, then`{} |- e' : t`

.”

Put together, progress plus preservation imply that that evaluation of a
well-typed expression can never *get stuck*, meaning it reaches a non-value that
cannot take a step. This property is known as *type safety*.

For example, `5 + true`

would get stuck using the SimPL evaluation relation,
because the primitive `+`

operation cannot accept a Boolean as an operand. But
the SimPL type system won’t accept that program, thus saving us from ever
reaching that situation.

Looking back at the SimPL we wrote, everywhere in the implementation of `step`

where we raised an exception was a place where evaluation would get stuck. But
the type system guarantees those exceptions will never occur.