6.7. Proving Correctness

Testing provides evidence of correctness, but not full assurance. Even after extensive black-box and glass-box testing, maybe there’s still some test case the programmer failed to invent, and that test case would reveal a fault in the program.

Program testing can be used to show the presence of bugs, but never to show their absence.

—Edsger W. Dijkstra

The point is not that testing is useless! It can be quite effective. But it is a kind of inductive reasoning, in which evidence (i.e., passing tests) accumulates in support of a conclusion (i.e., correctness of the program) without absolutely guaranteeing the validity of that conclusion. (Note that the word “inductive” here is being used in a different sense than the proof technique known as induction.) To get that guarantee, we turn to deductive reasoning, in which we proceed from premises and rules about logic to a valid conclusion. In other words, we prove the correctness of the program. Our goal, next, is to learn some techniques for such correctness proofs. These techniques are known as formal methods because of their use of logical formalism.

Correctness here means that the program produces the right output according to a specification. Specifications are usually provided in the documentation of a function (hence the name “specification comment”): they describe the program’s precondition and postcondition. Postconditions, as we have been writing them, have the form [f x] is "...a description of the output in terms of the input [x]...". For example, the specification of a factorial function could be:

(** [fact n] is [n!]. Requires: [n >= 0]. *)
let rec fact n = ...

The postcondition is asserting an equality between the output of the function and some English description of a computation on the input. Formal verification is the task for proving that the implementation of the function satisfies its specification.

Equalities are one of the fundamental ways we think about correctness of functional programs. The absence of mutable state makes it possible to reason straightforwardly about whether two expressions are equal. It’s difficult to do that in an imperative language, because those expressions might have side effects that change the state.

6.7.1. Equality

When are two expressions equal? Two possible answers are:

  • When they are syntactically identical.

  • When they are semantically equivalent: they produce the same value.

For example, are 42 and 41+1 equal? The syntactic answer would say they are not, because they involve different tokens. The semantic answer would say they are: they both produce the value 42.

What about functions: are fun x -> x and fun y -> y equal? Syntactically they are different. But semantically, they both produce a value that is the identity function: when they are applied to an input, they will both produce the same output. That is, (fun x -> x) z = z, and (fun y -> y) z =  z. If it is the case that for all inputs two functions produce the same output, we will consider the functions to be equal:

if (forall x, f x = g x), then f = g.

That definition of equality for functions is known as the Axiom of Extensionality in some branches of mathematics; henceforth we’ll refer to it simply as “extensionality”.

Here we will adopt the semantic approach. If e1 and e2 evaluate to the same value v, then we write e1 = e2. We are using = here in a mathematical sense of equality, not as the OCaml polymorphic equality operator. For example, we allow (fun x -> x) = (fun y -> y), even though OCaml’s operator would raise an exception and refuse to compare functions.

We’re also going to restrict ourselves to expressions that are well typed, pure (meaning they have no side effects), and total (meaning they don’t have exceptions or infinite loops).

6.7.2. Equational Reasoning

Consider these functions:

let twice f x = f (f x)
let compose f g x = f (g x)
val twice : ('a -> 'a) -> 'a -> 'a = <fun>
val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>

We know from the rules of OCaml evaluation that twice h x = h (h x), and likewise, compose h h x = h (h x). Thus we have:

twice h x = h (h x) = compose h h x

Therefore can conclude that twice h x = compose h h x. And by extensionality we can simplify that equality: Since twice h x = compose h h x holds for all x, we can conclude twice h = compose h h.

As another example, suppose we define an infix operator for function composition:

let ( << ) = compose

Then we can prove that composition is associative, using equational reasoning:

Theorem: (f << g) << h  =  f << (g << h)

Proof: By extensionality, we need to show
  ((f << g) << h) x  =  (f << (g << h)) x
for an arbitrary x.

  ((f << g) << h) x
= (f << g) (h x)
= f (g (h x))


  (f << (g << h)) x
= f ((g << h) x)
= f (g (h x))

So ((f << g) << h) x = f (g (h x)) = (f << (g << h)) x.


All of the steps in the equational proof above follow from evaluation. Another format for writing the proof would provide hints as to why each step is valid:

  ((f << g) << h) x
=   { evaluation of << }
  (f << g) (h x)
=   { evaluation of << }
  f (g (h x))


  (f << (g << h)) x
=   { evaluation of << }
  f ((g << h) x)
=   { evaluation of << }
  f (g (h x))

6.7.3. Induction on Natural Numbers

The following function sums the non-negative integers up to n:

let rec sumto n =
  if n = 0 then 0 else n + sumto (n - 1)
val sumto : int -> int = <fun>

You might recall that the same summation can be expressed in closed form as n * (n + 1) / 2. To prove that forall n >= 0, sumto n = n * (n + 1) / 2, we will need mathematical induction.

Recall that induction on the natural numbers (i.e., the non-negative integers) is formulated as follows:

forall properties P,
  if P(0),
  and if forall k, P(k) implies P(k + 1),
  then forall n, P(n)

That is called the induction principle for natural numbers. The base case is to prove P(0), and the inductive case is to prove that P(k + 1) holds under the assumption of the inductive hypothesis P(k).

Let’s use induction to prove the correctness of sumto.

Claim: sumto n = n * (n + 1) / 2

Proof: by induction on n.
P(n) = sumto n = n * (n + 1) / 2

Base case: n = 0
Show: sumto 0 = 0 * (n + 1) / 2

  sumto 0
=   { evaluation }
=   { algebra }
  0 * (n + 1) / 2

Inductive case: n = k + 1
Show: sumto (k + 1) = (k + 1) * ((k + 1) + 1) / 2
IH: sumto k = k * (k + 1) / 2

  sumto (k + 1)
=   { evaluation }
  k + 1 + sumto k
=   { IH }
  k + 1 + k * (k + 1) / 2
=   { algebra }
  (k + 1) * (k + 2) / 2


Note that we have been careful in each of the cases to write out what we need to show, as well as to write down the inductive hypothesis. It is important to show all this work.

Suppose we now define:

let sumto_closed n = n * (n + 1) / 2
val sumto_closed : int -> int = <fun>

Then as a corollary to our previous claim, by extensionality we can conclude

sumto_closed = sumto

Technically that equality holds only inputs that are natural numbers. But since all our examples henceforth will be for naturals, not integers per se, we will elide stating any preconditions or restrictions regarding natural numbers.

6.7.4. Programs as Specifications

We have just proved the correctness of an efficient implementation relative to an inefficient implementation. The inefficient implementation, sumto, serves as a specification for the efficient implementation, sumto_closed.

That technique is common in verifying functional programs: write an obviously correct implementation that is lacking in some desired property, such as efficiency, then prove that a better implementation is equal to the original.

Let’s do another example of this kind of verification. This time, well use the factorial function.

The simple, obviously correct implementation of factorial would be:

let rec fact n =
  if n = 0 then 1 else n * fact (n - 1)
val fact : int -> int = <fun>

A tail-recursive implementation would be more efficient about stack space:

let rec facti acc n =
  if n = 0 then acc else facti (acc * n) (n - 1)

let fact_tr n = facti 1 n
val facti : int -> int -> int = <fun>
val fact_tr : int -> int = <fun>

The i in the name facti stands for iterative. We call this an iterative implementation because it strongly resembles how the same computation would be expressed using a loop (that is, an iteration construct) in an imperative language. For example, in Java we might write:

int facti (int n) {
  int acc = 1;
  while (n != 0) {
    acc *= n;
  return acc;

Both the OCaml and Java implementation of facti share these features:

  • they start acc at 1

  • they check whether n is 0

  • they multiply acc by n

  • they decrement n

  • they return the accumulator, acc

Let’s try to prove that fact_tr correctly implements the same computation as fact.

Claim: forall n, fact n = fact_tr n

Since fact_tr n = facti 1 n, it suffices to show fact n = facti 1 n.

Proof: by induction on n.
P(n) = fact n = facti 1 n

Base case: n = 0
Show: fact 0 = facti 1 0

  fact 0
=   { evaluation }
=   { evaluation }
  facti 1 0

Inductive case: n = k + 1
Show: fact (k + 1) = facti 1 (k + 1)
IH: fact k = facti 1 k

  fact (k + 1)
=   { evaluation }
  (k + 1) * fact k
=   { IH }
  (k + 1) * facti 1 k

  facti 1 (k + 1)
=   { evaluation }
  facti (1 * (k + 1)) k
=   { evaluation }
  facti (k + 1) k

Unfortunately, we're stuck.  Neither side of what we want to show
can be manipulated any further.


We know that facti (k + 1) k and (k + 1) * facti 1 k should yield the same value. But the IH allows us only to use 1 as the second argument to facti, instead of a bigger argument like k + 1. So our proof went astray the moment we used the IH. We need a stronger inductive hypothesis!

So let’s strengthen the claim we are making. Instead of showing that fact n = facti 1 n, we’ll try to show forall p, p * fact n = facti p n. That generalizes the k + 1 we were stuck on to an arbitrary quantity p.

Claim: forall n, forall p . p * fact n = facti p n

Proof: by induction on n.
P(n) = forall p, p * fact n = facti p n

Base case:  n = 0
Show: forall p,  p * fact 0 = facti p 0

  p * fact 0
=   { evaluation and algebra }
=   { evaluation }
  facti p 0

Inductive case: n = k + 1
Show: forall p,  p * fact (k + 1) = facti p (k + 1)
IH: forall p,  p * fact k = facti p k

  p * fact (k + 1)
=   { evaluation }
  p * (k + 1) * fact k
=   { IH, instantiating its p as p * (k + 1) }
  facti (p * (k + 1)) k

  facti p (k + 1)
=   { evaluation }
  facti (p * (k + 1)) k


Claim: forall n, fact n = fact_tr n


  fact n
=   { algebra }
  1 * fact n
=   { previous claim }
  facti 1 n
=   { evaluation }
  fact_tr n


That finishes our proof that the efficient, tail-recursive function fact_tr is equivalent to the simple, recursive function fact. In essence, we have proved the correctness of fact_tr using fact as its specification.

6.7.5. Recursion vs. Iteration

We added an accumulator as an extra argument to make the factorial function be tail recursive. That’s a trick we’ve seen before. Let’s abstract and see how to do it in general.

Suppose we have a recursive function over integers:

let rec f_r n =
  if n = 0 then i else op n (f_r (n - 1))

Here, the r in f_r is meant to suggest that f_r is a recursive function. The i and op are pieces of the function that are meant to be replaced by some concrete value i and operator op. For example, with the factorial function, we have:

f_r = fact
i = 1
op = ( * )

Such a function can be made tail recursive by rewriting it as follows:

let f_i acc n =
  if n = 0 then acc
  else f_i (op acc n) (n - 1)

let f_tr = f_i i

Here, the i in f_i is meant to suggest that f_i is an iterative function, and i and op are the same as in the recursive version of the function. For example, with factorial we have:

f_i = fact_i
i = 1
op = ( * )
f_tr = fact_tr

We can prove that f_r and f_tr compute the same function. During the proof, next, we will discover certain conditions that must hold of i and op to make the transformation to tail recursion be correct.

Theorem: f_r = f_tr

Proof:  By extensionality, it suffices to show that forall n, f_r n = f_tr n.

As in the previous proof for fact, we will need a strengthened induction
hypothesis. So we first prove this lemma, which quantifies over all accumulators
that could be input to f_i, rather than only i:

  Lemma: forall n, forall acc, op acc (f_r n) = f_i acc n

  Proof of Lemma: by induction on n.
  P(n) = forall acc, op acc (f_r n) = f_i acc n

  Base: n = 0
  Show: forall acc, op acc (f_r 0) = f_i acc 0

    op acc (f_r 0)
  =   { evaluation }
    op acc i
  =   { if we assume forall x, op x i = x }

    f_i acc 0

  Inductive case: n = k + 1
  Show: forall acc, op acc (f_r (k + 1)) = f_i acc (k + 1)
  IH: forall acc, op acc (f_r k) = f_i acc k

    op acc (f_r (k + 1))
  =   { evaluation }
    op acc (op (k + 1) (f_r k))
  =   { if we assume forall x y z, op x (op y z) = op (op x y) z }
    op (op acc (k + 1)) (f_r k)

    f_i acc (k + 1)
  =   { evaluation }
    f_i (op acc (k + 1)) k
  =   { IH, instantiating acc as op acc (k + 1)}
    op (op acc (k + 1)) (f_r k)


The proof then follows almost immediately from the lemma:

  f_r n
=   { if we assume forall x, op i x = x }
  op i (f_r n)
=   { lemma, instantiating acc as i }
  f_i i n
=   { evaluation }
  f_tr n


Along the way we made three assumptions about i and op:

  1. forall x, op x i = x

  2. op x (op y z) = op (op x y) z

  3. forall x, op i x = x

The first and third say that i is an identity of op: using it on the left or right side leaves the other argument x unchanged. The second says that op is associative. Both those assumptions held for the values we used in the factorial functions:

  • op is multiplication, which is associative.

  • i is 1, which is an identity of multiplication: multiplication by 1 leaves the other argument unchanged.

So our transformation from a recursive to a tail-recursive function is valid as long as the operator applied in the recursive call is associative, and the value returned in the base case is an identity of that operator.

Returning to the sumto function, we can apply the theorem we just proved to immediately get a tail-recursive version:

let rec sumto_r n =
  if n = 0 then 0 else n + sumto_r (n - 1)
val sumto_r : int -> int = <fun>

Here, the operator is addition, which is associative; and the base case is zero, which is an identity of addition. Therefore our theorem applies, and we can use it to produce the tail-recursive version without even having to think about it:

let rec sumto_i acc n =
  if n = 0 then acc else sumto_i (acc + n) (n - 1)

let sumto_tr = sumto_i 0
val sumto_i : int -> int -> int = <fun>
val sumto_tr : int -> int = <fun>

We already know that sumto_tr is correct, thanks to our theorem.

6.7.6. Termination

Sometimes correctness of programs is further divided into:

  • partial correctness: meaning that if a program terminates, then its output is correct; and

  • total correctness: meaning that a program does terminate, and its output is correct.

Total correctness is therefore the conjunction of partial correctness and termination. Thus far, we have been proving partial correctness.

To prove that a program terminates is difficult. Indeed, it is impossible in general for an algorithm to do so: a computer can’t precisely decide whether a program will terminate. (Look up the “halting problem” for more details.) But, a smart human sometimes can do so.

There is a simple heuristic that can be used to show that a recursive function terminates:

  • All recursive calls are on a “smaller” input, and

  • all base cases are terminating.

For example, consider the factorial function:

let rec fact n =
  if n = 0 then 1
  else n * fact (n - 1)
val fact : int -> int = <fun>

The base case, 1, obviously terminates. The recursive call is on n - 1, which is a smaller input than the original n. So fact always terminates (as long as its input is a natural number).

The same reasoning applies to all the other functions we’ve discussed above.

To make this more precise, we need a notion of what it means to be smaller. Suppose we have a binary relation < on inputs. Despite the notation, this relation need not be the less-than relation on integers—although that will work for fact. Also suppose that it is never possible to create an infinite sequence x0 > x1 > x2 > x3 ... of elements using this relation. (Where of course a > b iff b < a.) That is, there are no infinite descending chains of elements: once you pick a starting element x0, there can be only a finite number of “descents” according to the < relation before you bottom out and hit a base case. This property of < makes it a well-founded relation.

So, a recursive function terminates if all its recursive calls are on elements that are smaller according to <. Why? Because there can be only a finite number of calls before a base case is reached, and base cases must terminate.

The usual < relation is well-founded on the natural numbers, because eventually any chain must reach the base case of 0. But it is not well-founded on the integers, which can get just keep getting smaller: -1 > -2 > -3 > ....

Here’s an interesting function for which the usual < relation doesn’t suffice to prove termination:

let rec ack = function
  | (0, n) -> n + 1
  | (m, 0) -> ack (m - 1, 1)
  | (m, n) -> ack (m - 1, ack (m, n - 1))
val ack : int * int -> int = <fun>

This is known as Ackermann’s function. It grows faster than any exponential function. Try running ack (1, 1), ack (2, 1), ack (3, 1), then ack (4, 1) to get a sense of that. It also is a famous example of a function that can be implemented with while loops but not with for loops. Nonetheless, it does terminate.

To show that, the base case is easy: when the input is (0, _), the function terminates. But in other cases, it makes a recursive call, and we need to define an appropriate < relation. It turns out lexicograpic ordering on pairs works. Define (a, b) < (c, d) if:

  • a < c, or

  • a = c and b < d.

The < order in those two cases is the usual < on natural numbers.

In the first recursive call, (m - 1, 1) < (m, 0) by the first case of the definition of <, because m - 1 < m. In the nested recursive call ack (m - 1, ack (m, n - 1)), both cases are needed:

  • (m, n - 1) < (m, n) because m = m and n - 1 < n

  • (m - 1, _) < (m, n) because m - 1 < m.