# Proving Correctness

## Contents

# 6.7. Proving Correctness#

Testing provides evidence of correctness, but not full assurance. Even after extensive black-box and glass-box testing, maybe there’s still some test case the programmer failed to invent, and that test case would reveal a fault in the program.

Program testing can be used to show the presence of bugs, but never to show their absence.

—Edsger W. Dijkstra

The point is not that testing is useless! It can be quite effective. But it is a
kind of *inductive reasoning*, in which evidence (i.e., passing tests)
accumulates in support of a conclusion (i.e., correctness of the program)
without absolutely guaranteeing the validity of that conclusion. (Note that the
word “inductive” here is being used in a different sense than the proof
technique known as induction.) To get that guarantee, we turn to *deductive
reasoning*, in which we proceed from premises and rules about logic to a valid
conclusion. In other words, we prove the correctness of the program. Our goal,
next, is to learn some techniques for such correctness proofs. These techniques
are known as *formal methods* because of their use of logical formalism.

*Correctness* here means that the program produces the right output
according to a *specification*. Specifications are usually provided in the
documentation of a function (hence the name “specification comment”): they
describe the program’s precondition and postcondition. Postconditions, as we
have been writing them, have the form `[f x] is "...a description of the output in terms of the input [x]..."`

. For example, the specification of a factorial
function could be:

```
(** [fact n] is [n!]. Requires: [n >= 0]. *)
let rec fact n = ...
```

The postcondition is asserting an equality between the output of the function
and some English description of a computation on the input. *Formal
verification* is the task for proving that the implementation of the function
satisfies its specification.

Equalities are one of the fundamental ways we think about correctness of functional programs. The absence of mutable state makes it possible to reason straightforwardly about whether two expressions are equal. It’s difficult to do that in an imperative language, because those expressions might have side effects that change the state.

## 6.7.1. Equality#

When are two expressions equal? Two possible answers are:

When they are syntactically identical.

When they are semantically equivalent: they produce the same value.

For example, are `42`

and `41+1`

equal? The syntactic answer would say they are
not, because they involve different tokens. The semantic answer would say they
are: they both produce the value `42`

.

What about functions: are `fun x -> x`

and `fun y -> y`

equal? Syntactically
they are different. But semantically, they both produce a value that is the
identity function: when they are applied to an input, they will both produce the
same output. That is, `(fun x -> x) z = z`

, and `(fun y -> y) z = z`

. If it is
the case that for all inputs two functions produce the same output, we will
consider the functions to be equal:

```
if (forall x, f x = g x), then f = g.
```

That definition of equality for functions is known as the *Axiom of
Extensionality* in some branches of mathematics; henceforth we’ll refer to it
simply as “extensionality”.

Here we will adopt the semantic approach. If `e1`

and `e2`

evaluate to the same
value `v`

, then we write `e1 = e2`

. We are using `=`

here in a mathematical
sense of equality, not as the OCaml polymorphic equality operator. For example,
we allow `(fun x -> x) = (fun y -> y)`

, even though OCaml’s operator would raise
an exception and refuse to compare functions.

We’re also going to restrict ourselves to expressions that are well typed, pure (meaning they have no side effects), and total (meaning they don’t have exceptions or infinite loops).

## 6.7.2. Equational Reasoning#

Consider these functions:

```
let twice f x = f (f x)
let compose f g x = f (g x)
```

```
val twice : ('a -> 'a) -> 'a -> 'a = <fun>
```

```
val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
```

We know from the rules of OCaml evaluation that `twice h x = h (h x)`

, and
likewise, `compose h h x = h (h x)`

. Thus we have:

```
twice h x = h (h x) = compose h h x
```

Therefore can conclude that `twice h x = compose h h x`

. And by extensionality
we can simplify that equality: Since `twice h x = compose h h x`

holds for all
`x`

, we can conclude `twice h = compose h h`

.

As another example, suppose we define an infix operator for function composition:

```
let ( << ) = compose
```

Then we can prove that composition is associative, using equational reasoning:

```
Theorem: (f << g) << h = f << (g << h)
Proof: By extensionality, we need to show
((f << g) << h) x = (f << (g << h)) x
for an arbitrary x.
((f << g) << h) x
= (f << g) (h x)
= f (g (h x))
and
(f << (g << h)) x
= f ((g << h) x)
= f (g (h x))
So ((f << g) << h) x = f (g (h x)) = (f << (g << h)) x.
QED
```

All of the steps in the equational proof above follow from evaluation. Another format for writing the proof would provide hints as to why each step is valid:

```
((f << g) << h) x
= { evaluation of << }
(f << g) (h x)
= { evaluation of << }
f (g (h x))
and
(f << (g << h)) x
= { evaluation of << }
f ((g << h) x)
= { evaluation of << }
f (g (h x))
```

## 6.7.3. Induction on Natural Numbers#

The following function sums the non-negative integers up to `n`

:

```
let rec sumto n =
if n = 0 then 0 else n + sumto (n - 1)
```

```
val sumto : int -> int = <fun>
```

You might recall that the same summation can be expressed in closed form as
`n * (n + 1) / 2`

. To prove that `forall n >= 0, sumto n = n * (n + 1) / 2`

, we
will need *mathematical induction*.

Recall that induction on the natural numbers (i.e., the non-negative integers) is formulated as follows:

```
forall properties P,
if P(0),
and if forall k, P(k) implies P(k + 1),
then forall n, P(n)
```

That is called the *induction principle* for natural numbers. The *base case* is
to prove `P(0)`

, and the *inductive case* is to prove that `P(k + 1)`

holds
under the assumption of the *inductive hypothesis* `P(k)`

.

Let’s use induction to prove the correctness of `sumto`

.

```
Claim: sumto n = n * (n + 1) / 2
Proof: by induction on n.
P(n) = sumto n = n * (n + 1) / 2
Base case: n = 0
Show: sumto 0 = 0 * (0 + 1) / 2
sumto 0
= { evaluation }
0
= { algebra }
0 * (0 + 1) / 2
Inductive case: n = k + 1
Show: sumto (k + 1) = (k + 1) * ((k + 1) + 1) / 2
IH: sumto k = k * (k + 1) / 2
sumto (k + 1)
= { evaluation }
k + 1 + sumto k
= { IH }
k + 1 + k * (k + 1) / 2
= { algebra }
(k + 1) * (k + 2) / 2
QED
```

Note that we have been careful in each of the cases to write out what we need to show, as well as to write down the inductive hypothesis. It is important to show all this work.

Suppose we now define:

```
let sumto_closed n = n * (n + 1) / 2
```

```
val sumto_closed : int -> int = <fun>
```

Then as a corollary to our previous claim, by extensionality we can conclude

```
sumto_closed = sumto
```

Technically that equality holds only inputs that are natural numbers. But since all our examples henceforth will be for naturals, not integers per se, we will elide stating any preconditions or restrictions regarding natural numbers.

## 6.7.4. Programs as Specifications#

We have just proved the correctness of an efficient implementation relative to
an inefficient implementation. The inefficient implementation, `sumto`

, serves
as a specification for the efficient implementation, `sumto_closed`

.

That technique is common in verifying functional programs: write an obviously correct implementation that is lacking in some desired property, such as efficiency, then prove that a better implementation is equal to the original.

Let’s do another example of this kind of verification. This time, well use the factorial function.

The simple, obviously correct implementation of factorial would be:

```
let rec fact n =
if n = 0 then 1 else n * fact (n - 1)
```

```
val fact : int -> int = <fun>
```

A tail-recursive implementation would be more efficient about stack space:

```
let rec facti acc n =
if n = 0 then acc else facti (acc * n) (n - 1)
let fact_tr n = facti 1 n
```

```
val facti : int -> int -> int = <fun>
```

```
val fact_tr : int -> int = <fun>
```

The `i`

in the name `facti`

stands for *iterative*. We call this an iterative
implementation because it strongly resembles how the same computation would be
expressed using a loop (that is, an iteration construct) in an imperative
language. For example, in Java we might write:

```
int facti (int n) {
int acc = 1;
while (n != 0) {
acc *= n;
n--;
}
return acc;
}
```

Both the OCaml and Java implementation of `facti`

share these features:

they start

`acc`

at`1`

they check whether

`n`

is`0`

they multiply

`acc`

by`n`

they decrement

`n`

they return the accumulator,

`acc`

Let’s try to prove that `fact_tr`

correctly implements the same computation as
`fact`

.

```
Claim: forall n, fact n = fact_tr n
Since fact_tr n = facti 1 n, it suffices to show fact n = facti 1 n.
Proof: by induction on n.
P(n) = fact n = facti 1 n
Base case: n = 0
Show: fact 0 = facti 1 0
fact 0
= { evaluation }
1
= { evaluation }
facti 1 0
Inductive case: n = k + 1
Show: fact (k + 1) = facti 1 (k + 1)
IH: fact k = facti 1 k
fact (k + 1)
= { evaluation }
(k + 1) * fact k
= { IH }
(k + 1) * facti 1 k
facti 1 (k + 1)
= { evaluation }
facti (1 * (k + 1)) k
= { evaluation }
facti (k + 1) k
Unfortunately, we're stuck. Neither side of what we want to show
can be manipulated any further.
ABORT
```

We know that `facti (k + 1) k`

and `(k + 1) * facti 1 k`

should yield the same
value. But the IH allows us only to use `1`

as the second argument to `facti`

,
instead of a bigger argument like `k + 1`

. So our proof went astray the moment
we used the IH. We need a stronger inductive hypothesis!

So let’s strengthen the claim we are making. Instead of showing that
`fact n = facti 1 n`

, we’ll try to show `forall p, p * fact n = facti p n`

. That
generalizes the `k + 1`

we were stuck on to an arbitrary quantity `p`

.

```
Claim: forall n, forall p . p * fact n = facti p n
Proof: by induction on n.
P(n) = forall p, p * fact n = facti p n
Base case: n = 0
Show: forall p, p * fact 0 = facti p 0
p * fact 0
= { evaluation and algebra }
p
= { evaluation }
facti p 0
Inductive case: n = k + 1
Show: forall p, p * fact (k + 1) = facti p (k + 1)
IH: forall p, p * fact k = facti p k
p * fact (k + 1)
= { evaluation }
p * (k + 1) * fact k
= { IH, instantiating its p as p * (k + 1) }
facti (p * (k + 1)) k
facti p (k + 1)
= { evaluation }
facti (p * (k + 1)) k
QED
Claim: forall n, fact n = fact_tr n
Proof:
fact n
= { algebra }
1 * fact n
= { previous claim }
facti 1 n
= { evaluation }
fact_tr n
QED
```

That finishes our proof that the efficient, tail-recursive function `fact_tr`

is
equivalent to the simple, recursive function `fact`

. In essence, we have proved
the correctness of `fact_tr`

using `fact`

as its specification.

## 6.7.5. Recursion vs. Iteration#

We added an accumulator as an extra argument to make the factorial function be tail recursive. That’s a trick we’ve seen before. Let’s abstract and see how to do it in general.

Suppose we have a recursive function over integers:

```
let rec f_r n =
if n = 0 then i else op n (f_r (n - 1))
```

Here, the `r`

in `f_r`

is meant to suggest that `f_r`

is a recursive function.
The `i`

and `op`

are pieces of the function that are meant to be replaced by
some concrete value `i`

and operator `op`

. For example, with the factorial
function, we have:

```
f_r = fact
i = 1
op = ( * )
```

Such a function can be made tail recursive by rewriting it as follows:

```
let rec f_i acc n =
if n = 0 then acc
else f_i (op acc n) (n - 1)
let f_tr = f_i i
```

Here, the `i`

in `f_i`

is meant to suggest that `f_i`

is an iterative function,
and `i`

and `op`

are the same as in the recursive version of the function. For
example, with factorial we have:

```
f_i = fact_i
i = 1
op = ( * )
f_tr = fact_tr
```

We can prove that `f_r`

and `f_tr`

compute the same function. During the proof,
next, we will discover certain conditions that must hold of `i`

and `op`

to make
the transformation to tail recursion be correct.

```
Theorem: f_r = f_tr
Proof: By extensionality, it suffices to show that forall n, f_r n = f_tr n.
As in the previous proof for fact, we will need a strengthened induction
hypothesis. So we first prove this lemma, which quantifies over all accumulators
that could be input to f_i, rather than only i:
Lemma: forall n, forall acc, op acc (f_r n) = f_i acc n
Proof of Lemma: by induction on n.
P(n) = forall acc, op acc (f_r n) = f_i acc n
Base: n = 0
Show: forall acc, op acc (f_r 0) = f_i acc 0
op acc (f_r 0)
= { evaluation }
op acc i
= { if we assume forall x, op x i = x }
acc
f_i acc 0
= { evaluation }
acc
Inductive case: n = k + 1
Show: forall acc, op acc (f_r (k + 1)) = f_i acc (k + 1)
IH: forall acc, op acc (f_r k) = f_i acc k
op acc (f_r (k + 1))
= { evaluation }
op acc (op (k + 1) (f_r k))
= { if we assume forall x y z, op x (op y z) = op (op x y) z }
op (op acc (k + 1)) (f_r k)
f_i acc (k + 1)
= { evaluation }
f_i (op acc (k + 1)) k
= { IH, instantiating acc as op acc (k + 1)}
op (op acc (k + 1)) (f_r k)
QED
The proof then follows almost immediately from the lemma:
f_r n
= { if we assume forall x, op i x = x }
op i (f_r n)
= { lemma, instantiating acc as i }
f_i i n
= { evaluation }
f_tr n
QED
```

Along the way we made three assumptions about i and op:

`forall x, op x i = x`

`op x (op y z) = op (op x y) z`

`forall x, op i x = x`

The first and third say that `i`

is an *identity* of `op`

: using it on the left
or right side leaves the other argument `x`

unchanged. The second says that `op`

is *associative*. Both those assumptions held for the values we used in the
factorial functions:

`op`

is multiplication, which is associative.`i`

is`1`

, which is an identity of multiplication: multiplication by 1 leaves the other argument unchanged.

So our transformation from a recursive to a tail-recursive function is valid as long as the operator applied in the recursive call is associative, and the value returned in the base case is an identity of that operator.

Returning to the `sumto`

function, we can apply the theorem we just proved to
immediately get a tail-recursive version:

```
let rec sumto_r n =
if n = 0 then 0 else n + sumto_r (n - 1)
```

```
val sumto_r : int -> int = <fun>
```

Here, the operator is addition, which is associative; and the base case is zero, which is an identity of addition. Therefore our theorem applies, and we can use it to produce the tail-recursive version without even having to think about it:

```
let rec sumto_i acc n =
if n = 0 then acc else sumto_i (acc + n) (n - 1)
let sumto_tr = sumto_i 0
```

```
val sumto_i : int -> int -> int = <fun>
```

```
val sumto_tr : int -> int = <fun>
```

We already know that `sumto_tr`

is correct, thanks to our theorem.

## 6.7.6. Termination#

Sometimes correctness of programs is further divided into:

**partial correctness**: meaning that*if*a program terminates, then its output is correct; and**total correctness**: meaning that a program*does*terminate,*and*its output is correct.

Total correctness is therefore the conjunction of partial correctness and termination. Thus far, we have been proving partial correctness.

To prove that a program terminates is difficult. Indeed, it is impossible in general for an algorithm to do so: a computer can’t precisely decide whether a program will terminate. (Look up the “halting problem” for more details.) But, a smart human sometimes can do so.

There is a simple heuristic that can be used to show that a recursive function terminates:

All recursive calls are on a “smaller” input, and

all base cases are terminating.

For example, consider the factorial function:

```
let rec fact n =
if n = 0 then 1
else n * fact (n - 1)
```

```
val fact : int -> int = <fun>
```

The base case, `1`

, obviously terminates. The recursive call is on `n - 1`

,
which is a smaller input than the original `n`

. So `fact`

always terminates (as
long as its input is a natural number).

The same reasoning applies to all the other functions we’ve discussed above.

To make this more precise, we need a notion of what it means to be smaller.
Suppose we have a binary relation `<`

on inputs. Despite the notation, this
relation need not be the less-than relation on integers—although that will
work for `fact`

. Also suppose that it is never possible to create an infinite
sequence `x0 > x1 > x2 > x3 ...`

of elements using this relation. (Where of
course `a > b`

iff `b < a`

.) That is, there are no infinite descending chains of
elements: once you pick a starting element `x0`

, there can be only a finite
number of “descents” according to the `<`

relation before you bottom out and hit
a base case. This property of `<`

makes it a *well-founded relation*.

So, a recursive function terminates if all its recursive calls are on elements
that are smaller according to `<`

. Why? Because there can be only a finite
number of calls before a base case is reached, and base cases must terminate.

The usual `<`

relation is well-founded on the natural numbers, because
eventually any chain must reach the base case of 0. But it is not well-founded
on the integers, which can get just keep getting smaller: `-1 > -2 > -3 > ...`

.

Here’s an interesting function for which the usual `<`

relation doesn’t suffice
to prove termination:

```
let rec ack = function
| (0, n) -> n + 1
| (m, 0) -> ack (m - 1, 1)
| (m, n) -> ack (m - 1, ack (m, n - 1))
```

```
val ack : int * int -> int = <fun>
```

This is known as *Ackermann’s function*. It grows faster than any exponential
function. Try running `ack (1, 1)`

, `ack (2, 1)`

, `ack (3, 1)`

, then `ack (4, 1)`

to get a sense of that. It also is a famous example of a function that can
be implemented with `while`

loops but not with `for`

loops. Nonetheless, it does
terminate.

To show that, the base case is easy: when the input is `(0, _)`

, the function
terminates. But in other cases, it makes a recursive call, and we need to define
an appropriate `<`

relation. It turns out *lexicographic ordering* on pairs
works. Define `(a, b) < (c, d)`

if:

`a < c`

, or`a = c`

and`b < d`

.

The `<`

order in those two cases is the usual `<`

on natural numbers.

In the first recursive call, `(m - 1, 1) < (m, 0)`

by the first case of the
definition of `<`

, because `m - 1 < m`

. In the nested recursive call
`ack (m - 1, ack (m, n - 1))`

, both cases are needed:

`(m, n - 1) < (m, n)`

because`m = m`

and`n - 1 < n`

`(m - 1, _) < (m, n)`

because`m - 1 < m`

.