3.3. Unit Testing with OUnit#
Note
This section is a bit of a detour from our study of data types, but it’s a good place to take the detour: we now know just enough to understand how unit testing can be done in OCaml, and there’s no good reason to wait any longer to learn about it.
Using the toplevel to test functions will only work for very small programs. Larger programs need test suites that contain many unit tests and can be re-run every time we update our code base. A unit test is a test of one small piece of functionality in a program, such as an individual function.
We’ve now learned enough features of OCaml to see how to do unit testing with a library called OUnit. It is a unit testing framework similar to JUnit in Java, HUnit in Haskell, etc. The basic workflow for using OUnit is as follows:
Write a function in a file
f.ml
. There could be many other functions in that file too.Write unit tests for that function in a separate file
test.ml
. That exact name is not actually essential.Build and run
test
to execute the unit tests.
The OUnit documentation is available on GitHub.
3.3.1. An Example of OUnit#
The following example shows you how to create an OUnit test suite. There are some things in the example that might at first seem mysterious; they are discussed in the next section.
Create a new directory. In that directory, create a file named sum.ml
, and put
the following code into it:
let rec sum = function
| [] -> 0
| x :: xs -> x + sum xs
Now create a second file named test.ml
, and put this code into it:
open OUnit2
open Sum
let tests = "test suite for sum" >::: [
"empty" >:: (fun _ -> assert_equal 0 (sum []));
"singleton" >:: (fun _ -> assert_equal 1 (sum [1]));
"two_elements" >:: (fun _ -> assert_equal 3 (sum [1; 2]));
]
let _ = run_test_tt_main tests
Depending on your editor and its configuration, you probably now see some
“Unbound module” errors about OUnit2 and Sum. Don’t worry; the code is actually
correct. We just need to set up dune and tell it to link OUnit. Create a dune
file and put this in it:
(executable
(name test)
(libraries ounit2))
And create a dune-project
file as usual:
(lang dune 3.4)
Now build the test suite:
$ dune build test.exe
Go back to your editor and do anything that will cause it to revisit test.ml
.
You can close and re-open the window, or make a trivial change in the file
(e.g., add then delete a space). Now the errors should all disappear.
Finally, you can run the test suite:
$ dune exec ./test.exe
You will get a response something like this:
...
Ran: 3 tests in: 0.12 seconds.
OK
Now suppose we modify sum.ml
to introduce a bug by changing the code
in it to the following:
let rec sum = function
| [] -> 1 (* bug *)
| x :: xs -> x + sum xs
If rebuild and re-execute the test suite, all test cases now fail. The output
tells us the names of the failing cases. Here’s the beginning of the output, in
which we’ve replaced some strings that will be dependent on your own local
computer with ...
:
FFF
==============================================================================
Error: test suite for sum:2:two_elements.
File ".../_build/oUnit-test suite for sum-...#01.log", line 9, characters 1-1:
Error: test suite for sum:2:two_elements (in the log).
Raised at OUnitAssert.assert_failure in file "src/lib/ounit2/advanced/oUnitAssert.ml", line 45, characters 2-27
Called from OUnitRunner.run_one_test.(fun) in file "src/lib/ounit2/advanced/oUnitRunner.ml", line 83, characters 13-26
not equal
------------------------------------------------------------------------------
The first line of that output
FFF
tells us that OUnit ran three test cases and all three failed.
The next interesting line
Error: test suite for sum:2:two_elements.
tells us that in the test suite named test suite for sum
the test case at
index 2 named two_elements
failed. The rest of the output for that test case
is not particularly interesting; let’s ignore it for now.
3.3.2. Explanation of the OUnit Example#
Let’s study more carefully what we just did in the previous section. In the test
file, open OUnit2
brings into scope the many definitions in OUnit2, which is
version 2 of the OUnit framework. And open Sum
brings into scope the
definitions from sum.ml
. We’ll learn more about scope and the open
keyword
later in a later chapter.
Then we created a list of test cases:
[
"empty" >:: (fun _ -> assert_equal 0 (sum []));
"one" >:: (fun _ -> assert_equal 1 (sum [1]));
"onetwo" >:: (fun _ -> assert_equal 3 (sum [1; 2]));
]
Each line of code is a separate test case. A test case has a string giving it a
descriptive name, and a function to run as the test case. In between the name
and the function we write >::
, which is a custom operator defined by the OUnit
framework. Let’s look at the first function from above:
fun _ -> assert_equal 0 (sum [])
Every test case function receives as input a parameter that OUnit calls a test
context. Here (and in many of the test cases we write) we don’t actually need
to worry about the context, so we use the underscore to indicate that the
function ignores its input. The function then calls assert_equal
, which is a
function provided by OUnit that checks to see whether its two arguments are
equal. If so the test case succeeds. If not, the test case fails.
Then we created a test suite:
let tests = "test suite for sum" >::: [
"empty" >:: (fun _ -> assert_equal 0 (sum []));
"singleton" >:: (fun _ -> assert_equal 1 (sum [1]));
"two_elements" >:: (fun _ -> assert_equal 3 (sum [1; 2]));
]
The >:::
operator is another custom OUnit operator. It goes between the name
of the test suite and the list of test cases in that suite.
Then we ran the test suite:
let _ = run_test_tt_main tests
The function run_test_tt_main
is provided by OUnit. It runs a test suite and
prints the results of which test cases passed vs. which failed to standard
output. The use of let _ =
here indicates that we don’t care what value the
function returns; it just gets discarded.
3.3.3. Improving OUnit Output#
In our example with the buggy implementation of sum
, we got the following
output:
==============================================================================
Error: test suite for sum:2:two_elements.
...
not equal
------------------------------------------------------------------------------
The not equal
in the OUnit output means that assert_equal
discovered the two
values passed to it in that test case were not equal. That’s not so informative:
we’d like to know why they’re not equal. In particular, we’d like to know what
the actual output produced by sum
was for that test case. To find out, we need
to pass an additional argument to assert_equal
. That argument, whose label is
printer
, should be a function that can transform the outputs to strings. In
this case, the outputs are integers, so string_of_int
from the Stdlib
module
will suffice. We modify the test suite as follows:
let tests = "test suite for sum" >::: [
"empty" >:: (fun _ -> assert_equal 0 (sum []) ~printer:string_of_int);
"singleton" >:: (fun _ -> assert_equal 1 (sum [1]) ~printer:string_of_int);
"two_elements" >:: (fun _ -> assert_equal 3 (sum [1; 2]) ~printer:string_of_int);
]
And now we get more informative output:
==============================================================================
Error: test suite for sum:2:two_elements.
...
expected: 3 but got: 4
------------------------------------------------------------------------------
That output means that the test named two_elements
asserted the equality of
3
and 4
. The expected output was 3
because that was the first input to
assert_equal
, and that function’s specification says that in
assert_equal x y
, the output you (as the tester) are expecting to get should
be x
, and the output the function being tested actually produces should be
y
.
Notice how our test suite is accumulating a lot of redundant code. In
particular, we had to add the printer
argument to several lines. Let’s improve
that code by factoring out a function that constructs test cases:
let make_sum_test name expected_output input =
name >:: (fun _ -> assert_equal expected_output (sum input) ~printer:string_of_int)
let tests = "test suite for sum" >::: [
make_sum_test "empty" 0 [];
make_sum_test "singleton" 1 [1];
make_sum_test "two_elements" 3 [1; 2];
]
For output types that are more complicated than integers, you will end up
needing to write your own functions to pass to printer
. This is similar to
writing toString()
methods in Java: for complicated types you invent yourself,
the language doesn’t know how to render them as strings. You have to provide the
code that does it.
3.3.4. Testing for Exceptions#
We have a little more of OCaml to learn before we can see how to test for exceptions. You can peek ahead to the section on exceptions if you want to know now.
3.3.5. Test-Driven Development#
Testing doesn’t have to happen strictly after you write code. In test-driven development (TDD), testing comes first! It emphasizes incremental development of code: there is always something that can be tested. Testing is not something that happens after implementation; instead, continuous testing is used to catch errors early. Thus, it is important to develop unit tests immediately when the code is written. Automating test suites is crucial so that continuous testing requires essentially no effort.
Here’s an example of TDD. We deliberately choose an exceedingly simple function to implement, so that the process is clear. Suppose we are working with a data type for days:
type day = Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
And we want to write a function next_weekday : day -> day
that returns
the next weekday after a given day. We start by writing the most basic,
broken version of that function we can:
let next_weekday d = failwith "Unimplemented"
Note
The built-in function failwith
raises an exception along with the error
message passed to the function.
Then we write the simplest unit test we can imagine. For example, we know that the next weekday after Monday is Tuesday. So we add a test:
let tests = "test suite for next_weekday" >::: [
"tue_after_mon" >:: (fun _ -> assert_equal Tuesday (next_weekday Monday));
]
Then we run the OUnit test suite. It fails, as expected. That’s good! Now we
have a concrete goal, to make that unit test pass. We revise next_weekday
to
make that happen:
let next_weekday d =
match d with
| Monday -> Tuesday
| _ -> failwith "Unimplemented"
We compile and run the test; it passes. Time to add some more tests. The simplest remaining possibilities are tests involving just weekdays, rather than weekends. So let’s add tests for weekdays.
let tests = "test suite for next_weekday" >::: [
"tue_after_mon" >:: (fun _ -> assert_equal Tuesday (next_weekday Monday));
"wed_after_tue" >:: (fun _ -> assert_equal Wednesday (next_weekday Tuesday));
"thu_after_wed" >:: (fun _ -> assert_equal Thursday(next_weekday Wednesday));
"fri_after_thu" >:: (fun _ -> assert_equal Friday (next_weekday Thursday));
]
We compile and run the tests; many fail. That’s good! We add new functionality:
let next_weekday d =
match d with
| Monday -> Tuesday
| Tuesday -> Wednesday
| Wednesday -> Thursday
| Thursday -> Friday
| _ -> failwith "Unimplemented"
We compile and run the tests; they pass. At this point we could move on to
handling weekends, but we should first notice something about the tests we’ve
written: they involve repeating a lot of code. In fact, we probably wrote them
by copying-and-pasting the first test, then modifying it for the next three.
That’s a sign that we should refactor the code. (As we did before with the
sum
function we were testing.)
Let’s abstract a function that creates test cases for next_weekday
:
let make_next_weekday_test name expected_output input =
name >:: (fun _ -> assert_equal expected_output (next_weekday input))
let tests = "test suite for next_weekday" >::: [
make_next_weekday_test "tue_after_mon" Tuesday Monday;
make_next_weekday_test "wed_after_tue" Wednesday Tuesday;
make_next_weekday_test "thu_after_wed" Thursday Wednesday;
make_next_weekday_test "fri_after_thu" Friday Thursday;
]
Now we finish the testing and implementation by handling weekends. First we add some test cases:
...
make_next_weekday_test "mon_after_fri" Monday Friday;
make_next_weekday_test "mon_after_sat" Monday Saturday;
make_next_weekday_test "mon_after_sun" Monday Sunday;
...
Then we finish the function:
let next_weekday d =
match d with
| Monday -> Tuesday
| Tuesday -> Wednesday
| Wednesday -> Thursday
| Thursday -> Friday
| Friday -> Monday
| Saturday -> Monday
| Sunday -> Monday
Of course, most people could write that function without errors even if they didn’t use TDD. But rarely do we implement functions that are so simple.
Process. Let’s review the process of TDD:
Write a failing unit test case. Run the test suite to prove that the test case fails.
Implement just enough functionality to make the test case pass. Run the test suite to prove that the test case passes.
Improve code as needed. In the example above we refactored the test suite, but often we’ll need to refactor the functionality being implemented.
Repeat until you are satisfied that the test suite provides evidence that your implementation is correct.