{
"cells": [
{
"cell_type": "markdown",
"id": "6b0cc542",
"metadata": {},
"source": [
"# Black-box and Glass-box Testing\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"W8FEhZIpYCg\")}}\n",
"\n",
"We would like to know that a program works on all possible inputs. The problem\n",
"with testing is that it is usually infeasible to try all the possible inputs.\n",
"For example, suppose that we are implementing a module that provides an abstract\n",
"data type for rational numbers. One of its operations might be an addition\n",
"function `plus`, e.g.:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "156dc1d5",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type RATIONAL =\n",
" sig type t val create : int -> int -> t val plus : t -> t -> t end\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"module Rational : RATIONAL\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type RATIONAL = sig\n",
" (** A [t] is a rational. *)\n",
" type t\n",
"\n",
" (** [create p q] is the rational number [p/q].\n",
" Raises: [Invalid_argument \"0\"] if [q] is 0. *)\n",
" val create : int -> int -> t\n",
"\n",
" (** [plus r1 r2] is [r1 + r2] *)\n",
" val plus : t -> t -> t\n",
"end\n",
"\n",
"module Rational : RATIONAL = struct\n",
" (** AF: [(p, q)] represents the rational number [p/q]\n",
" RI: [q] is not 0. *)\n",
" type t = int * int\n",
"\n",
" let create p q =\n",
" if q = 0 then invalid_arg \"0\" else (p, q)\n",
"\n",
" let plus (p1, q1) (p2, q2) =\n",
" (p1 * q2 + p2 * q1, q1 * q2)\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "b2a121c3",
"metadata": {},
"source": [
"What would it take to exhaustively test just this one function? We'd want to try\n",
"all possible rationals as both the `r1` and `r2` arguments. A rational is formed\n",
"from two ints, and there are $2^{63}$ ints on a modern OCaml implementation.\n",
"Therefore there are approximately $(2^{63})^4 = 2^{252}$ possible inputs to the\n",
"`plus` function. Even if we test one addition every nanosecond, it will take\n",
"about $10^{59}$ years to finish testing this one function.\n",
"\n",
"Clearly we can't test software exhaustively. But that doesn't mean we should\n",
"give up on testing. It just means that we need to think carefully about what our\n",
"test cases should be so that they are as effective as possible at convincing us\n",
"that the code works.\n",
"\n",
"Consider our `create` function, above. It takes in two integers `p` and `q` as\n",
"arguments. How should we go about selecting a relatively small number of test\n",
"cases that will convince us that the function works correctly on all possible\n",
"inputs? We can visualize the space of all possible inputs as a large square:\n",
"\n",
"![](create_inputs.png)\n",
"\n",
"There are about $2^{126}$ points in this square, so we can't afford to test them\n",
"all. And testing them all is going to mostly be a waste of time—most of\n",
"the possible inputs provide nothing new. We need a way to find a set of points\n",
"in this space to test that are interesting and will give a good sense of the\n",
"behavior of the program across the whole space.\n",
"\n",
"Input spaces generally comprise a number of subsets in which the behavior of the\n",
"code is similar in some essential fashion across the entire subset. We don't get\n",
"any additional information by testing more than one input from each such subset.\n",
"\n",
"If we test all the interesting regions of the input space, we have achieved good\n",
"*coverage*. We want tests that in some useful sense cover the space of possible\n",
"program inputs.\n",
"\n",
"Two good ways of achieving coverage are *black-box testing* and *glass-box\n",
"testing*. We discuss those, next.\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"PmfQ9xS2YhQ\")}}\n",
"\n",
"## Black-box Testing\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"2GjVqe9IKEk\")}}\n",
"\n",
"In selecting our test cases for good coverage, we might want to consider both\n",
"the specification and the implementation of the program or module being tested.\n",
"It turns out that we can often do a pretty good job of picking test cases by\n",
"just looking at the specification and ignoring the implementation. This is known\n",
"as **black-box testing**. The idea is that we think of the code as a black box\n",
"about which all we can see is its surface: its specification. We pick test cases\n",
"by looking at how the specification implicitly introduces boundaries that divide\n",
"the space of possible inputs into different regions.\n",
"\n",
"When writing black-box test cases, we ask ourselves what set of test cases that\n",
"will produce distinctive behavior as predicted by the specification. It is\n",
"important to try out both *typical inputs* and inputs that are *boundary cases*\n",
"aka *corner cases* or *edge cases*. A common error is to only test typical\n",
"inputs, with the result that the program usually works but fails in less\n",
"frequent situations. It's also important to identify ways in which the\n",
"specification creates classes of inputs that should elicit similar behavior from\n",
"the function, and to test on those *paths through the specification*. Here are\n",
"some examples.\n",
"\n",
"**Example 1.**\n",
"\n",
"Here are some ideas for how to test the `create` function:\n",
"\n",
"- Looking at the square above, we see that it has boundaries at `min_int` and\n",
" `max_int`. We want to try to construct rationals at the corners and along the\n",
" sides of the square, e.g. `create min_int min_int`, `create max_int 2`, etc.\n",
"\n",
"- The line `p=0` is important because `p/q` is zero all along it. We should try\n",
" `(0, q)` for various values of `q`.\n",
"\n",
"- We should try some typical `(p, q)` pairs in all four quadrants of the space.\n",
"\n",
"- We should try both `(p, q)` pairs in which `q` divides evenly into `p`, and\n",
" pairs in which `q` does not divide into `p`.\n",
"\n",
"- Pairs of the form `(1, q)`, `(-1, q)`, `(p, 1)`, `(p, -1)` for various `p` and\n",
" `q` also may be interesting given the properties of rational numbers.\n",
"\n",
"The specification also says that the code will check that `q` is not zero. We\n",
"should construct some test cases to ensure this checking is done as advertised.\n",
"Trying `(1, 0)`, `(max_int, 0)`, `(min_int, 0)`, `(-1, 0)`, `(0, 0)` to see that\n",
"they all raise the specified exception would probably be an adequate set of\n",
"black-box tests.\n",
"\n",
"**Example 2.**\n",
"\n",
"Consider a function `list_max`:\n",
"\n",
"```ocaml\n",
"(** Return the maximum element in the list. *)\n",
"val list_max: int list -> int\n",
"```\n",
"\n",
"What is a good set of black-box test cases? Here the input space is the set of\n",
"all possible lists of ints. We need to try some typical inputs and also consider\n",
"boundary cases. Based on this spec, boundary cases include the following:\n",
"\n",
"- A list containing one element. In fact, an empty list is probably the first\n",
" boundary case we think of. Looking at the spec above, we realize that it\n",
" doesn't specify what happens in the case of an empty list. Thus, thinking\n",
" about boundary cases is also useful in identifying errors in the\n",
" specification.\n",
"\n",
"- A list containing two elements.\n",
"\n",
"- A list in which the maximum is the first element. Or the last element. Or\n",
" somewhere in the middle of the list.\n",
"\n",
"- A list in which every element is equal.\n",
"\n",
"- A list in which the elements are arranged in ascending sorted order, and one\n",
" in which they are arranged in descending sorted order.\n",
"\n",
"- A list in which the maximum element is `max_int`, and a list in which the\n",
" maximum element is `min_int`.\n",
"\n",
"**Example 3.**\n",
"\n",
"Consider the function `sqrt`:\n",
"\n",
"```ocaml\n",
"(** [sqrt x n] is the square root of [x] computed to an accuracy of [n]\n",
" significant digits.\n",
" Requires: [x >= 0] and [n >= 1]. *)\n",
"val sqrt : float -> int -> float\n",
"```\n",
"\n",
"The precondition identifies two possibilities for `x` (either it is 0 or\n",
"greater) and two possibilities for `n` (either it is 1 or greater). That leads\n",
"to four \"paths through the specification\", i.e., representative and boundary\n",
"cases for satisfying the precondition, which we should test:\n",
"\n",
"- `x` is 0 and `n` is 1\n",
"\n",
"- `x` is greater than 0 and `n` is 1\n",
"\n",
"- `x` is 0 and `n` is greater than 1\n",
"\n",
"- `x` is greater than 0 and `n` is greater than 1.\n",
"\n",
"## Black-box Testing of Data Abstractions\n",
"\n",
"So far we've been thinking about testing just one function at a time. But data\n",
"abstractions usually have many operations, and we need to test how those\n",
"operations interact with one another. It's useful to distinguish *consumer* and\n",
"*producers* of the data abstraction:\n",
"\n",
"- A consumer is an operation that takes a value of the data abstraction as\n",
" input.\n",
"\n",
"- A producer is an operation that returns a value of the data abstraction\n",
" as output.\n",
"\n",
"For example, consider this set abstraction:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "92fd4ea9",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type Set =\n",
" sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val size : 'a t -> int\n",
" val add : 'a -> 'a t -> 'a t\n",
" val mem : 'a -> 'a t -> bool\n",
" end\n"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Set = sig\n",
"\n",
" (** ['a t] is the type of a set whose elements have type ['a]. *)\n",
" type 'a t\n",
"\n",
" (** [empty] is the empty set. *)\n",
" val empty : 'a t\n",
"\n",
" (** [size s] is the number of elements in [s]. *\n",
" [size empty] is [0]. *)\n",
" val size : 'a t -> int\n",
"\n",
" (** [add x s] is a set containing all the elements of\n",
" [s] as well as element [x]. *)\n",
" val add : 'a -> 'a t -> 'a t\n",
"\n",
" (** [mem x s] is [true] iff [x] is an element of [s]. *)\n",
" val mem : 'a -> 'a t -> bool\n",
"\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "4faa636d",
"metadata": {},
"source": [
"The `empty` and `add` functions are producers; and the `size`, `add` and `mem`\n",
"functions are consumers.\n",
"\n",
"When black-box testing a data abstraction, we should test how each consumer of\n",
"the data abstraction handles every path through each producer of it. In the\n",
"`Set` example, that means testing the following:\n",
"\n",
"* how `size` handles the `empty` set;\n",
"\n",
"* how `size` handles a set produced by `add`, both when `add` leaves the set\n",
" unchanged as well as when it increases the set;\n",
"\n",
"* how `add` handles sets produced by `empty` as well as `add` itself;\n",
"\n",
"* how `mem` handles sets produced by `empty` as well as `add`, including paths\n",
" where `mem` is invoked on elements that have been added as well as elements\n",
" that have not.\n",
"\n",
"## Glass-box Testing\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"128hYUobA_g\")}}\n",
"\n",
"Black-box testing is a good place to start when writing test cases, but\n",
"ultimately it is not enough. In particular, it's not possible to determine how\n",
"much coverage of the implementation a black-box test suite actually\n",
"achieves—we actually need to know the implementation source code. Testing\n",
"based on that code is known as *glass box* or *white box* testing. Glass-box\n",
"testing can improve on black-box by testing *execution paths* through the\n",
"implementation code: the series of expressions that is conditionally evaluated\n",
"based on if-expressions, match-expressions, and function applications. Test\n",
"cases that collectively exercise all paths are said to be *path-complete*. At a\n",
"minimum, path-completeness requires that for every line of code, and even for\n",
"every expression in the program, there should be a test case that causes it to\n",
"be executed. Any unexecuted code could contain a bug if has never been tested.\n",
"\n",
"For true path completeness we must consider all possible execution paths from\n",
"start to finish of each function, and try to exercise every distinct path. In\n",
"general this is infeasible, because there are too many paths. A good approach is\n",
"to think of the set of paths as the space that we are trying to explore, and to\n",
"identify boundary cases within this space that are worth testing.\n",
"\n",
"For example, consider the following implementation of a function that finds the\n",
"maximum of its three arguments:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a11ffec7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val max3 : 'a -> 'a -> 'a -> 'a = \n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"let max3 x y z =\n",
" if x > y then\n",
" if x > z then x else z\n",
" else\n",
" if y > z then y else z"
]
},
{
"cell_type": "markdown",
"id": "2e6d9f03",
"metadata": {},
"source": [
"Black-box testing might lead us to invent many tests, but looking at the\n",
"implementation reveals that there are only four paths through the code—the\n",
"paths that return `x`, `z`, `y`, or `z` (again). We could test each of those\n",
"paths with representative inputs such as: `max3 3 2 1`, `max3 3 2 4`,\n",
"`max3 1 2 1`, `max3 1 2 3`.\n",
"\n",
"When doing glass box testing, we should include test cases for each branch of\n",
"each (nested) if expression, and each branch of each (nested) pattern match. If\n",
"there are recursive functions, we should include test cases for the base cases\n",
"as well as each recursive call. Also, we should include test cases to trigger\n",
"each place where an exception might be raised.\n",
"\n",
"Of course, path complete testing does not guarantee an absence of errors. We\n",
"still need to test against the specification, i.e., do black-box testing. For\n",
"example, here is a broken implementation of `max3`:\n",
"\n",
"```ocaml\n",
"let max3 x y z = x\n",
"```\n",
"\n",
"The test `max3 2 1 1` is path complete, but doesn't reveal the error.\n",
"\n",
"## Glass-box Testing of Data Abstractions\n",
"\n",
"Look at the abstraction function and representation invariant for hints about\n",
"what boundaries may exist in the space of values manipulated by a data\n",
"abstraction. The rep invariant is a particularly effective tool for constructing\n",
"useful test cases. Looking at the rep invariant of the `Rational` data\n",
"abstraction above, we see that it requires that `q` is non-zero. Therefore we\n",
"should construct test cases to see whether it's possible to cause that invariant\n",
"to be violated.\n",
"\n",
"## Black-box vs. Glass-box\n",
"\n",
"Black-box testing has some important advantages:\n",
"\n",
"- It doesn't require that we see the code we are testing. Sometimes\n",
" code will not be available in source code form, yet we can still\n",
" construct useful test cases without it. The person writing the test\n",
" cases does not need to understand the implementation.\n",
"\n",
"- The test cases do not depend on the implementation. They can be\n",
" written in parallel with or before the implementation. Further, good\n",
" black-box test cases do not need to be changed, even if the\n",
" implementation is completely rewritten.\n",
"\n",
"- Constructing black-box test cases causes the programmer to think\n",
" carefully about the specification and its implications. Many\n",
" specification errors are caught this way.\n",
"\n",
"The disadvantage of black box testing is that its coverage may not be as\n",
"high as we'd like, because it has to work without the implementation.\n",
"\n",
"## Bisect\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"efAvB6KuHLY\")}}\n",
"\n",
"Glass-box testing can be aided by *code-coverage tools* that assess how much of\n",
"the code has been exercised by a test suite. The [bisect_ppx][] tool for OCaml\n",
"can tell you which expressions in your program have been tested, and which have\n",
"not. Here's how it works:\n",
"\n",
"[bisect_ppx]: https://github.com/aantron/bisect_ppx\n",
"\n",
"- You compile your code using Bisect\\_ppx (henceforth, just Bisect for short) as\n",
" part of the compilation process. It *instruments* your code, mainly by\n",
" inserting additional expressions to be evaluated.\n",
"\n",
"- You run your code. The instrumentation that Bisect inserted causes your\n",
" program to do something in addition to whatever functionality you programmed\n",
" yourself: the program will now record which expressions from the source code\n",
" actually get executed at run time, and which do not. Also, the program will\n",
" now produce an output file that contains that information.\n",
"\n",
"- You run a tool called `bisect-ppx-report` on that output file. It produces\n",
" HTML showing you which parts of your code got executed, and which did not.\n",
"\n",
"How does that help with computing coverage of a test suite? If you run your\n",
"OUnit test suite, the test cases in it will cause the code in whatever functions\n",
"they test to be executed. If you don't have enough test cases, some code in your\n",
"functions will never be executed. The report produced by Bisect will show you\n",
"exactly what code that is. You can then design new glass-box test cases to cause\n",
"that code to execute, add them to your OUnit suite, and create a new Bisect\n",
"report to confirm that the code really did get executed.\n",
"\n",
"**Bisect Tutorial.**\n",
"\n",
"1. Download the file {{ code_link | replace(\"%%NAME%%\", \"sorts.ml\") }}. You will\n",
" find an implementation of insertion sort and merge sort.\n",
"\n",
"2. Download the file {{ code_link | replace(\"%%NAME%%\", \"test_sorts.ml\") }}. It\n",
" has the skeleton for an OUnit test suite.\n",
"\n",
"3. Create a `dune` file to execute `test_sorts`:\n",
" ```text\n",
" (executable\n",
" (name test_sorts)\n",
" (libraries ounit2)\n",
" (instrumentation\n",
" (backend bisect_ppx)))\n",
" ```\n",
"\n",
"4. Run:\n",
" ```console\n",
" $ dune exec --instrument-with bisect_ppx ./test_sorts.exe\n",
" ```\n",
" That will execute the test suite with Bisect coverage enabled, causing some\n",
" files named `bisectNNNN.coverage` to be produced.\n",
"\n",
"5. Run:\n",
" ```console\n",
" $ bisect-ppx-report html\n",
" ```\n",
" to generate the Bisect report from your test suite execution. The report\n",
" is in a newly-created directory named `_coverage`.\n",
"\n",
"6. Open the file `_coverage/index.html` in a web browser. Look at the per-file\n",
" coverage; you'll see we've managed to test a few percent of `sorts.ml` with\n",
" our test suite so far. Click on the link in that report for `sorts.ml`.\n",
" You'll see that we've managed to cover only one line of the source code.\n",
"\n",
"7. There are some additional tests in the test file. Try un-commenting those, as\n",
" documented in the test file, and increasing your code coverage. Between each\n",
" run, you will need to delete the `bisectNNNN.coverage` files, otherwise\n",
" the report will contain information from those previous runs:\n",
" ```console\n",
" $ rm bisect*.coverage\n",
" ```\n",
" By the time you're done un-commenting the provided tests, you should be at 25%\n",
" coverage, including all of the insertion sort implementation. For fun, try\n",
" adding more tests to get 100% coverage of merge sort.\n",
"\n",
"**Parallelism.** OUnit will by default attempt to run some of the tests in\n",
"parallel, which reduces the time it takes to run a large test suite, at the\n",
"tradeoff of making it nondeterministic in what order the tests run. It's\n",
"possible for that to affect coverage if you are testing imperative code. To make\n",
"the tests run one at a time, in order, you can pass the flag\n",
"`-runner sequential` to the executable. OUnit will see that flag and cease\n",
"parallelization:\n",
"\n",
"```console\n",
"$ dune exec --instrument-with bisect_ppx ./test_sorts.exe -- -runner sequential\n",
"```"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "OCaml",
"language": "OCaml",
"name": "ocaml-jupyter"
},
"language_info": {
"codemirror_mode": "text/x-ocaml",
"file_extension": ".ml",
"mimetype": "text/x-ocaml",
"name": "OCaml",
"nbconverter_exporter": null,
"pygments_lexer": "OCaml",
"version": "4.14.0"
},
"source_map": [
14,
26,
51,
210,
232,
278,
284
]
},
"nbformat": 4,
"nbformat_minor": 5
}