{
"cells": [
{
"cell_type": "markdown",
"id": "c87a0c5b",
"metadata": {},
"source": [
"# Function Documentation\n",
"\n",
"*This section continues the discussion of\n",
"[documentation](../basics/documentation), which we began in chapter 2.*\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"ggm5rjAyjhw\")}}\n",
"\n",
"A specification is written for humans to read, not machines. \"Specs\" can take\n",
"time to write well, and it is time well spent. The main goal is clarity. It is\n",
"also important to be concise, because client programmers will not always take\n",
"the effort to read a long spec. As with anything we write, we need to be aware\n",
"of our audience when writing specifications. Some readers may need a more\n",
"verbose specification than others.\n",
"\n",
"A well-written specification usually has several parts communicating different\n",
"kinds of information about the thing specified. If we know what the usual\n",
"ingredients of a specification are, we are less likely to forget to write down\n",
"something important. Let's now look at a recipe for writing specifications.\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"p5OTwjNTQIs\")}}\n",
"\n",
"## Returns Clause\n",
"\n",
"How might we specify `sqr`, a square-root function? First, we need to describe\n",
"its result. We will call this description the *returns clause* because it is a\n",
"part of the specification that describes the result of a function call. It is\n",
"also known as a *postcondition*: it describes a condition that holds after the\n",
"function is called. Here is an example of a returns clause:\n",
"\n",
"```ocaml\n",
"(** returns: [sqr x] is the square root of [x]. *)\n",
"```\n",
"\n",
"But we would typically leave out the `returns:`, and simply write the returns\n",
"clause as the first sentence of the comment:\n",
"\n",
"```ocaml\n",
"(** [sqr x] is the square root of [x]. *)\n",
"```\n",
"\n",
"For numerical programming, we should probably add some information about how\n",
"accurate it is.\n",
"\n",
"```ocaml\n",
"(** [sqr x] is the square root of [x]. Its relative accuracy is no worse than\n",
" [1.0e-6]. *)\n",
"```\n",
"\n",
"Similarly, here's how we might write a returns clause for a `find` function:\n",
"\n",
"```ocaml\n",
"(** [find lst x] is the index of [x] in [lst], starting from zero. *)\n",
"```\n",
"\n",
"A good specification is concise but clear—it should say enough that the\n",
"reader understands what the function does, but without extra verbiage to plow\n",
"through and possibly cause the reader to miss the point. Sometimes there is a\n",
"balance to be struck between brevity and clarity.\n",
"\n",
"These two specifications use a useful trick to make them more concise: they talk\n",
"about the result of applying the function being specified to some arbitrary\n",
"arguments. Implicitly we understand that the stated postcondition holds for all\n",
"possible values of any unbound variables (the argument variables).\n",
"\n",
"## Requires Clause\n",
"\n",
"The specification for `sqr` doesn't completely make sense because the square\n",
"root does not exist for some `x` of type `real`. The mathematical square root\n",
"function is a *partial* function that is defined over only part of its domain. A\n",
"good function specification is complete with respect to the possible inputs; it\n",
"provides the client with an understanding of what inputs are allowed and what\n",
"the results will be for allowed inputs.\n",
"\n",
"We have several ways to deal with partial functions. A straightforward approach\n",
"is to restrict the domain so that it is clear the function cannot be\n",
"legitimately used on some inputs. The specification rules out bad inputs with a\n",
"*requires clause* establishing when the function may be called. This clause is\n",
"also called a *precondition* because it describes a condition that must hold\n",
"before the function is called. Here is a requires clause for `sqr`:\n",
"\n",
"```ocaml\n",
"(** [sqr x] is the square root of [x]. Its relative accuracy is no worse\n",
" than [1.0e-6]. Requires: [x >= 0] *)\n",
"```\n",
"\n",
"This specification doesn't say what happens when `x < 0`, nor does it have to.\n",
"Remember that the specification is a contract. This contract happens to push the\n",
"burden of showing that the square root exists onto the client. If the requires\n",
"clause is not satisfied, the implementation is permitted to do anything it\n",
"likes: for example, go into an infinite loop or throw an exception. The\n",
"advantage of this approach is that the implementer is free to design an\n",
"algorithm without the constraint of having to check for invalid input\n",
"parameters, which can be tedious and slow down the program. The disadvantage is\n",
"that it may be difficult to debug if the function is called improperly, because\n",
"the function can misbehave and the client has no understanding of how it might\n",
"misbehave.\n",
"\n",
"## Raises Clause\n",
"\n",
"Another way to deal with partial functions is to convert them into total\n",
"functions (functions defined over their entire domain). This approach is\n",
"arguably easier for the client to deal with because the function's behavior is\n",
"always defined; it has no precondition. However, it pushes work onto the\n",
"implementer and may lead to a slower implementation.\n",
"\n",
"How can we convert `sqr` into a total function? One approach that is (too) often\n",
"followed is to define some value that is returned in the cases that the requires\n",
"clause would have ruled; for example:\n",
"\n",
"```ocaml\n",
"(** [sqr x] is the square root of [x] if [x >= 0],\n",
" with relative accuracy no worse than 1.0e-6.\n",
" Otherwise, a negative number is returned. *)\n",
"```\n",
"\n",
"This practice is not recommended because it tends to encourage broken,\n",
"hard-to-read client code. Almost any correct client of this abstraction will\n",
"write code like this if the precondition cannot be argued to hold:\n",
"\n",
"```ocaml\n",
"if sqr(a) < 0.0 then ... else ...\n",
"```\n",
"\n",
"The error must still be handled in the `if` expression, so the job of the client\n",
"of this abstraction isn't any easier than with a requires clause: the client\n",
"still needs to wrap an explicit test around the call in cases where it might\n",
"fail. If the test is omitted, the compiler won't complain, and the negative\n",
"number result will be silently treated as if it were a valid square root, likely\n",
"causing errors later during program execution. This coding style has been the\n",
"source of innumerable bugs and security problems in the Unix operating systems\n",
"and its descendents (e.g., Linux).\n",
"\n",
"A better way to make functions total is to have them raise an exception when the\n",
"expected input condition is not met. Exceptions avoid the necessity of\n",
"distracting error-handling logic in the client's code. If the function is to be\n",
"total, the specification must say what exception is raised and when. For\n",
"example, we might make our square root function total as follows:\n",
"\n",
"```ocaml\n",
"(** [sqr x] is the square root of [x], with relative accuracy no worse\n",
" than 1.0e-6. Raises: [Negative] if [x < 0]. *)\n",
"```\n",
"\n",
"Note that the implementation of this `sqr` function must check whether `x >= 0`,\n",
"even in the production version of the code, because some client may be relying\n",
"on the exception to be raised.\n",
"\n",
"## Examples Clause\n",
"\n",
"It can be useful to provide an illustrative example as part of a specification.\n",
"No matter how clear and well written the specification is, an example is often\n",
"useful to clients.\n",
"\n",
"```ocaml\n",
"(** [find lst x] is the index of [x] in [lst], starting from zero.\n",
" Example: [find [\"b\",\"a\",\"c\"] \"a\" = 1]. *)\n",
"```\n",
"\n",
"## The Specification Game\n",
"\n",
"When evaluating specifications, it can be useful to imagine that a game is being\n",
"played between two people: a *specifier* and a *devious programmer.*\n",
"\n",
"Suppose that the specifier writes the following specification:\n",
"\n",
"```ocaml\n",
"(** returns a list *)\n",
"val reverse : 'a list -> 'a list\n",
"```\n",
"\n",
"This spec is clearly incomplete. For example, a devious programmer could meet\n",
"the spec with an implementation that gives the following output:\n",
"\n",
"```ocaml\n",
"# reverse [1; 2; 3];;\n",
"- : int list = []\n",
"```\n",
"\n",
"The specifier, upon realizing this, refines the spec as follows:\n",
"\n",
"```ocaml\n",
"(** [reverse lst] returns a list that is the same length as [lst] *)\n",
"val reverse : 'a list -> 'a list\n",
"```\n",
"\n",
"But the devious specifier discovers that the spec still allows broken\n",
"implementations:\n",
"\n",
"```ocaml\n",
"# reverse [1; 2; 3];;\n",
"- : int list = [0; 0; 0]\n",
"```\n",
"\n",
"Finally, the specifier settles on a third spec:\n",
"\n",
"```ocaml\n",
"(** [reverse lst] returns a list [m] satisfying the following conditions:\n",
" - [length lst = length m]\n",
" - for all [i], [nth m i = nth lst (n - i - 1)],\n",
" where [n] is the length of [lst].\n",
" For example, [reverse [1; 2; 3]] is [3; 2; 1], and [reverse []] is []. *)\n",
"val reverse : 'a list -> 'a list\n",
"```\n",
"\n",
"With this spec, the devious programmer is forced to provide a working\n",
"implementation to meet the spec, so the specifier has successfully written her\n",
"spec.\n",
"\n",
"The point of playing this game is to improve your ability to write\n",
"specifications. Obviously we're not advocating that you deliberately try to\n",
"violate the intent of a specification and get away with it. When reading someone\n",
"else's specification, read as generously as possible. But be ruthless about\n",
"improving your own specifications.\n",
"\n",
"## Comments\n",
"\n",
"In addition to specifying functions, programmers need to provide comments in the\n",
"body of the functions. In fact, programmers usually do not write enough comments\n",
"in their code. (For a classic example, check out the\n",
"[actual comment on line 561][quake3-wtf] of the Quake 3 Arena game engine.)\n",
"\n",
"[quake3-wtf]: https://archive.softwareheritage.org/swh:1:cnt:bb0faf6919fc60636b2696f32ec9b3c2adb247fe;origin=https://github.com/id-Software/Quake-III-Arena;visit=swh:1:snp:4ab9bcef131aaf449a7c01370aff8c91dcecbf5f;anchor=swh:1:rev:dbe4ddb10315479fc00086f08e25d968b4b43c49;path=/code/game/q_math.c;lines=558-564\n",
"\n",
"But this doesn't mean that adding more comments is always better. The wrong\n",
"comments will simply obscure the code further. Shoveling as many comments into\n",
"code as possible usually makes the code worse! Both code and comments are\n",
"precise tools for communication (with the computer and with other programmers)\n",
"that should be wielded carefully.\n",
"\n",
"It is particularly annoying to read code that contains many interspersed\n",
"comments (typically of questionable value), e.g.:\n",
"\n",
"```ocaml\n",
"let y = x + 1 (* make y one greater than x *)\n",
"```\n",
"\n",
"For complex algorithms, some comments may be necessary to explain how the code\n",
"implementing the algorithm works. Programmers are often tempted to write\n",
"comments about the algorithm interspersed through the code. But someone reading\n",
"the code will often find these comments confusing because they don't have a\n",
"high-level picture of the algorithm. It is usually better to write a\n",
"paragraph-style comment at the beginning of the function explaining how its\n",
"implementation works. Explicit points in the code that need to be related to\n",
"that paragraph can then be marked with very brief comments, like `(* case 1 *)`.\n",
"\n",
"Another common but well-intentioned mistake is giving variables long,\n",
"descriptive names, as in the following verbose code:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8d878e34",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val number_of_zeros : int list -> int = \n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"let number_of_zeros the_list =\n",
" List.fold_left (fun (accumulator : int) (list_element : int) ->\n",
" accumulator + (if list_element = 0 then 1 else 0)) 0 the_list"
]
},
{
"cell_type": "markdown",
"id": "4f636842",
"metadata": {},
"source": [
"Code using such long names is verbose and hard to read. Instead of trying to\n",
"embed a complete description of a variable in its name, use a short and\n",
"suggestive name (e.g., `zeros`), and if necessary, add a comment at its\n",
"declaration explaining the purpose of the variable."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c107567b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val zeros : int list -> int = \n"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"let zeros lst =\n",
" let is0 = function 0 -> 1 | _ -> 0 in\n",
" List.fold_left (fun zs x -> zs + is0 x) 0 lst"
]
},
{
"cell_type": "markdown",
"id": "ad9e3003",
"metadata": {},
"source": [
"A similarly bad practice is to encode the type of the variable in its name, e.g.\n",
"naming a variable `i_count` to show that it's an integer. The type system is\n",
"going to guarantee that for you, and your editor can provide a hover-over to\n",
"show the type. If you really want to emphasize the type in the code, add a type\n",
"annotation at the point where the variable comes into scope."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "OCaml",
"language": "OCaml",
"name": "ocaml-jupyter"
},
"language_info": {
"codemirror_mode": "text/x-ocaml",
"file_extension": ".ml",
"mimetype": "text/x-ocaml",
"name": "OCaml",
"nbconverter_exporter": null,
"pygments_lexer": "OCaml",
"version": "4.14.0"
},
"source_map": [
14,
264,
268,
275,
279
]
},
"nbformat": 4,
"nbformat_minor": 5
}