{
"cells": [
{
"cell_type": "markdown",
"id": "ad872c84",
"metadata": {},
"source": [
"# Module Documentation\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"4OTspWNefn4\")}}\n",
"\n",
"The specification of functions provided by a module can be found in its\n",
"interface, which is what clients will consult. But what about *internal*\n",
"documentation, which is relevant to those who implement and maintain a module?\n",
"The purpose of such implementation comments is to explain to the reader how the\n",
"implementation correctly implements its interface.\n",
"\n",
"```{admonition} Reminder\n",
"It is inappropriate to copy the specifications of functions found in the module\n",
"interface into the module implementation. Copying runs the risk of introducing\n",
"inconsistency as the program evolves, because programmers don't keep the copies\n",
"in sync. Copying code and specifications is a major source (if not *the* major\n",
"source) of program bugs. In any case, implementers can always look at the\n",
"interface for the specification.\n",
"```\n",
"\n",
"Implementation comments fall into two categories. The first category arises\n",
"because a module implementation may define new types and functions that are\n",
"purely internal to the module. If their significance is not obvious, these types\n",
"and functions should be documented in much the same style that we have suggested\n",
"for documenting interfaces. Often, as the code is written, it becomes apparent\n",
"that the new types and functions defined in the module form an internal data\n",
"abstraction or at least a collection of functionality that makes sense as a\n",
"module in its own right. This is a signal that the internal data abstraction\n",
"might be moved to a separate module and manipulated only through its operations.\n",
"\n",
"The second category of implementation comments is associated with the use of\n",
"*data abstraction*. Suppose we are implementing an abstraction for a set of\n",
"items of type `'a`. The interface might look something like this:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "88fffc65",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type Set =\n",
" sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val mem : 'a -> 'a t -> bool\n",
" val add : 'a -> 'a t -> 'a t\n",
" val rem : 'a -> 'a t -> 'a t\n",
" val size : 'a t -> int\n",
" val union : 'a t -> 'a t -> 'a t\n",
" val inter : 'a t -> 'a t -> 'a t\n",
" end\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(** A set is an unordered collection in which multiplicity is ignored. *)\n",
"module type Set = sig\n",
"\n",
" (** ['a t] represents a set whose elements are of type ['a] *)\n",
" type 'a t\n",
"\n",
" (** [empty] is the set containing no elements *)\n",
" val empty : 'a t\n",
"\n",
" (** [mem x s] is whether [x] is a member of set [s] *)\n",
" val mem : 'a -> 'a t -> bool\n",
"\n",
" (** [add x s] is the set containing all the elements of [s]\n",
" as well as [x]. *)\n",
" val add : 'a -> 'a t -> 'a t\n",
"\n",
" (** [rem x s] is the set containing all the elements of [s],\n",
" minus [x]. *)\n",
" val rem : 'a -> 'a t -> 'a t\n",
"\n",
" (** [size s] is the cardinality of [s] *)\n",
" val size: 'a t -> int\n",
"\n",
" (** [union s1 s2] is the set containing all the elements that\n",
" are in either [s1] or [s2]. *)\n",
" val union: 'a t -> 'a t -> 'a t\n",
"\n",
" (** [inter s1 s2] is the set containing all the elements that\n",
" are in both [s1] and [s2]. *)\n",
" val inter: 'a t -> 'a t -> 'a t\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "7320b81e",
"metadata": {},
"source": [
"In a real signature for sets, we'd want operations such as `map` and `fold` as\n",
"well, but let's omit these for now for simplicity. There are many ways to\n",
"implement this abstraction.\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"abtrrowewaw\")}}\n",
"\n",
"As we've seen before, one easy way is as a list:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e7b36f24",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module ListSet : Set\n"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(** Implementation of sets as lists with duplicates *)\n",
"module ListSet : Set = struct\n",
" type 'a t = 'a list\n",
" let empty = []\n",
" let mem = List.mem\n",
" let add = List.cons\n",
" let rem x = List.filter (( <> ) x)\n",
" let size lst = List.(lst |> sort_uniq Stdlib.compare |> length)\n",
" let union lst1 lst2 = lst1 @ lst2\n",
" let inter lst1 lst2 = List.filter (fun h -> mem h lst2) lst1\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "9cd791cc",
"metadata": {},
"source": [
"This implementation has the advantage of simplicity. For small sets that tend\n",
"not to have duplicate elements, it will be a fine choice. Its performance will\n",
"be poor for large sets or applications with many duplicates but for some\n",
"applications that's not an issue.\n",
"\n",
"Notice that the types of the functions do not need to be written down in the\n",
"implementation. They aren't needed because they're already present in the\n",
"signature, just like the specifications that are also in the signature don't\n",
"need to be replicated in the structure.\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"n8irzSGGVao\")}}\n",
"\n",
"Here is another implementation of `Set` that also uses `'a list` but requires\n",
"the lists to contain no duplicates. This implementation is also correct (and\n",
"also slow for large sets). Notice that we are using the same representation\n",
"type, yet some important aspects of the implementation (`add`, `size`,\n",
"`union`) are quite different."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f526fe3b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module UniqListSet : Set\n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(** Implementation of sets as lists without duplicates *)\n",
"module UniqListSet : Set = struct\n",
" type 'a t = 'a list\n",
" let empty = []\n",
" let mem = List.mem\n",
" let add x lst = if mem x lst then lst else x :: lst\n",
" let rem x = List.filter (( <> ) x)\n",
" let size = List.length\n",
" let union lst1 lst2 = lst1 @ lst2 |> List.sort_uniq Stdlib.compare\n",
" let inter lst1 lst2 = List.filter (fun h -> mem h lst2) lst1\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "36eac820",
"metadata": {},
"source": [
"An important reason why we introduced the writing of function specifications was\n",
"to enable *local reasoning*: once a function has a spec, we can judge whether\n",
"the function does what it is supposed to without looking at the rest of the\n",
"program. We can also judge whether the rest of the program works without looking\n",
"at the code of the function. However, we cannot reason locally about the\n",
"individual functions in the three module implementations just given. The problem\n",
"is that we don't have enough information about the relationship between the\n",
"concrete type (`int list`) and the corresponding abstract type (`set`). This\n",
"lack of information can be addressed by adding two new kinds of comments to the\n",
"implementation: the *abstraction function* and the *representation invariant*\n",
"for the abstract data type. We turn to discussion of those, next.\n",
"\n",
"## Abstraction Functions\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"ZYYa66804Q4\")}}\n",
"\n",
"The client of any `Set` implementation should not be able to distinguish it from\n",
"any other implementation based on its functional behavior. As far as the client\n",
"can tell, the operations act like operations on the mathematical ideal of a set.\n",
"In the first implementation, the lists `[3; 1]`, `[1; 3]`, and `[1; 1; 3]` are\n",
"distinguishable to the implementer, but not to the client. To the client, they\n",
"all represent the abstract set {1, 3} and cannot be distinguished by any of the\n",
"operations of the `Set` signature. From the point of view of the client, the\n",
"abstract data type describes a set of abstract values and associated operations.\n",
"The implementer knows that these abstract values are represented by concrete\n",
"values that may contain additional information invisible from the client's view.\n",
"This loss of information is described by the *abstraction function*, which is a\n",
"mapping from the space of concrete values to the abstract space. The abstraction\n",
"function for the implementation `ListSet` looks like this:\n",
"\n",
"![af-listset](abst-fcn2.png)\n",
"\n",
"Notice that several concrete values may map to a single abstract value;\n",
"that is, the abstraction function may be *many-to-one*. It is also\n",
"possible that some concrete values do not map to any abstract value; the\n",
"abstraction function may be *partial*. That is not the case with `ListSet`,\n",
"but it might be with other implementations.\n",
"\n",
"The abstraction function is important for deciding whether an\n",
"implementation is correct, therefore it belongs as a comment in the\n",
"implementation of any abstract data type. For example, in the `ListSet`\n",
"module, we could document the abstraction function as follows:\n",
"\n",
"```ocaml\n",
"module ListSet : Set = struct\n",
" (** Abstraction function: The list [[a1; ...; an]] represents the\n",
" set [{b1, ..., bm}], where [[b1; ...; bm]] is the same list as\n",
" [[a1; ...; an]] but with any duplicates removed. The empty list\n",
" [[]] represents the empty set [{}]. *)\n",
" type 'a t = 'a list\n",
" ...\n",
"end\n",
"```\n",
"\n",
"This comment explicitly points out that the list may contain duplicates, which\n",
"is helpful as a reinforcement of the first sentence. Similarly, the case of an\n",
"empty list is mentioned explicitly for clarity, although some might consider it\n",
"to be redundant.\n",
"\n",
"The abstraction function for the second implementation, which does not allow\n",
"duplicates, hints at an important difference. We can write the abstraction\n",
"function for this second representation a bit more simply because we know that\n",
"the elements are distinct.\n",
"\n",
"```ocaml\n",
"module UniqListSet : Set = struct\n",
" (** Abstraction function: The list [[a1; ...; an]] represents the set\n",
" [{a1, ..., an}]. The empty list [[]] represents the empty set [{}]. *)\n",
" type 'a t = 'a list\n",
" ...\n",
"end\n",
"```\n",
"\n",
"## Implementing the Abstraction Function\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"ebcnwXuWPI0\")}}\n",
"\n",
"What would it mean to implement the abstraction function for `ListSet`? We'd\n",
"want a function that took an input of type `'a ListSet.t`. But what should its\n",
"output type be? The abstract values are mathematical sets, not OCaml types. If\n",
"we did hypothetically have a type `'a set` that our abstraction function could\n",
"return, there would have been little point in developing `ListSet`; we could\n",
"have just used that `'a set` type without doing any work of our own.\n",
"\n",
"On the other hand, we might implement something close to the abstraction\n",
"function by converting an input of type `'a ListSet.t` to a built-in OCaml type\n",
"or standard library type:\n",
"\n",
"- We could convert to a `string`. That would have the advantage of being easily\n",
" readable by humans in the toplevel or in debug output. Java programmers use\n",
" `toString()` for similar purposes.\n",
"\n",
"- We could convert to `'a list`. (Actually there's little conversion to be\n",
" done). For data collections this is a convenient choice, since lists can at\n",
" least approximately represent many data structures: stacks, queues,\n",
" dictionaries, sets, heaps, etc.\n",
"\n",
"The following functions implement those ideas. Note that `to_string` has\n",
"to take an additional argument `string_of_val` from the client to convert\n",
"`'a` to `string`.\n",
"\n",
"```ocaml\n",
"module ListSet : Set = struct\n",
" ...\n",
"\n",
" let uniq lst = List.sort_uniq Stdlib.compare lst\n",
"\n",
" let to_string string_of_val lst =\n",
" let interior =\n",
" lst |> uniq |> List.map string_of_val |> String.concat \", \"\n",
" in\n",
" \"{\" ^ interior ^ \"}\"\n",
"\n",
" let to_list = uniq\n",
"end\n",
"```\n",
"\n",
"Installing a custom formatter, as discussed in the\n",
"[section on encapsulation][encap], could also be understood as implementing the\n",
"abstraction function. But in that case it's usable only by humans at the\n",
"toplevel rather than other code, programmatically.\n",
"\n",
"[encap]: ../modules/encapsulation.md\n",
"\n",
"## Commutative Diagrams\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"PKAgq97NBdg\")}}\n",
"\n",
"Using the abstraction function, we can now talk about what it means for an\n",
"implementation of an abstraction to be *correct*. It is correct exactly when\n",
"every operation that takes place in the concrete space makes sense when mapped\n",
"by the abstraction function into the abstract space. This can be visualized as a\n",
"*commutative diagram*:\n",
"\n",
"![commutative-diagram](commutation.png)\n",
"\n",
"A commutative diagram means that if we take the two paths around the diagram, we\n",
"have to get to the same place. Suppose that we start from a concrete value and\n",
"apply the actual implementation of some operation to it to obtain a new concrete\n",
"value or values. When viewed abstractly, a concrete result should be an abstract\n",
"value that is a possible result of applying the function *as described in its\n",
"specification* to the abstract view of the actual inputs. For example, consider\n",
"the union function from the implementation of sets as lists with repeated\n",
"elements covered last time. When this function is applied to the concrete pair\n",
"[1; 3], [2; 2], it corresponds to the lower-left corner of the diagram. The\n",
"result of this operation is the list [2; 2; 1; 3], whose corresponding abstract\n",
"value is the list {1, 2, 3}. Note that if we apply the abstraction function AF\n",
"to the input lists [1; 3] and [2; 2], we have the sets {1, 3} and {2}. The\n",
"commutative diagram requires that in this instance the union of {1, 3} and {2}\n",
"is {1, 2, 3}, which is of course true.\n",
"\n",
"## Representation Invariants\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"mEtR4YGMRPY\")}}\n",
"\n",
"The abstraction function explains how information within the module is viewed\n",
"abstractly by module clients. But that is not all we need to know to ensure\n",
"correctness of the implementation. Consider the `size` function in each of the\n",
"two implementations. For `ListSet`, which allows duplicates, we need to be sure\n",
"not to double-count duplicate elements:\n",
"\n",
"```ocaml\n",
"let size lst = List.(lst |> sort_uniq Stdlib.compare |> length)\n",
"```\n",
"\n",
"But for `UniqListSet`, in which the lists have no duplicates, the size is just\n",
"the length of the list:\n",
"\n",
"```ocaml\n",
"let size = List.length\n",
"```\n",
"\n",
"How do wo know that latter implementation is correct? That is, how do we know\n",
"that \"lists have no duplicates\"? It's hinted at by the name of the module, and\n",
"it can be deduced from the implementation of `add`, but we've never carefully\n",
"documented it. Right now, the code does not explicitly say that there are no\n",
"duplicates.\n",
"\n",
"In the `UniqListSet` representation, not all concrete data items represent\n",
"abstract data items. That is, the *domain* of the abstraction function does not\n",
"include all possible lists. There are some lists, such as `[1; 1; 2]`, that\n",
"contain duplicates and must never occur in the representation of a set in the\n",
"`UniqListSet` implementation; the abstraction function is undefined on such\n",
"lists. We need to include a second piece of information, the *representation\n",
"invariant* (or *rep invariant*, or *RI*), to determine which concrete data items\n",
"are valid representations of abstract data items. For sets represented as lists\n",
"without duplicates, we write this as part of the comment together with the\n",
"abstraction function:\n",
"\n",
"```ocaml\n",
"module UniqListSet : Set = struct\n",
" (** Abstraction function: the list [[a1; ...; an]] represents the set\n",
" [{a1, ..., an}]. The empty list [[]] represents the empty set [{}].\n",
" Representation invariant: the list contains no duplicates. *)\n",
" type 'a t = 'a list\n",
" ...\n",
"end\n",
"```\n",
"\n",
"If we think about this issue in terms of the commutative diagram, we see that\n",
"there is a crucial property that is necessary to ensure correctness: namely,\n",
"that all concrete operations preserve the representation invariant. If this\n",
"constraint is broken, functions such as `size` will not return the correct\n",
"answer. The relationship between the representation invariant and the\n",
"abstraction function is depicted in this figure:\n",
"\n",
"![af-and-ri](ri-af.png)\n",
"\n",
"We can use the rep invariant and abstraction function to judge whether the\n",
"implementation of a single operation is correct *in isolation from the rest of\n",
"the functions in the module*. A function is correct if these conditions:\n",
"\n",
"1. The function's preconditions hold of the argument values.\n",
"2. The concrete representations of the arguments satisfy the rep invariant.\n",
"\n",
"imply these conditions:\n",
"\n",
"1. All new representation values created satisfy the rep invariant.\n",
"2. The commutative diagram holds.\n",
"\n",
"The rep invariant makes it easier to write code that is provably correct,\n",
"because it means that we don't have to write code that works for all possible\n",
"incoming concrete representations—only those that satisfy the rep\n",
"invariant. For example, in the implementation `UniqListSet`, we do not care what\n",
"the code does on lists that contain duplicate elements. However, we do need to\n",
"be concerned that on return, we only produce values that satisfy the rep\n",
"invariant. As suggested in the figure above, if the rep invariant holds for the\n",
"input values, then it should hold for the output values, which is why we call it\n",
"an *invariant*.\n",
"\n",
"## Implementing the Representation Invariant\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"4bfh_aONcRc\")}}\n",
"\n",
"When implementing a complex abstract data type, it is often helpful to write an\n",
"internal function that can be used to check that the rep invariant holds of a\n",
"given data item. By convention we will call this function `rep_ok`. If the\n",
"module accepts values of the abstract type that are created outside the module,\n",
"say by exposing the implementation of the type in the signature, then `rep_ok`\n",
"should be applied to these to ensure the representation invariant is satisfied.\n",
"In addition, if the implementation creates any new values of the abstract type,\n",
"`rep_ok` can be applied to them as a sanity check. With this approach, bugs are\n",
"caught early, and a bug in one function is less likely to create the appearance\n",
"of a bug in another.\n",
"\n",
"A convenient way to write `rep_ok` is to make it an identity function that just\n",
"returns the input value if the rep invariant holds and raises an exception if it\n",
"fails.\n",
"\n",
"```ocaml\n",
"(* Checks whether x satisfies the representation invariant. *)\n",
"let rep_ok x =\n",
" if (* check the RI holds of x *) then x else failwith \"RI violated\"\n",
"```\n",
"\n",
"Here is an implementation of `Set` that uses the same data representation as\n",
"`UniqListSet`, but includes copious `rep_ok` checks. Note that `rep_ok` is\n",
"applied to all input sets and to any set that is ever created. This ensures that\n",
"if a bad set representation is created, it will be detected immediately. In case\n",
"we somehow missed a check on creation, we also apply `rep_ok` to incoming set\n",
"arguments. If there is a bug, these checks will help us quickly figure out where\n",
"the rep invariant is being broken."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "2516d9a4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module UniqListSet : Set\n"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(** Implementation of sets as lists without duplicates. *)\n",
"module UniqListSet : Set = struct\n",
"\n",
" (** Abstraction function: The list [[a1; ...; an]] represents the\n",
" set {a1, ..., an}. The empty list [[]] represents the empty set [{}].\n",
" Representation invariant: the list contains no duplicates. *)\n",
" type 'a t = 'a list\n",
"\n",
" let rep_ok lst =\n",
" let u = List.sort_uniq Stdlib.compare lst in\n",
" match List.compare_lengths lst u with 0 -> lst | _ -> failwith \"RI\"\n",
"\n",
" let empty = []\n",
"\n",
" let mem x lst = List.mem x (rep_ok lst)\n",
"\n",
" let add x lst = rep_ok (if mem x (rep_ok lst) then lst else x :: lst)\n",
"\n",
" let rem x lst = rep_ok (List.filter (( <> ) x) (rep_ok lst))\n",
"\n",
" let size lst = List.length (rep_ok lst)\n",
"\n",
" let union lst1 lst2 =\n",
" rep_ok\n",
" (List.fold_left\n",
" (fun u x -> if mem x lst2 then u else x :: u)\n",
" (rep_ok lst2) (rep_ok lst1))\n",
"\n",
" let inter lst1 lst2 = rep_ok (List.filter (fun h -> mem h lst2) (rep_ok lst1))\n",
"end\n"
]
},
{
"cell_type": "markdown",
"id": "f75e9de5",
"metadata": {},
"source": [
"Calling `rep_ok` on every argument can be too expensive for the production\n",
"version of a program. The `rep_ok` above, for example, requires linearithmic\n",
"time, which destroys the efficiency of all the previously constant time or\n",
"linear time operations. For production code, it may be more appropriate to use a\n",
"version of `rep_ok` that only checks the parts of the rep invariant that are\n",
"cheap to check. When there is a requirement that there be no run-time cost,\n",
"`rep_ok` can be changed to an identity function (or macro) so the compiler\n",
"optimizes away the calls to it. However, it is a good idea to keep around the\n",
"full code of `rep_ok` so it can be easily reinstated during future debugging:\n",
"\n",
"```ocaml\n",
"let rep_ok lst = lst\n",
"\n",
"let rep_ok_expensive =\n",
" let u = List.sort_uniq Stdlib.compare lst in\n",
" match List.compare_lengths lst u with 0 -> lst | _ -> failwith \"RI\"\n",
"```\n",
"\n",
"Some languages provide support for *conditional compilation*, which provides\n",
"some kind of support for compiling some parts of the codebase but not others.\n",
"The OCaml compiler supports a flag `noassert` that disables assertion checking.\n",
"So you could implement rep invariant checking with `assert`, and turn it off\n",
"with `noassert`. The problem with that is that some portions of your codebase\n",
"might *require* assertion checking to be turned on to work correctly."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "OCaml",
"language": "OCaml",
"name": "ocaml-jupyter"
},
"language_info": {
"codemirror_mode": "text/x-ocaml",
"file_extension": ".ml",
"mimetype": "text/x-ocaml",
"name": "OCaml",
"nbconverter_exporter": null,
"pygments_lexer": "OCaml",
"version": "4.14.0"
},
"source_map": [
14,
49,
82,
92,
104,
124,
136,
401,
433
]
},
"nbformat": 4,
"nbformat_minor": 5
}