{
"cells": [
{
"cell_type": "markdown",
"id": "d662d601",
"metadata": {},
"source": [
"# Functional Data Structures\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"CLeXXZDkkCI\")}}\n",
"\n",
"A *functional data structure* is one that does not make use of mutability.\n",
"It's possible to build functional data structures both in functional languages\n",
"and in imperative languages. For example, you could build\n",
"a Java equivalent to OCaml's `list` type by creating a `Node`\n",
"class whose fields are immutable by virtue of using\n",
"the `const` keyword.\n",
"\n",
"Functional data structures have the property of being *persistent*: updating the\n",
"data structure with one of its operations does not change the existing version\n",
"of the data structure but instead produces a new version. Both exist and both\n",
"can still be accessed. A good language implementation will ensure that any parts\n",
"of the data structure that are not changed by an operation will be *shared*\n",
"between the old version and the new version. Any parts that do change will be\n",
"*copied* so that the old version may persist. The opposite of a persistent data\n",
"structure is an *ephemeral* data structure: changes are destructive, so that\n",
"only one version exists at any time. Both persistent and ephemeral data\n",
"structures can be built in both functional and imperative languages.\n",
"\n",
"## Lists\n",
"\n",
"The built-in singly-linked `list` data structure in OCaml is functional. We know\n",
"that, because we've seen how to implement it with algebraic data types. It's\n",
"also persistent, which we can demonstrate:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "954bbe51",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val lst : int list = [1; 2]\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"val lst' : int list = [2]\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"- : int list = [1; 2]\n"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"let lst = [1; 2];;\n",
"let lst' = List.tl lst;;\n",
"lst;;"
]
},
{
"cell_type": "markdown",
"id": "707c8ddf",
"metadata": {},
"source": [
"Taking the tail of `lst` does not change the list. Both `lst` and `lst'`\n",
"coexist without affecting one another.\n",
"\n",
"## Stacks\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"LWmGzSCpvVY\")}}\n",
"\n",
"We implemented stacks earlier in this chapter. Here's a terse variant of one of\n",
"those implementations, in which we add a `to_list` operation to make it easier to\n",
"view the contents of the stack in examples:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5cd148d1",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type Stack =\n",
" sig\n",
" type 'a t\n",
" exception Empty\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val push : 'a -> 'a t -> 'a t\n",
" val peek : 'a t -> 'a\n",
" val pop : 'a t -> 'a t\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
" end\n"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"module ListStack : Stack\n"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Stack = sig\n",
" type 'a t\n",
" exception Empty\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val push : 'a -> 'a t -> 'a t\n",
" val peek : 'a t -> 'a\n",
" val pop : 'a t -> 'a t\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
"end\n",
"\n",
"module ListStack : Stack = struct\n",
" type 'a t = 'a list\n",
" exception Empty\n",
" let empty = []\n",
" let is_empty = function [] -> true | _ -> false\n",
" let push = List.cons\n",
" let peek = function [] -> raise Empty | x :: _ -> x\n",
" let pop = function [] -> raise Empty | _ :: s -> s\n",
" let size = List.length\n",
" let to_list = Fun.id\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "9e7b7653",
"metadata": {},
"source": [
"That implementation is functional, as can be seen above, and also persistent:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a7d8b8c2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val s : int ListStack.t = \n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"val s' : int ListStack.t = \n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"- : int list = [2; 1]\n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"- : int list = [1]\n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"open ListStack;;\n",
"let s = empty |> push 1 |> push 2;;\n",
"let s' = pop s;;\n",
"to_list s;;\n",
"to_list s';;"
]
},
{
"cell_type": "markdown",
"id": "9db76458",
"metadata": {},
"source": [
"The value `s` is unchanged by the `pop` operation that creates `s'`. Both\n",
"versions of the stack coexist.\n",
"\n",
"The `Stack` module type gives us a strong hint that the data structure is\n",
"persistent in the types it provides for `push` and `pop`:\n",
"\n",
"```ocaml\n",
"val push : 'a -> 'a t -> 'a t\n",
"val pop : 'a t -> 'a t\n",
"```\n",
"\n",
"Both of those take a stack as an argument and return a new stack as a result. An\n",
"ephemeral data structure usually would not bother to return a stack. In Java,\n",
"for example, similar methods might have a `void` return type; the equivalent in\n",
"OCaml would be returning `unit`.\n",
"\n",
"## Options vs Exceptions\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"tbMU_pv0p9o\")}}\n",
"\n",
"All of our stack implementations so far have raised an exception whenever `peek`\n",
"or `pop` is applied to the empty stack. Another possibility would be to use an\n",
"`option` for the return value. If the input stack is empty, then `peek` and\n",
"`pop` return `None`; otherwise, they return `Some`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d95d3784",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type Stack =\n",
" sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val push : 'a -> 'a t -> 'a t\n",
" val peek : 'a t -> 'a option\n",
" val pop : 'a t -> 'a t option\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
" end\n"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"module ListStack : Stack\n"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Stack = sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val push : 'a -> 'a t -> 'a t\n",
" val peek : 'a t -> 'a option\n",
" val pop : 'a t -> 'a t option\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
"end\n",
"\n",
"module ListStack : Stack = struct\n",
" type 'a t = 'a list\n",
" exception Empty\n",
" let empty = []\n",
" let is_empty = function [] -> true | _ -> false\n",
" let push = List.cons\n",
" let peek = function [] -> None | x :: _ -> Some x\n",
" let pop = function [] -> None | _ :: s -> Some s\n",
" let size = List.length\n",
" let to_list = Fun.id\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "413aaa6f",
"metadata": {},
"source": [
"But that makes it harder to pipeline:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4a8fb044",
"metadata": {
"tags": [
"raises-exception"
]
},
"outputs": [
{
"ename": "error",
"evalue": "compile_error",
"output_type": "error",
"traceback": [
"File \"[5]\", line 1, characters 11-33:\n1 | ListStack.(empty |> push 1 |> pop |> peek)\n ^^^^^^^^^^^^^^^^^^^^^^\nError: This expression has type int ListStack.t option\n but an expression was expected of type 'a ListStack.t\n"
]
}
],
"source": [
"ListStack.(empty |> push 1 |> pop |> peek)"
]
},
{
"cell_type": "markdown",
"id": "35ac3761",
"metadata": {},
"source": [
"The types break down for the pipeline right after the `pop`, because that\n",
"now returns an `'a t option`, but `peek` expects an input that is merely\n",
"an `'a t`.\n",
"\n",
"It is possible to define some additional operators to help restore the ability\n",
"to pipeline. In fact, these functions are already defined in the `Option` module\n",
"in the standard library, though not as infix operators:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6765c072",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val ( >>| ) : 'a option -> ('a -> 'b) -> 'b option = \n"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"val ( >>= ) : 'a option -> ('a -> 'b option) -> 'b option = \n"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(* Option.map aka fmap *)\n",
"let ( >>| ) opt f =\n",
" match opt with\n",
" | None -> None\n",
" | Some x -> Some (f x)\n",
"\n",
"(* Option.bind *)\n",
"let ( >>= ) opt f =\n",
" match opt with\n",
" | None -> None\n",
" | Some x -> f x"
]
},
{
"cell_type": "markdown",
"id": "959180cd",
"metadata": {},
"source": [
"We can use those as needed for pipelining:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e3d9bbda",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"- : int list option = Some [3]\n"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ListStack.(empty |> push 1 |> pop >>| push 2 >>= pop >>| push 3 >>| to_list)"
]
},
{
"cell_type": "markdown",
"id": "6d84e04c",
"metadata": {},
"source": [
"But it's not so pleasant to figure out which of the three operators to use\n",
"where.\n",
"\n",
"There is therefore a tradeoff in the interface design:\n",
"\n",
"* Using options ensures that surprising exceptions regarding empty stacks never\n",
" occur at run-time. The program is therefore more robust. But the convenient\n",
" pipeline operator is lost.\n",
"\n",
"* Using exceptions means that programmers don't have to write as much code. If\n",
" they are sure that an exception can't occur, they can omit the code for\n",
" handling it. The program is less robust, but writing it is more convenient.\n",
"\n",
"There is thus a tradeoff between writing more code early (with options) or doing\n",
"more debugging later (with exceptions). The OCaml standard library has recently\n",
"begun providing both versions of the interface in a data structure, so that the\n",
"client can make the choice of how they want to use it. For example, we could\n",
"provide both `peek` and `peek_opt`, and the same for `pop`, for clients of\n",
"our stack module:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8b9645c4",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type Stack =\n",
" sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val push : 'a -> 'a t -> 'a t\n",
" val peek : 'a t -> 'a\n",
" val peek_opt : 'a t -> 'a option\n",
" val pop : 'a t -> 'a t\n",
" val pop_opt : 'a t -> 'a t option\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
" end\n"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"module ListStack : Stack\n"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Stack = sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val push : 'a -> 'a t -> 'a t\n",
" val peek : 'a t -> 'a\n",
" val peek_opt : 'a t -> 'a option\n",
" val pop : 'a t -> 'a t\n",
" val pop_opt : 'a t -> 'a t option\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
"end\n",
"\n",
"module ListStack : Stack = struct\n",
" type 'a t = 'a list\n",
" exception Empty\n",
" let empty = []\n",
" let is_empty = function [] -> true | _ -> false\n",
" let push = List.cons\n",
" let peek = function [] -> raise Empty | x :: _ -> x\n",
" let peek_opt = function [] -> None | x :: _ -> Some x\n",
" let pop = function [] -> raise Empty | _ :: s -> s\n",
" let pop_opt = function [] -> None | _ :: s -> Some s\n",
" let size = List.length\n",
" let to_list = Fun.id\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "a847a258",
"metadata": {},
"source": [
"One nice thing about this implementation is that it is efficient. All the\n",
"operations except for `size` are constant time. We saw earlier in the chapter\n",
"that `size` could be made constant time as well, at the cost of some extra space\n",
"— though just a constant factor more — by caching the size of the\n",
"stack at each node in the list.\n",
"\n",
"## Queues\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"rCiBfZO67A4\")}}\n",
"\n",
"Queues and stacks are fairly similar interfaces. We'll stick with exceptions\n",
"instead of options for now."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "306f358e",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"module type Queue =\n",
" sig\n",
" type 'a t\n",
" exception Empty\n",
" val empty : 'a t\n",
" val is_empty : 'a t -> bool\n",
" val enqueue : 'a -> 'a t -> 'a t\n",
" val front : 'a t -> 'a\n",
" val dequeue : 'a t -> 'a t\n",
" val size : 'a t -> int\n",
" val to_list : 'a t -> 'a list\n",
" end\n"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Queue = sig\n",
" (** An ['a t] is a queue whose elements have type ['a]. *)\n",
" type 'a t\n",
"\n",
" (** Raised if [front] or [dequeue] is applied to the empty queue. *)\n",
" exception Empty\n",
"\n",
" (** [empty] is the empty queue. *)\n",
" val empty : 'a t\n",
"\n",
" (** [is_empty q] is whether [q] is empty. *)\n",
" val is_empty : 'a t -> bool\n",
"\n",
" (** [enqueue x q] is the queue [q] with [x] added to the end. *)\n",
" val enqueue : 'a -> 'a t -> 'a t\n",
"\n",
" (** [front q] is the element at the front of the queue. Raises [Empty]\n",
" if [q] is empty. *)\n",
" val front : 'a t -> 'a\n",
"\n",
" (** [dequeue q] is the queue containing all the elements of [q] except the\n",
" front of [q]. Raises [Empty] is [q] is empty. *)\n",
" val dequeue : 'a t -> 'a t\n",
"\n",
" (** [size q] is the number of elements in [q]. *)\n",
" val size : 'a t -> int\n",
"\n",
" (** [to_list q] is a list containing the elements of [q] in order from\n",
" front to back. *)\n",
" val to_list : 'a t -> 'a list\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "31ca2f13",
"metadata": {},
"source": [
"```{important}\n",
"Similarly to `peek` and `pop`, note how `front` and `dequeue` divide the\n",
"responsibility of getting the first element vs. getting all the rest of the\n",
"elements.\n",
"```\n",
"\n",
"It's easy to implement queues with lists, just as it was for implementing\n",
"stacks:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "84784ef3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module ListQueue : Queue\n"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module ListQueue : Queue = struct\n",
" (** The list [x1; x2; ...; xn] represents the queue with [x1] at its front,\n",
" followed by [x2], ..., followed by [xn]. *)\n",
" type 'a t = 'a list\n",
" exception Empty\n",
" let empty = []\n",
" let is_empty = function [] -> true | _ -> false\n",
" let enqueue x q = q @ [x]\n",
" let front = function [] -> raise Empty | x :: _ -> x\n",
" let dequeue = function [] -> raise Empty | _ :: q -> q\n",
" let size = List.length\n",
" let to_list = Fun.id\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "232616f1",
"metadata": {},
"source": [
"But despite being as easy, this implementation is not as efficient as our\n",
"list-based stacks. Dequeueing is a constant-time operation with this\n",
"representation, but enqueueing is a linear-time operation. That's because\n",
"`dequeue` does a single pattern match, whereas `enqueue` must traverse the\n",
"entire list to append the new element at the end.\n",
"\n",
"There's a very clever way to do better on efficiency. We can use two lists to\n",
"represent a single queue. This representation was invented by Robert Melville as\n",
"part of his PhD dissertation at Cornell (*Asymptotic Complexity of Iterative\n",
"Computations*, Jan 1981), which was advised by Prof. David Gries. Chris Okasaki\n",
"(*Purely Functional Data Structures*, Cambridge University Press, 1988) calls\n",
"these *batched queues*. Sometimes you will see this same implementation referred\n",
"to as \"implementing a queue with two stacks\". That's because stacks and lists\n",
"are so similar (as we've already seen) that you could rewrite `pop` as\n",
"`List.tl`, and so forth.\n",
"\n",
"The core idea has a Part A and a Part B. Part A is: we use the two lists to\n",
"split the queue into two pieces, the *inbox* and *outbox*. When new elements are\n",
"enqueued, we put them in the inbox. Eventually (we'll soon come to how) elements\n",
"are transferred from the inbox to the outbox. When a dequeue is requested, that\n",
"element is removed from the outbox; or when the front element is requested, we\n",
"check the outbox for it. For example, if the inbox currently had `[3; 4; 5]` and\n",
"the outbox had `[1; 2]`, then the front element would be `1`, which is the head\n",
"of the outbox. Dequeuing would remove that element and leave the inbox with just\n",
"`[2]`, which is the tail of the outbox. Likewise, enqueuing `6` would make the\n",
"inbox become `[3; 4; 5; 6]`.\n",
"\n",
"The efficiency of `front` and `dequeue` is very good so far. We just have to\n",
"take the head or tail of the outbox, respectively, assuming it is non-empty.\n",
"Those are constant-time operations. But the efficiency of `enqueue` is still\n",
"bad. It's linear time, because we have to append the new element to the end of\n",
"the list. It's too bad we have to use the append operator, which is inherently\n",
"linear time. It would be much better if we could use cons, which is constant\n",
"time.\n",
"\n",
"So here's Part B of the core idea: let's keep the inbox in reverse order. For\n",
"example, if we enqueued `3` then `4` then `5`, the inbox would actually\n",
"be `[5; 4; 3]`, not `[3; 4; 5]`. Then if `6` were enqueued next, we could\n",
"cons it onto the beginning of the inbox, which becomes `[6; 5; 4; 3]`. The\n",
"queue represented by inbox `i` and outbox `o` is therefore `o @ List.rev i`.\n",
"So `enqueue` can now always be a constant-time operation.\n",
"\n",
"But what about `dequeue` (and `front`)? They're constant time too, **as long as\n",
"the outbox is not empty.** If it's empty, we have a problem. We need to transfer\n",
"whatever is in the inbox to the outbox at that point. For example, if the\n",
"outbox is empty, and the inbox is `[6; 5; 4; 3]`, then we need to switch them\n",
"around, making the outbox be `[3; 4; 5; 6]` and the inbox be empty. That's\n",
"actually easy: we just have to reverse the list.\n",
"\n",
"Unfortunately, we just re-introduced a linear-time operation. But with one\n",
"crucial difference: we don't have to do that linear-time reverse on every\n",
"`dequeue`, whereas with `ListQueue` above we had to do the linear-time append on\n",
"every `enqueue`. Instead, we only have to do the reverse on those rare occasions\n",
"when the outbox becomes empty.\n",
"\n",
"So even though in the worst case `dequeue` (and `front`) will be linear time,\n",
"most of the time they will not be. In fact, later in this book when we study\n",
"*amortized analysis* we will show that in the long run they can be understood as\n",
"constant-time operations. For now, here's a piece of intuition to support that\n",
"claim: every individual element enters the inbox once (with a cons), moves to\n",
"the outbox once (with a pattern match then cons), and leaves the outbox once (with\n",
"a pattern match). Each of those is constant time. So each element only ever\n",
"experiences constant-time operations from its own perspective.\n",
"\n",
"For now, let's move on to implementing these ideas. In the implementation, we'll\n",
"add one more idea: the outbox always has to have an element in it, unless the\n",
"queue is empty. In other words, if the outbox is empty, we're guaranteed the\n",
"inbox is too. That requirement isn't necessary for batched queues, but it does\n",
"keep the code simpler by reducing the number of times we have to check whether a\n",
"list is empty. The tiny tradeoff is that if the queue is empty, `enqueue` now\n",
"has to directly put an element into the outbox. No matter, that's still a\n",
"constant-time operation."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1a44c0e7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module BatchedQueue : Queue\n"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module BatchedQueue : Queue = struct\n",
" (** [{o; i}] represents the queue [o @ List.rev i]. For example,\n",
" [{o = [1; 2]; i = [5; 4; 3]}] represents the queue [1, 2, 3, 4, 5],\n",
" where [1] is the front element. To avoid ambiguity about emptiness,\n",
" whenever only one of the lists is empty, it must be [i]. For example,\n",
" [{o = [1]; i = []}] is a legal representation, but [{o = []; i = [1]}]\n",
" is not. This implies that if [o] is empty, [i] must also be empty. *)\n",
" type 'a t = {o : 'a list; i : 'a list}\n",
"\n",
" exception Empty\n",
"\n",
" let empty = {o = []; i = []}\n",
"\n",
" let is_empty = function\n",
" | {o = []} -> true\n",
" | _ -> false\n",
"\n",
" let enqueue x = function\n",
" | {o = []} -> {o = [x]; i = []}\n",
" | {o; i} -> {o; i = x :: i}\n",
"\n",
" let front = function\n",
" | {o = []} -> raise Empty\n",
" | {o = h :: _} -> h\n",
"\n",
" let dequeue = function\n",
" | {o = []} -> raise Empty\n",
" | {o = [_]; i} -> {o = List.rev i; i = []}\n",
" | {o = _ :: t; i} -> {o = t; i}\n",
"\n",
" let size {o; i} = List.(length o + length i)\n",
"\n",
" let to_list {o; i} = o @ List.rev i\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "712e8b1d",
"metadata": {},
"source": [
"The efficiency of batched queues comes at a price in readability. If we compare\n",
"`ListQueue` and `BatchedQueue`, it's hopefully clear that `ListQueue` is a\n",
"simple and correct implementation of a queue data structure. It's probably far\n",
"less clear that `BatchedQueue` is a correct implementation. Just look at how\n",
"many paragraphs of writing it took to explain it above!\n",
"\n",
"## Maps\n",
"\n",
"Recall that a *map* (aka *dictionary*) binds keys to values. Here is a module\n",
"type for maps. There are many other operations a map might support, but\n",
"these will suffice for now."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "12e70300",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module type Map =\n",
" sig\n",
" type ('k, 'v) t\n",
" val empty : ('k, 'v) t\n",
" val insert : 'k -> 'v -> ('k, 'v) t -> ('k, 'v) t\n",
" val lookup : 'k -> ('k, 'v) t -> 'v\n",
" val bindings : ('k, 'v) t -> ('k * 'v) list\n",
" end\n"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Map = sig\n",
" (** [('k, 'v) t] is the type of maps that bind keys of type ['k] to\n",
" values of type ['v]. *)\n",
" type ('k, 'v) t\n",
"\n",
" (** [empty] does not bind any keys. *)\n",
" val empty : ('k, 'v) t\n",
"\n",
" (** [insert k v m] is the map that binds [k] to [v], and also contains\n",
" all the bindings of [m]. If [k] was already bound in [m], that old\n",
" binding is superseded by the binding to [v] in the returned map. *)\n",
" val insert : 'k -> 'v -> ('k, 'v) t -> ('k, 'v) t\n",
"\n",
" (** [lookup k m] is the value bound to [k] in [m]. Raises: [Not_found] if [k]\n",
" is not bound in [m]. *)\n",
" val lookup : 'k -> ('k, 'v) t -> 'v\n",
"\n",
" (** [bindings m] is an association list containing the same bindings as [m].\n",
" The keys in the list are guaranteed to be unique. *)\n",
" val bindings : ('k, 'v) t -> ('k * 'v) list\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "b7b90d9d",
"metadata": {},
"source": [
"Note how `Map.t` is parameterized on two types, `'k` and `'v`, which are written\n",
"in parentheses and separated by commas. Although `('k, 'v)` might look like a\n",
"pair of values, it is not: it is a syntax for writing multiple type variables.\n",
"\n",
"Recall that association lists are lists of pairs, where the first element of\n",
"each pair is a key, and the second element is the value it binds. For example,\n",
"here is an association list that maps some well-known names to an approximation\n",
"of their numeric value:\n",
"\n",
"```\n",
"[(\"pi\", 3.14); (\"e\", 2.718); (\"phi\", 1.618)]\n",
"```\n",
"\n",
"Naturally we can implement the `Map` module type with association lists:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "e138f143",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module AssocListMap : Map\n"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module AssocListMap : Map = struct\n",
" (** The list [(k1, v1); ...; (kn, vn)] binds key [ki] to value [vi].\n",
" If a key appears more than once in the list, it is bound to the\n",
" the left-most occurrence in the list. *)\n",
" type ('k, 'v) t = ('k * 'v) list\n",
" let empty = []\n",
" let insert k v m = (k, v) :: m\n",
" let lookup k m = List.assoc k m\n",
" let keys m = List.(m |> map fst |> sort_uniq Stdlib.compare)\n",
" let bindings m = m |> keys |> List.map (fun k -> (k, lookup k m))\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "78f2391a",
"metadata": {},
"source": [
"This implementation of maps is persistent. For example, adding a new\n",
"binding to the map `m` below does not change `m` itself:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0bf4c420",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"val m : (string, float) AssocListMap.t = \n"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"val m' : (string, float) AssocListMap.t = \n"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"val b : (string * float) list = [(\"e\", 2.718); (\"pi\", 3.14)]\n"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"val b' : (string * float) list = [(\"e\", 2.718); (\"phi\", 1.618); (\"pi\", 3.14)]\n"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"open AssocListMap\n",
"let m = empty |> insert \"pi\" 3.14 |> insert \"e\" 2.718\n",
"let m' = m |> insert \"phi\" 1.618\n",
"let b = bindings m\n",
"let b' = bindings m'"
]
},
{
"cell_type": "markdown",
"id": "2b2cb2d4",
"metadata": {},
"source": [
"The `insert` operation is constant time, which is great. But the `lookup`\n",
"operation is linear time. It's possible to do much better than that. In a later\n",
"chapter, we'll see how to do better. Logarithmic-time performance is achievable\n",
"with balanced binary trees, and something like constant-time performance with\n",
"hash tables. Neither of those, however, achieves the simplicity of the code\n",
"above.\n",
"\n",
"The `bindings` operation is complicated by potential duplicate keys in the list.\n",
"It uses a `keys` helper function to extract the unique list of keys with the\n",
"help of library function `List.sort_uniq`. That function sorts an input list and\n",
"in the process discards duplicates. It requires a comparison function as input.\n",
"\n",
"```{note}\n",
"A comparison function must return 0 if its arguments compare as equal, a\n",
"positive integer if the first is greater, and a negative integer if the first is\n",
"smaller.\n",
"```\n",
"\n",
"Here we use the standard library's comparison function `Stdlib.compare`, which\n",
"behaves essentially the same as the built-in comparison operators `=`, `<`, `>`,\n",
"etc. Custom comparison functions are useful if you want to have a relaxed notion\n",
"of what being a duplicate means. For example, maybe you'd like to ignore the\n",
"case of strings, or the sign of a number, etc.\n",
"\n",
"The running time of `List.sort_uniq` is linearithmic, and it produces a linear\n",
"number of keys as output. For each of those keys, we do a linear-time lookup\n",
"operation. So the total running time of `bindings` is $O(n \\log n) + O(n) \\cdot\n",
"O(n)$, which is $O(n^2)$. We can definitely do better than that with\n",
"more advanced data structures.\n",
"\n",
"Actually we can have a constant-time `bindings` operation even with association\n",
"lists, if we are willing to pay for a linear-time `insert` operation:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4fe9ab9a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module UniqAssocListMap : Map\n"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module UniqAssocListMap : Map = struct\n",
" (** The list [(k1, v1); ...; (kn, vn)] binds key [ki] to value [vi].\n",
" No duplicate keys may occur. *)\n",
" type ('k, 'v) t = ('k * 'v) list\n",
" let empty = []\n",
" let insert k v m = (k, v) :: List.remove_assoc k m\n",
" let lookup k m = List.assoc k m\n",
" let bindings m = m\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "f348a528",
"metadata": {},
"source": [
"That implementation removes any duplicate binding of `k` before inserting\n",
"a new binding.\n",
"\n",
"## Sets\n",
"\n",
"Here is a module type for sets. There are many other operations a set data\n",
"structure might be expected to support, but these will suffice for now."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "21ce2ee8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module type Set =\n",
" sig\n",
" type 'a t\n",
" val empty : 'a t\n",
" val mem : 'a -> 'a t -> bool\n",
" val add : 'a -> 'a t -> 'a t\n",
" val elements : 'a t -> 'a list\n",
" end\n"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module type Set = sig\n",
" (** ['a t] is the type of sets whose elements are of type ['a]. *)\n",
" type 'a t\n",
"\n",
" (** [empty] is the empty set *)\n",
" val empty : 'a t\n",
"\n",
" (** [mem x s] is whether [x] is an element of [s]. *)\n",
" val mem : 'a -> 'a t -> bool\n",
"\n",
" (** [add x s] is the set that contains [x] and all the elements of [s]. *)\n",
" val add : 'a -> 'a t -> 'a t\n",
"\n",
" (** [elements s] is a list containing the elements of [s]. No guarantee\n",
" is made about the ordering of that list, but each is guaranteed to\n",
" be unique. *)\n",
" val elements : 'a t -> 'a list\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "dc322a40",
"metadata": {},
"source": [
"Here's an implementation of that interface using a list to represent the set.\n",
"This implementation ensures that the list never contains any duplicate elements,\n",
"since sets themselves do not:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "599e6f8c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module UniqListSet : Set\n"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module UniqListSet : Set = struct\n",
" type 'a t = 'a list\n",
" let empty = []\n",
" let mem = List.mem\n",
" let add x s = if mem x s then s else x :: s\n",
" let elements = Fun.id\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "b8408c5b",
"metadata": {},
"source": [
"Note how `add` ensures that the representation never contains any duplicates, so\n",
"the implementation of `elements` is easy. Of course, that comes with the\n",
"tradeoff of `add` being linear time.\n",
"\n",
"Here's a second implementation, which permits duplicates in the list:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "b6585668",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"module ListSet : Set\n"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"module ListSet : Set = struct\n",
" type 'a t = 'a list\n",
" let empty = []\n",
" let mem = List.mem\n",
" let add = List.cons\n",
" let elements s = List.sort_uniq Stdlib.compare s\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "3da8f9c5",
"metadata": {},
"source": [
"In that implementation, the `add` operation is now constant time, and the\n",
"`elements` operation is linearithmic time."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "OCaml",
"language": "OCaml",
"name": "ocaml-jupyter"
},
"language_info": {
"codemirror_mode": "text/x-ocaml",
"file_extension": ".ml",
"mimetype": "text/x-ocaml",
"name": "OCaml",
"nbconverter_exporter": null,
"pygments_lexer": "OCaml",
"version": "4.14.0"
},
"source_map": [
14,
44,
48,
61,
86,
90,
96,
123,
147,
151,
154,
164,
176,
180,
182,
204,
232,
247,
280,
291,
305,
380,
415,
429,
451,
468,
480,
485,
491,
526,
536,
546,
565,
571,
579,
586,
594
]
},
"nbformat": 4,
"nbformat_minor": 5
}