{
"cells": [
{
"cell_type": "markdown",
"id": "a86f9aef",
"metadata": {},
"source": [
"# Lists\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"x8oLIEtSRBs\")}}\n",
"\n",
"An OCaml list is a sequence of values all of which have the same type. They are\n",
"implemented as singly-linked lists. These lists enjoy a first-class status in\n",
"the language: there is special support for easily creating and working with\n",
"lists. That's a characteristic that OCaml shares with many other functional\n",
"languages. Mainstream imperative languages, like Python, have such support these\n",
"days too. Maybe that's because programmers find it so pleasant to work directly\n",
"with lists as a first-class part of the language, rather than having to go\n",
"through a library (as in C and Java).\n",
"\n",
"## Building Lists\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"I9u4kFPM7YI\")}}\n",
"\n",
"**Syntax.** There are three syntactic forms for building lists:\n",
"```ocaml\n",
"[]\n",
"e1 :: e2\n",
"[e1; e2; ...; en]\n",
"```\n",
"The empty list is written `[]` and is pronounced \"nil\", a name that comes from\n",
"Lisp. Given a list `lst` and element `elt`, we can prepend `elt` to `lst` by\n",
"writing `elt :: lst`. The double-colon operator is pronounced \"cons\", a name\n",
"that comes from an operator in Lisp that __cons__tructs objects in memory.\n",
"\"Cons\" can also be used as a verb, as in \"I will cons an element onto the list.\"\n",
"The first element of a list is usually called its *head* and the rest of the\n",
"elements (if any) are called its *tail*.\n",
"\n",
"The square bracket syntax is convenient but unnecessary. Any list\n",
"`[e1; e2; ...; en]` could instead be written with the more primitive nil and\n",
"cons syntax: `e1 :: e2 :: ... :: en :: []`. When a pleasant syntax can be\n",
"defined in terms of a more primitive syntax within the language, we call the\n",
"pleasant syntax *syntactic sugar*: it makes the language \"sweeter\". Transforming\n",
"the sweet syntax into the more primitive syntax is called *desugaring*.\n",
"\n",
"Because the elements of the list can be arbitrary expressions, lists can be\n",
"nested as deeply as we like, e.g., `[[[]]; [[1; 2; 3]]]`.\n",
"\n",
"**Dynamic semantics.**\n",
"\n",
"* `[]` is already a value.\n",
" * If `e1` evaluates to `v1`, and if `e2` evaluates to `v2`, then `e1 :: e2`\n",
" evaluates to `v1 :: v2`.\n",
"\n",
"As a consequence of those rules and how to desugar the square-bracket notation\n",
"for lists, we have the following derived rule:\n",
"\n",
" * If `ei` evaluates to `vi` for all `i` in `1..n`, then `[e1; ...; en]`\n",
" evaluates to `[v1; ...; vn]`.\n",
"\n",
"It's starting to get tedious to write \"evaluates to\" in all our evaluation\n",
"rules. So let's introduce a shorter notation for it. We'll write `e ==> v` to\n",
"mean that `e` evaluates to `v`. Note that `==>` is not a piece of OCaml syntax.\n",
"Rather, it's a notation we use in our description of the language, kind of like\n",
"metavariables. Using that notation, we can rewrite the latter two rules above:\n",
"\n",
" * If `e1 ==> v1`, and if `e2 ==> v2`, then `e1 :: e2 ==> v1 :: v2`.\n",
" * If `ei ==> vi` for all `i` in `1..n`, then `[e1; ...; en] ==> [v1; ...; vn]`.\n",
"\n",
"**Static semantics.**\n",
"\n",
"All the elements of a list must have the same type. If that element type is `t`,\n",
"then the type of the list is `t list`. You should read such types from right to\n",
"left: `t list` is a list of `t`'s, `t list list` is a list of list of `t`'s,\n",
"etc. The word `list` itself here is not a type: there is no way to build an\n",
"OCaml value that has type simply `list`. Rather, `list` is a *type constructor*:\n",
"given a type, it produces a new type. For example, given `int`, it produces the\n",
"type `int list`. You could think of type constructors as being like functions\n",
"that operate on types, instead of functions that operate on values.\n",
"\n",
"The type-checking rules:\n",
"\n",
"* `[] : 'a list`\n",
"* If `e1 : t` and `e2 : t list` then `e1 :: e2 : t list`. In case the colons\n",
" and their precedence is confusing, the latter means `(e1 :: e2) : t list`.\n",
"\n",
"In the rule for `[]`, recall that `'a` is a type variable: it stands for an\n",
"unknown type. So the empty list is a list whose elements have an unknown type.\n",
"If we cons an `int` onto it, say `2 :: []`, then the compiler infers that for\n",
"that particular list, `'a` must be `int`. But if in another place we cons a\n",
"`bool` onto it, say `true :: []`, then the compiler infers that for that\n",
"particular list, `'a` must be `bool`.\n",
"\n",
"## Accessing Lists\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"AkrlDpHN_zE\")}}\n",
"\n",
"```{note}\n",
"The video linked above also uses records and tuples as examples. Those are\n",
"covered in a [later section](records_tuples) of this book.\n",
"```\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"sO9wxUxajS4\")}}\n",
"\n",
"There are really only two ways to build a list, with nil and cons. So if we want\n",
"to take apart a list into its component pieces, we have to say what to do with\n",
"the list if it's empty, and what to do if it's non-empty (that is, a cons of one\n",
"element onto some other list). We do that with a language feature called\n",
"*pattern matching*.\n",
"\n",
"Here's an example of using pattern matching to compute the sum of a list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8900213",
"metadata": {},
"outputs": [],
"source": [
"let rec sum lst =\n",
" match lst with\n",
" | [] -> 0\n",
" | h :: t -> h + sum t"
]
},
{
"cell_type": "markdown",
"id": "54f0fe4f",
"metadata": {},
"source": [
"This function says to take the input `lst` and see whether it has the same shape\n",
"as the empty list. If so, return 0. Otherwise, if it has the same shape as the\n",
"list `h :: t`, then let `h` be the first element of `lst`, and let `t` be the\n",
"rest of the elements of `lst`, and return `h + sum t`. The choice of variable\n",
"names here is meant to suggest \"head\" and \"tail\" and is a common idiom, but we\n",
"could use other names if we wanted. Another common idiom is:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3379315e",
"metadata": {},
"outputs": [],
"source": [
"let rec sum xs =\n",
" match xs with\n",
" | [] -> 0\n",
" | x :: xs' -> x + sum xs'"
]
},
{
"cell_type": "markdown",
"id": "3d0c6c71",
"metadata": {},
"source": [
"That is, the input list is a list of xs (pronounced EX-uhs), the head element is\n",
"an x, and the tail is xs' (pronounced EX-uhs prime).\n",
"\n",
"Syntactically it isn't necessary to use so many lines to define `sum`. We could\n",
"do it all on one line:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a91d1e3",
"metadata": {},
"outputs": [],
"source": [
"let rec sum xs = match xs with | [] -> 0 | x :: xs' -> x + sum xs'"
]
},
{
"cell_type": "markdown",
"id": "b12fa52b",
"metadata": {},
"source": [
"Or, noting that the first `|` after `with` is optional regardless of how many\n",
"lines we use, we could also write:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc73eeee",
"metadata": {},
"outputs": [],
"source": [
"let rec sum xs = match xs with [] -> 0 | x :: xs' -> x + sum xs'"
]
},
{
"cell_type": "markdown",
"id": "533785ce",
"metadata": {},
"source": [
"The multi-line format is what we'll usually use in this book, because it helps\n",
"the human eye understand the syntax a bit better. OCaml code formatting tools,\n",
"though, are moving toward the single-line format whenever the code is short\n",
"enough to fit on just one line.\n",
"\n",
"Here's another example of using pattern matching to compute the length of a\n",
"list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43fc9f54",
"metadata": {},
"outputs": [],
"source": [
"let rec length lst =\n",
" match lst with\n",
" | [] -> 0\n",
" | h :: t -> 1 + length t"
]
},
{
"cell_type": "markdown",
"id": "01fd82bd",
"metadata": {},
"source": [
"Note how we didn't actually need the variable `h` in the right-hand side of the\n",
"pattern match. When we want to indicate the presence of some value in a pattern\n",
"without actually giving it a name, we can write `_` (the underscore character):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "820c2973",
"metadata": {},
"outputs": [],
"source": [
"let rec length lst =\n",
" match lst with\n",
" | [] -> 0\n",
" | _ :: t -> 1 + length t"
]
},
{
"cell_type": "markdown",
"id": "d5c4f7bf",
"metadata": {},
"source": [
"That function is actually built-in as part of the OCaml standard library `List`\n",
"module. Its name there is `List.length`. That \"dot\" notation indicates the\n",
"function named `length` inside the module named `List`, much like the dot\n",
"notation used in many other languages.\n",
"\n",
"And here's a third example that appends one list onto the beginning of\n",
"another list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbdf37ba",
"metadata": {},
"outputs": [],
"source": [
"let rec append lst1 lst2 =\n",
" match lst1 with\n",
" | [] -> lst2\n",
" | h :: t -> h :: append t lst2"
]
},
{
"cell_type": "markdown",
"id": "120ae1e4",
"metadata": {},
"source": [
"For example, `append [1; 2] [3; 4]` is `[1; 2; 3; 4]`. That function is actually\n",
"available as a built-in operator `@`, so we could instead write\n",
"`[1; 2] @ [3; 4]`.\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"VDRTatjSl0E\")}}\n",
"\n",
"As a final example, we could write a function to determine whether\n",
"a list is empty:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e748a80",
"metadata": {},
"outputs": [],
"source": [
"let empty lst =\n",
" match lst with\n",
" | [] -> true\n",
" | h :: t -> false"
]
},
{
"cell_type": "markdown",
"id": "0c00a9cf",
"metadata": {},
"source": [
"But there is a much better way to write the same function without pattern matching:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edb61da8",
"metadata": {},
"outputs": [],
"source": [
"let empty lst =\n",
" lst = []"
]
},
{
"cell_type": "markdown",
"id": "9a82d921",
"metadata": {},
"source": [
"Note how all the recursive functions above are similar to doing proofs by\n",
"induction on the natural numbers: every natural number is either 0 or is 1\n",
"greater than some other natural number $n$, and so a proof by induction has a\n",
"base case for 0 and an inductive case for $n + 1$. Likewise, all our functions\n",
"have a base case for the empty list and a recursive case for the list that has\n",
"one more element than another list. This similarity is no accident. There is a\n",
"deep relationship between induction and recursion; we'll explore that\n",
"relationship in more detail later in the book.\n",
"\n",
"By the way, there are two library functions `List.hd` and `List.tl` that return\n",
"the head and tail of a list. It is not good, idiomatic OCaml to apply these\n",
"directly to a list. The problem is that they will raise an exception when\n",
"applied to the empty list, and you will have to remember to handle that\n",
"exception. Instead, you should use pattern matching: you'll then be forced to\n",
"match against both the empty list and the non-empty list (at least), which will\n",
"prevent exceptions from being raised, thus making your program more robust.\n",
"\n",
"## (Not) Mutating Lists\n",
"\n",
"Lists are immutable. There's no way to change an element of a list from one\n",
"value to another. Instead, OCaml programmers create new lists out of old lists.\n",
"For example, suppose we wanted to write a function that returned the same list\n",
"as its input list, but with the first element (if there is one) incremented by\n",
"one. We could do that:\n",
"```ocaml\n",
"let inc_first lst =\n",
" match lst with\n",
" | [] -> []\n",
" | h :: t -> h + 1 :: t\n",
"```\n",
"\n",
"Now you might be concerned about whether we're being wasteful of space. After\n",
"all, there are at least two ways the compiler could implement the above code:\n",
"\n",
"1. Copy the entire tail list `t` when the new list is created in the pattern\n",
" match with cons, such that the amount of memory in use just increased by an\n",
" amount proportionate to the length of `t`.\n",
"\n",
"2. Share the tail list `t` between the old list and the new list, such that the\n",
" amount of memory in use does not increase—beyond the one extra piece of\n",
" memory needed to store `h + 1`.\n",
"\n",
"In fact, the compiler does the latter. So there's no need for concern. The\n",
"reason that it's quite safe for the compiler to implement sharing is exactly\n",
"that list elements are immutable. If they were instead mutable, then we'd start\n",
"having to worry about whether the list I have is shared with the list you have,\n",
"and whether changes I make will be visible in your list. So immutability makes\n",
"it easier to reason about the code, and makes it safe for the compiler to\n",
"perform an optimization.\n",
"\n",
"## Pattern Matching with Lists\n",
"\n",
"We saw above how to access lists using pattern matching. Let's look more\n",
"carefully at this feature.\n",
"\n",
"**Syntax.**\n",
"```ocaml\n",
"match e with\n",
"| p1 -> e1\n",
"| p2 -> e2\n",
"| ...\n",
"| pn -> en\n",
"```\n",
"\n",
"Each of the clauses `pi -> ei` is called a *branch* or a *case* of the pattern\n",
"match. The first vertical bar in the entire pattern match is optional.\n",
"\n",
"The `p`'s here are a new syntactic form called a *pattern*. For now, a pattern\n",
"may be:\n",
"\n",
"* a variable name, e.g., `x`\n",
"* the underscore character `_`, which is called the *wildcard*\n",
"* the empty list `[]`\n",
"* `p1 :: p2`\n",
"* `[p1; ...; pn]`\n",
"\n",
"No variable name may appear more than once in a pattern. For example, the\n",
"pattern `x :: x` is illegal. The wildcard may occur any number of times.\n",
"\n",
"As we learn more of data structures available in OCaml, we'll expand\n",
"the possibilities for what a pattern may be.\n",
"\n",
"**Dynamic semantics.**\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"sz72NP4u4DQ\")}}\n",
"\n",
"Pattern matching involves two inter-related tasks: determining whether a pattern\n",
"matches a value, and determining what parts of the value should be associated\n",
"with which variable names in the pattern. The former task is intuitively about\n",
"determining whether a pattern and a value have the same *shape*. The latter task\n",
"is about determining the *variable bindings* introduced by the pattern. For\n",
"example, consider the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b93d4977",
"metadata": {},
"outputs": [],
"source": [
"match 1 :: [] with\n",
"| [] -> false\n",
"| h :: t -> h >= 1 && List.length t = 0"
]
},
{
"cell_type": "markdown",
"id": "5f41ad98",
"metadata": {},
"source": [
"When evaluating the right-hand side of the second branch, `h` is bound to `1`\n",
"and `t` is bound to `[]`. Let's write `h->1` to mean the variable binding saying\n",
"that `h` has value `1`; this is not a piece of OCaml syntax, but rather a\n",
"notation we use to reason about the language. So the variable bindings produced\n",
"by the second branch would be `h->1, t->[]`.\n",
"\n",
"Using that notation, here is a definition of when a pattern matches a value and\n",
"the bindings that match produces:\n",
"\n",
"* The pattern `x` matches any value `v` and produces the variable binding\n",
" `x->v`.\n",
"\n",
"* The pattern `_` matches any value and produces no bindings.\n",
"\n",
"* The pattern `[]` matches the value `[]` and produces no bindings.\n",
"\n",
"* If `p1` matches `v1` and produces a set $b_1$ of bindings, and if `p2` matches\n",
" `v2` and produces a set $b_2$ of bindings, then `p1 :: p2` matches `v1 :: v2`\n",
" and produces the set $b_1 \\cup b_2$ of bindings. Note that `v2` must be a list\n",
" (since it's on the right-hand side of `::`) and could have any length: 0\n",
" elements, 1 element, or many elements. Note that the union $b_1 \\cup b_2$ of\n",
" bindings will never have a problem where the same variable is bound separately\n",
" in both $b_1$ and $b_2$ because of the syntactic restriction that no variable\n",
" name may appear more than once in a pattern.\n",
"\n",
"* If for all `i` in `1..n`, it holds that `pi` matches `vi` and produces the set\n",
" $b_i$ of bindings, then `[p1; ...; pn]` matches `[v1; ...; vn]` and produces\n",
" the set $\\bigcup_i b_i$ of bindings. Note that this pattern specifies the\n",
" exact length the list must be.\n",
"\n",
"Now we can say how to evaluate `match e with p1 -> e1 | ... | pn -> en`:\n",
"\n",
"* Evaluate `e` to a value `v`.\n",
"\n",
"* Attempt to match `v` against `p1`, then against `p2`, and so on, in the order \n",
" they appear in the match expression.\n",
"\n",
"* If `v` does not match against any of the patterns, then evaluation of the\n",
" match expression raises a `Match_failure` exception. We haven't yet discussed\n",
" exceptions in OCaml, but you're surely familiar with them from other\n",
" languages. We'll come back to exceptions near the end of this chapter, after\n",
" we've covered some of the other built-in data structures in OCaml.\n",
"\n",
"* Otherwise, let `pi` be the first pattern that matches, and let $b$ be the \n",
" variable bindings produced by matching `v` against `pi`.\n",
"\n",
"* Substitute those bindings $b$ inside `ei`, producing a new expression `e'`.\n",
"\n",
"* Evaluate `e'` to a value `v'`.\n",
"\n",
"* The result of the entire match expression is `v'`.\n",
"\n",
"For example, here's how this match expression would be evaluated:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8bf0925",
"metadata": {},
"outputs": [],
"source": [
"match 1 :: [] with\n",
"| [] -> false\n",
"| h :: t -> h = 1 && t = []"
]
},
{
"cell_type": "markdown",
"id": "354e3dc0",
"metadata": {},
"source": [
"* `1 :: []` is already a value.\n",
"\n",
"* `[]` does not match ``1 :: []``.\n",
"\n",
"* `h :: t` does match `1 :: []` and produces variable bindings\n",
" {`h->1`,`t->[]`}, because:\n",
"\n",
" - `h` matches `1` and produces the variable binding `h->1`.\n",
"\n",
" - `t` matches `[]` and produces the variable binding `t->[]`.\n",
"\n",
"* Substituting {`h->1`,`t->[]`} inside `h = 1 && t = []`\n",
" produces a new expression `1 = 1 && [] = []`.\n",
"\n",
"* Evaluating `1 = 1 && [] = []` yields the value `true`. We omit the\n",
" justification for that fact here, but it follows from other evaluation rules\n",
" for built-in operators and function application.\n",
"\n",
"* So the result of the entire match expression is `true`.\n",
"\n",
"**Static semantics.**\n",
"\n",
"* If `e : ta` and for all `i`, it holds that `pi : ta` and `ei : tb`,\n",
" then `(match e with p1 -> e1 | ... | pn -> en) : tb`.\n",
"\n",
"That rule relies on being able to judge whether a pattern has a particular type.\n",
"As usual, type inference comes into play here. The OCaml compiler infers the\n",
"types of any pattern variables as well as all occurrences of the wildcard\n",
"pattern. As for the list patterns, they have the same type-checking rules as\n",
"list expressions.\n",
"\n",
"**Additional Static Checking.**\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"aLQJpk9vXD4\")}}\n",
"\n",
"In addition to that type-checking rule, there are two other checks the compiler\n",
"does for each match expression.\n",
"\n",
"First, **exhaustiveness:** the compiler checks to make sure that there are\n",
"enough patterns to guarantee that at least one of them matches the expression\n",
"`e`, no matter what the value of that expression is at run time. This ensures\n",
"that the programmer did not forget any branches. For example, the function below\n",
"will cause the compiler to emit a warning:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a84a9a7f",
"metadata": {},
"outputs": [],
"source": [
"let head lst = match lst with h :: _ -> h"
]
},
{
"cell_type": "markdown",
"id": "40c941ea",
"metadata": {},
"source": [
"By presenting that warning to the programmer, the compiler is helping the\n",
"programmer to defend against the possibility of `Match_failure` exceptions at\n",
"runtime.\n",
"\n",
"```{note}\n",
"Sorry about how the output from the cell above gets split into many lines in the\n",
"HTML. That is currently an [open issue with JupyterBook][issue], the framework\n",
"used to build this book.\n",
"\n",
"[issue]: https://github.com/executablebooks/jupyter-book/issues/973\n",
"```\n",
"\n",
"Second, **unused branches:** the compiler checks to see whether any of the\n",
"branches could never be matched against because one of the previous branches is\n",
"guaranteed to succeed. For example, the function below will cause the compiler\n",
"to emit a warning:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f429033",
"metadata": {},
"outputs": [],
"source": [
"let rec sum lst =\n",
" match lst with\n",
" | h :: t -> h + sum t\n",
" | [ h ] -> h\n",
" | [] -> 0"
]
},
{
"cell_type": "markdown",
"id": "db2cde3b",
"metadata": {},
"source": [
"The second branch is unused because the first branch will match anything the\n",
"second branch matches.\n",
"\n",
"Unused match cases are usually a sign that the programmer wrote something other\n",
"than what they intended. So by presenting that warning, the compiler is helping\n",
"the programmer to detect latent bugs in their code.\n",
"\n",
"Here's an example of one of the most common bugs that causes an unused match\n",
"case warning. Understanding it is also a good way to check your understanding of\n",
"the dynamic semantics of match expressions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cfcdf8c1",
"metadata": {},
"outputs": [],
"source": [
"let length_is lst n =\n",
" match List.length lst with\n",
" | n -> true\n",
" | _ -> false"
]
},
{
"cell_type": "markdown",
"id": "4a83ddc7",
"metadata": {},
"source": [
"The programmer was thinking that if the length of `lst` is equal to `n`, then\n",
"this function will return `true`, and otherwise will return `false`. But in fact\n",
"this function *always* returns `true`. Why? Because the pattern variable `n` is\n",
"distinct from the function argument `n`. Suppose that the length of `lst` is 5.\n",
"Then the pattern match becomes: `match 5 with n -> true | _ -> false`. Does `n`\n",
"match 5? Yes, according to the rules above: a variable pattern matches any value\n",
"and here produces the binding `n->5`. Then evaluation applies that binding to\n",
"`true`, substituting all occurrences of `n` inside of `true` with 5. Well, there\n",
"are no such occurrences. So we're done, and the result of evaluation is just\n",
"`true`.\n",
"\n",
"What the programmer really meant to write was:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ccad4e4a",
"metadata": {},
"outputs": [],
"source": [
"let length_is lst n =\n",
" match List.length lst with\n",
" | m -> m = n"
]
},
{
"cell_type": "markdown",
"id": "92758512",
"metadata": {},
"source": [
"or better yet:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab4f61f3",
"metadata": {},
"outputs": [],
"source": [
"let length_is lst n =\n",
" List.length lst = n"
]
},
{
"cell_type": "markdown",
"id": "b36220e8",
"metadata": {},
"source": [
"## Deep Pattern Matching\n",
"\n",
"Patterns can be nested. Doing so can allow your code to look deeply into the\n",
"structure of a list. For example:\n",
"\n",
"* `_ :: []` matches all lists with exactly one element\n",
"\n",
"* `_ :: _` matches all lists with at least one element\n",
"\n",
"* `_ :: _ :: []` matches all lists with exactly two elements\n",
"\n",
"* `_ :: _ :: _ :: _` matches all lists with at least three elements\n",
"\n",
"## Immediate Matches\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"VgVP8Tin6yY\")}}\n",
"\n",
"When you have a function that immediately pattern-matches against its final\n",
"argument, there's a nice piece of syntactic sugar you can use to avoid writing\n",
"extra code. Here's an example: instead of"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0ff1551",
"metadata": {},
"outputs": [],
"source": [
"let rec sum lst =\n",
" match lst with\n",
" | [] -> 0\n",
" | h :: t -> h + sum t"
]
},
{
"cell_type": "markdown",
"id": "cab7dbea",
"metadata": {},
"source": [
"you can write"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14d6265d",
"metadata": {},
"outputs": [],
"source": [
"let rec sum = function\n",
" | [] -> 0\n",
" | h :: t -> h + sum t"
]
},
{
"cell_type": "markdown",
"id": "63ac44b1",
"metadata": {},
"source": [
"The word `function` is a keyword. Notice that we're able to leave out the line\n",
"containing `match` as well as the name of the argument, which was never used\n",
"anywhere else but that line. In such cases, though, it's especially important in\n",
"the specification comment for the function to document what that argument is\n",
"supposed to be, since the code no longer gives it a descriptive name.\n",
"\n",
"## OCamldoc and List Syntax\n",
"\n",
"OCamldoc is a documentation generator similar to Javadoc. It extracts comments\n",
"from source code and produces HTML (as well as other output formats). The\n",
"[standard library web documentation][std-web] for the List module is generated\n",
"by OCamldoc from the [standard library source code][std-src] for that module,\n",
"for example.\n",
"\n",
"```{warning}\n",
"There is a syntactic convention with square brackets in OCamldoc that can be\n",
"confusing with respect to lists.\n",
"\n",
"In an OCamldoc comment, source code is surrounded by square brackets. That code\n",
"will be rendered in typewriter face and syntax-highlighted in the output HTML.\n",
"The square brackets in this case do not indicate a list.\n",
"```\n",
"\n",
"For example, here is the comment for `List.hd` in the standard library source\n",
"code:\n",
"```ocaml\n",
"(** Return the first element of the given list. Raise\n",
" [Failure \"hd\"] if the list is empty. *)\n",
"```\n",
"The `[Failure \"hd\"]` does not mean a list containing the exception\n",
"`Failure \"hd\"`. Rather it means to typeset the expression `Failure \"hd\"` as\n",
"source code, as you can see [here][std-web].\n",
"\n",
"This can get especially confusing when you want to talk about lists as part of\n",
"the documentation. For example, here is a way we could rewrite that comment:\n",
"```ocaml\n",
"(** [hd lst] returns the first element of [lst].\n",
" Raises [Failure \"hd\"] if [lst = []]. *)\n",
"```\n",
"In `[lst = []]`, the outer square brackets indicate source code as part of a\n",
"comment, whereas the inner square brackets indicate the empty list.\n",
"\n",
"[std-web]: https://ocaml.org/api/List.html\n",
"[std-src]: https://github.com/ocaml/ocaml/blob/trunk/stdlib/list.mli\n",
"\n",
"## List Comprehensions\n",
"\n",
"Some languages, including Python and Haskell, have a syntax called\n",
"*comprehension* that allows lists to be written somewhat like set comprehensions\n",
"from mathematics. The earliest example of comprehensions seems to be the\n",
"functional language NPL, which was designed in 1977.\n",
"\n",
"OCaml doesn't have built-in syntactic support for comprehensions. Though some\n",
"extensions were developed, none seem to be supported any longer. The primary\n",
"tasks accomplished by comprehensions (filtering out some elements, and\n",
"transforming others) are actually well-supported already by *higher-order\n",
"programming*, which we'll study in a later chapter, and the pipeline operator,\n",
"which we've already learned. So an additional syntax for comprehensions was\n",
"never really needed.\n",
"\n",
"## Tail Recursion\n",
"\n",
"Recall that a function is *tail recursive* if it calls itself recursively but\n",
"does not perform any computation after the recursive call returns, and\n",
"immediately returns to its caller the value of its recursive call. Consider\n",
"these two implementations, `sum` and `sum_tr` of summing a list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "940943ae",
"metadata": {},
"outputs": [],
"source": [
"let rec sum (l : int list) : int =\n",
" match l with\n",
" | [] -> 0\n",
" | x :: xs -> x + (sum xs)\n",
"\n",
"let rec sum_plus_acc (acc : int) (l : int list) : int =\n",
" match l with\n",
" | [] -> acc\n",
" | x :: xs -> sum_plus_acc (acc + x) xs\n",
"\n",
"let sum_tr : int list -> int =\n",
" sum_plus_acc 0"
]
},
{
"cell_type": "markdown",
"id": "413b01ed",
"metadata": {},
"source": [
"Observe the following difference between the `sum` and `sum_tr` functions above:\n",
"In the `sum` function, which is not tail recursive, after the recursive call\n",
"returned its value, we add `x` to it. In the tail recursive `sum_tr`, or rather\n",
"in `sum_plus_acc`, after the recursive call returns, we immediately return the\n",
"value without further computation.\n",
"\n",
"If you're going to write functions on really long lists, tail recursion becomes\n",
"important for performance. So when you have a choice between using a\n",
"tail-recursive vs. non-tail-recursive function, you are likely better off using\n",
"the tail-recursive function on really long lists to achieve space efficiency.\n",
"For that reason, the List module documents which functions are tail recursive\n",
"and which are not.\n",
"\n",
"But that doesn't mean that a tail-recursive implementation is strictly better.\n",
"For example, the tail-recursive function might be harder to read. (Consider\n",
"`sum_plus_acc`.) Also, there are cases where implementing a tail-recursive\n",
"function entails having to do a pre- or post-processing pass to reverse the\n",
"list. On small- to medium-sized lists, the overhead of reversing the list (both\n",
"in time and in allocating memory for the reversed list) can make the\n",
"tail-recursive version less time efficient. What constitutes \"small\" vs. \"big\"\n",
"here? That's hard to say, but maybe 10,000 is a good estimate, according to the\n",
"[standard library documentation of the `List` module][list].\n",
"\n",
"[list]: https://ocaml.org/api/List.html\n",
"\n",
"Here is a useful tail-recursive function to produce a long list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dafa47ed",
"metadata": {},
"outputs": [],
"source": [
"(** [from i j l] is the list containing the integers from [i] to [j],\n",
" inclusive, followed by the list [l].\n",
" Example: [from 1 3 [0] = [1; 2; 3; 0]] *)\n",
"let rec from i j l = if i > j then l else from i (j - 1) (j :: l)\n",
"\n",
"(** [i -- j] is the list containing the integers from [i] to [j], inclusive. *)\n",
"let ( -- ) i j = from i j []\n",
"\n",
"let long_list = 0 -- 1_000_000"
]
},
{
"cell_type": "markdown",
"id": "86b07e76",
"metadata": {},
"source": [
"It would be worthwhile to study the definition of `--` to convince yourself that\n",
"you understand (i) how it works and (ii) why it is tail recursive.\n",
"\n",
"You might in the future decide you want to create such a list again. Rather than\n",
"having to remember where this definition is, and having to copy it into your\n",
"code, here's an easy way to create the same list using a built-in library\n",
"function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2836bb8f",
"metadata": {},
"outputs": [],
"source": [
"List.init 1_000_000 Fun.id"
]
},
{
"cell_type": "markdown",
"id": "b9f142c4",
"metadata": {},
"source": [
"Expression `List.init len f` creates the list `[f 0; f 1; ...; f (len - 1)]`,\n",
"and it does so tail recursively if `len` is bigger than 10,000. Function\n",
"`Fun.id` is simply the identify function `fun x -> x`."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "OCaml",
"language": "OCaml",
"name": "ocaml-jupyter"
},
"source_map": [
14,
120,
125,
133,
138,
144,
146,
149,
151,
159,
164,
168,
173,
181,
186,
195,
200,
202,
205,
299,
303,
357,
361,
407,
409,
428,
434,
447,
452,
466,
470,
474,
477,
499,
504,
506,
510,
578,
591,
620,
630,
640,
642
]
},
"nbformat": 4,
"nbformat_minor": 5
}