{
"cells": [
{
"cell_type": "markdown",
"id": "e9b14326",
"metadata": {},
"source": [
"# Map\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"qz7kn2pIl3M\")}}\n",
"\n",
"Here are two functions we might want to write:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6029a44",
"metadata": {},
"outputs": [],
"source": [
"(** [add1 lst] adds 1 to each element of [lst]. *)\n",
"let rec add1 = function\n",
" | [] -> []\n",
" | h :: t -> (h + 1) :: add1 t\n",
"\n",
"let lst1 = add1 [1; 2; 3]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f14b3bc1",
"metadata": {},
"outputs": [],
"source": [
"(** [concat_bang lst] concatenates \"!\" to each element of [lst]. *)\n",
"let rec concat_bang = function\n",
" | [] -> []\n",
" | h :: t -> (h ^ \"!\") :: concat_bang t\n",
"\n",
"let lst2 = concat_bang [\"sweet\"; \"salty\"]"
]
},
{
"cell_type": "markdown",
"id": "6b542a88",
"metadata": {},
"source": [
"There's a lot of similarity between those two functions:\n",
"\n",
"- They both pattern match against a list.\n",
"- They both return the same value for the base case of the empty list.\n",
"- They both recurse on the tail in the case of a non-empty list.\n",
"\n",
"In fact the only difference (other than their names) is what they do for the\n",
"head element: add versus concatenate. Let's rewrite the two functions to make\n",
"that difference even more explicit:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b690bb5",
"metadata": {},
"outputs": [],
"source": [
"(** [add1 lst] adds 1 to each element of [lst]. *)\n",
"let rec add1 = function\n",
" | [] -> []\n",
" | h :: t ->\n",
" let f = fun x -> x + 1 in\n",
" f h :: add1 t\n",
"\n",
"(** [concat_bang lst] concatenates \"!\" to each element of [lst]. *)\n",
"let rec concat_bang = function\n",
" | [] -> []\n",
" | h :: t ->\n",
" let f = fun x -> x ^ \"!\" in\n",
" f h :: concat_bang t"
]
},
{
"cell_type": "markdown",
"id": "26b2a6f6",
"metadata": {},
"source": [
"Now the only difference between the two functions (again, other than their\n",
"names) is the body of helper function `f`. Why repeat all that code when there's\n",
"such a small difference between the functions? We might as well *abstract* that\n",
"one helper function out from each main function and make it an argument:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f4801d9",
"metadata": {},
"outputs": [],
"source": [
"let rec add1' f = function\n",
" | [] -> []\n",
" | h :: t -> f h :: add1' f t\n",
"\n",
"(** [add1 lst] adds 1 to each element of [lst]. *)\n",
"let add1 = add1' (fun x -> x + 1)\n",
"\n",
"let rec concat_bang' f = function\n",
" | [] -> []\n",
" | h :: t -> f h :: concat_bang' f t\n",
"\n",
"(** [concat_bang lst] concatenates \"!\" to each element of [lst]. *)\n",
"let concat_bang = concat_bang' (fun x -> x ^ \"!\")"
]
},
{
"cell_type": "markdown",
"id": "c1c4205d",
"metadata": {},
"source": [
"But now there really is no difference at all between `add1'` and `concat_bang'`\n",
"except for their names. They are totally duplicated code. Even their types are\n",
"now the same, because nothing about them mentions integers or strings. We might\n",
"as well just keep only one of them and come up with a good new name for it. One\n",
"possibility could be `transform`, because they transform a list by applying a\n",
"function to each element of the list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa20724e",
"metadata": {},
"outputs": [],
"source": [
"let rec transform f = function\n",
" | [] -> []\n",
" | h :: t -> f h :: transform f t\n",
"\n",
"(** [add1 lst] adds 1 to each element of [lst]. *)\n",
"let add1 = transform (fun x -> x + 1)\n",
"\n",
"(** [concat_bang lst] concatenates \"!\" to each element of [lst]. *)\n",
"let concat_bang = transform (fun x -> x ^ \"!\")"
]
},
{
"cell_type": "markdown",
"id": "673c749c",
"metadata": {},
"source": [
"````{note}\n",
"Instead of\n",
"```ocaml\n",
"let add1 lst = transform (fun x -> x + 1) lst\n",
"```\n",
"above we wrote\n",
"```ocaml\n",
"let add1 = transform (fun x -> x + 1)\n",
"```\n",
"This is another way of being higher order, but it's one we already learned\n",
"about under the guise of partial application. The latter way of writing\n",
"the function partially applies `transform` to just one of its two\n",
"arguments, thus returning a function. That function is bound to the\n",
"name `add1`.\n",
"````\n",
"\n",
"Indeed, the C++ library does call the equivalent function `transform`. But OCaml\n",
"and many other languages (including Java and Python) use the shorter word *map*,\n",
"in the mathematical sense of how a function maps an input to an output. So let's\n",
"make one final change to that name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c900743",
"metadata": {},
"outputs": [],
"source": [
"let rec map f = function\n",
" | [] -> []\n",
" | h :: t -> f h :: map f t\n",
"\n",
"(** [add1 lst] adds 1 to each element of [lst]. *)\n",
"let add1 = map (fun x -> x + 1)\n",
"\n",
"(** [concat_bang lst] concatenates \"!\" to each element of [lst]. *)\n",
"let concat_bang = map (fun x -> x ^ \"!\")"
]
},
{
"cell_type": "markdown",
"id": "cd496135",
"metadata": {},
"source": [
"We have now successfully applied the Abstraction Principle: the common structure\n",
"has been factored out. What's left clearly expresses the computation, at least\n",
"to the reader who is familiar with `map`, in a way that the original versions do\n",
"not as quickly make apparent.\n",
"\n",
"{{ video_embed | replace(\"%%VID%%\", \"hynjCGMpcFk\")}}\n",
"\n",
"## Side Effects\n",
"\n",
"The `map` function exists already in OCaml's standard library as `List.map`, but\n",
"with one small difference from the implementation we discovered above. First,\n",
"let's see what's potentially wrong with our own implementation, then we'll look\n",
"at the standard library's implementation.\n",
"\n",
"We've seen before in our discussion of [exceptions](../data/exceptions) that the\n",
"OCaml language specification does not generally specify evaluation order of\n",
"subexpressions, and that the current language implementation generally evaluates\n",
"right-to-left. Because of that, the following (rather contrived) code actually\n",
"causes the list elements to be printed in what might seem like reverse order:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54be1ad4",
"metadata": {},
"outputs": [],
"source": [
"let p x = print_int x; print_newline(); x + 1\n",
"\n",
"let lst = map p [1; 2]"
]
},
{
"cell_type": "markdown",
"id": "6477d009",
"metadata": {},
"source": [
"Here's why:\n",
"\n",
"- Expression `map p [1; 2]` evaluates to `p 1 :: map p [2]`.\n",
"- The right-hand side of that expression is then evaluated to\n",
" `p 1 :: (p 2 :: map p [])`. The application of `p` to `1` has not\n",
" yet occurred.\n",
"- The right-hand side of `::` is again evaluated next, yielding\n",
" `p 1 :: (p 2 :: [])`.\n",
"- Then `p` is applied to `2`, and finally to `1`.\n",
"\n",
"That is likely surprising to anyone who is predisposed to thinking that\n",
"evaluation would occur left-to-right. The solution is to use a `let`\n",
"expression to cause the evaluation of the function application to occur\n",
"before the recursive call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "79b33f42",
"metadata": {},
"outputs": [],
"source": [
"let rec map f = function\n",
" | [] -> []\n",
" | h :: t -> let h' = f h in h' :: map f t\n",
"\n",
"let lst2 = map p [1; 2]"
]
},
{
"cell_type": "markdown",
"id": "832859ee",
"metadata": {},
"source": [
"Here's why that works:\n",
"\n",
"- Expression `map p [1; 2]` evaluates to `let h' = p 1 in h' :: map p [2]`.\n",
"- The binding expression `p 1` is evaluated, causing `1` to be printed\n",
" and `h'` to be bound to `2`.\n",
"- The body expression `h' :: map p [2]` is then evaluated, which\n",
" leads to `2` being printed next.\n",
"\n",
"So that's how the standard library defines `List.map`. We should use it instead\n",
"of re-defining the function ourselves from now on. But it's good that we have\n",
"discovered the function \"from scratch\" as it were, and that if needed we could\n",
"quickly re-code it.\n",
"\n",
"The bigger lesson to take away from this discussion is that when evaluation\n",
"order matters, we need to use `let` to ensure it. When does it matter? Only when\n",
"there are side effects. Printing and exceptions are the two we've seen so far.\n",
"Later we'll add mutability.\n",
"\n",
"## Map and Tail Recursion\n",
"\n",
"Astute readers will have noticed that the implementation of `map` is not tail\n",
"recursive. That is to some extent unavoidable. Here's a tempting but awful way\n",
"to create a tail-recursive version of it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e824a2c",
"metadata": {},
"outputs": [],
"source": [
"let rec map_tr_aux f acc = function\n",
" | [] -> acc\n",
" | h :: t -> map_tr_aux f (acc @ [f h]) t\n",
"\n",
"let map_tr f = map_tr_aux f []\n",
"\n",
"let lst = map_tr (fun x -> x + 1) [1; 2; 3]"
]
},
{
"cell_type": "markdown",
"id": "87b6d035",
"metadata": {},
"source": [
"To some extent that works: the output is correct, and `map_tr_aux` is tail\n",
"recursive. The subtle flaw is the subexpression `acc @ [f h]`. Recall that\n",
"append is a linear-time operation on singly-linked lists. That is, if there are\n",
"$n$ list elements then append takes time $O(n)$. So at each recursive call we\n",
"perform a $O(n)$ operation. And there will be $n$ recursive calls, one for each\n",
"element of the list. That's a total of $n \\cdot O(n)$ work, which is $O(n^2)$.\n",
"So we achieved tail recursion, but at a high cost: what ought to be a\n",
"linear-time operation became quadratic time.\n",
"\n",
"In an attempt to fix that, we could use the constant-time cons operation instead\n",
"of the linear-time append operation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a47aa13",
"metadata": {},
"outputs": [],
"source": [
"let rec map_tr_aux f acc = function\n",
" | [] -> acc\n",
" | h :: t -> map_tr_aux f (f h :: acc) t\n",
"\n",
"let map_tr f = map_tr_aux f []\n",
"\n",
"let lst = map_tr (fun x -> x + 1) [1; 2; 3]"
]
},
{
"cell_type": "markdown",
"id": "cbac52e4",
"metadata": {},
"source": [
"And to some extent that works: it's tail recursive and linear time. The\n",
"not-so-subtle flaw this time is that the output is backwards. As we take each\n",
"element off the front of the input list, we put it on the front of the output\n",
"list, but that reverses their order.\n",
"\n",
"```{note}\n",
"To understand why the reversal occurs, it might help to think of the input and\n",
"output lists as people standing in a queue:\n",
"\n",
"- Input: Alice, Bob.\n",
"- Output: empty.\n",
"\n",
"Then we remove Alice from the input and add her to the output:\n",
"\n",
"- Input: Bob.\n",
"- Output: Alice.\n",
"\n",
"Then we remove Bob from the input and add him to the output:\n",
"\n",
"- Input: empty.\n",
"- Output: Bob, Alice.\n",
"\n",
"The point is that with singly-linked lists, we can only operate on the head of\n",
"the list and still be constant time. We can't move Bob to the back of the output\n",
"without making him walk past Alice—and anyone else who might be standing\n",
"in the output.\n",
"```\n",
"\n",
"For that reason, the standard library calls this function `List.rev_map`, that\n",
"is, a (tail-recursive) map function that returns its output in reverse order."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5995a5d",
"metadata": {},
"outputs": [],
"source": [
"let rec rev_map_aux f acc = function\n",
" | [] -> acc\n",
" | h :: t -> rev_map_aux f (f h :: acc) t\n",
"\n",
"let rev_map f = rev_map_aux f []\n",
"\n",
"let lst = rev_map (fun x -> x + 1) [1; 2; 3]"
]
},
{
"cell_type": "markdown",
"id": "97a6a535",
"metadata": {},
"source": [
"If you want the output in the \"right\" order, that's easy: just apply `List.rev`\n",
"to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91176077",
"metadata": {},
"outputs": [],
"source": [
"let lst = List.rev (List.rev_map (fun x -> x + 1) [1; 2; 3])"
]
},
{
"cell_type": "markdown",
"id": "78e9a3f7",
"metadata": {},
"source": [
"Since `List.rev` is both linear time and tail recursive, that yields a complete\n",
"solution. We get a linear-time and tail-recursive map computation. The expense\n",
"is that it requires two passes through the list: one to transform, the other to\n",
"reverse. We're not going to do better than this efficiency with a singly-linked\n",
"list. Of course, there are other data structures that implement lists, and we'll\n",
"come to those eventually. Meanwhile, recall that we generally don't have to\n",
"worry about tail recursion (which is to say, about stack space) until lists have\n",
"10,000 or more elements.\n",
"\n",
"Why doesn't the standard library provide this all-in-one function? Maybe it will\n",
"someday if there's good enough reason. But you might discover in your own\n",
"programming there's not a lot of need for it. In many cases, we can either do\n",
"without the tail recursion, or be content with a reversed list.\n",
"\n",
"The bigger lesson to take away from this discussion is that there can be a\n",
"tradeoff between time and space efficiency for recursive functions. By\n",
"attempting to make a function more space efficient (i.e., tail recursive), we\n",
"can accidentally make it asymptotically less time efficient (i.e., quadratic\n",
"instead of linear), or if we're clever keep the asymptotic time efficiency the\n",
"same (i.e., linear) at the cost of a constant factor (i.e., processing twice).\n",
"\n",
"## Map in Other Languages\n",
"\n",
"We mentioned above that the idea of map exists in many programming languages.\n",
"Here's an example from Python:\n",
"```python\n",
">>> print(list(map(lambda x: x + 1, [1, 2, 3])))\n",
"[2, 3, 4]\n",
"```\n",
"We have to use the `list` function to convert the result of the `map` back to a\n",
"list, because Python for sake of efficiency produces each element of the `map`\n",
"output as needed. Here again we see the theme of \"when does it get evaluated?\"\n",
"returning.\n",
"\n",
"In Java, map is part of the `Stream` abstraction that was added in Java 8. Since\n",
"there isn't a built-in Java syntax for lists or streams, it's a little more\n",
"verbose to give an example. Here we use a factory method `Stream.of` to create a\n",
"stream:\n",
"```java\n",
"jshell> Stream.of(1, 2, 3).map(x -> x + 1).collect(Collectors.toList())\n",
"$1 ==> [2, 3, 4]\n",
"```\n",
"Like in the Python example, we have to use something to convert the stream back\n",
"into a list. In this case it's the `collect` method."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.10.3"
}
},
"kernelspec": {
"display_name": "OCaml",
"language": "OCaml",
"name": "ocaml-jupyter"
},
"source_map": [
14,
21,
30,
37,
49,
63,
70,
84,
93,
103,
126,
136,
158,
162,
179,
185,
211,
219,
233,
241,
274,
282,
287,
289
]
},
"nbformat": 4,
"nbformat_minor": 5
}