# Algebraic Data Types

## Contents

# 3.9. Algebraic Data Types#

Thus far, we have seen variants simply as enumerating a set of constant values, such as:

```
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type ptype = TNormal | TFire | TWater
type peff = ENormal | ENotVery | Esuper
```

But variants are far more powerful than this.

## 3.9.1. Variants that Carry Data#

As a running example, here is a variant type `shape`

that does more than just
enumerate values:

```
type point = float * float
type shape =
| Point of point
| Circle of point * float (* center and radius *)
| Rect of point * point (* lower-left and upper-right corners *)
```

```
type point = float * float
```

```
type shape = Point of point | Circle of point * float | Rect of point * point
```

This type, `shape`

, represents a shape that is either a point, a circle, or a
rectangle. A point is represented by a constructor `Point`

that *carries* some
additional data, which is a value of type `point`

. A circle is represented by a
constructor `Circle`

that carries two pieces of data: one of type `point`

and
the other of type `float`

. Those data represent the center of the circle and its
radius. A rectangle is represented by a constructor `Rect`

that carries two more
points.

Here are a couple functions that use the `shape`

type:

```
let area = function
| Point _ -> 0.0
| Circle (_, r) -> Float.pi *. (r ** 2.0)
| Rect ((x1, y1), (x2, y2)) ->
let w = x2 -. x1 in
let h = y2 -. y1 in
w *. h
let center = function
| Point p -> p
| Circle (p, _) -> p
| Rect ((x1, y1), (x2, y2)) -> ((x2 +. x1) /. 2.0, (y2 +. y1) /. 2.0)
```

```
val area : shape -> float = <fun>
```

```
val center : shape -> point = <fun>
```

The `shape`

variant type is the same as those we’ve seen before in that it is
defined in terms of a collection of constructors. What’s different than before
is that those constructors carry additional data along with them. Every value of
type `shape`

is formed from exactly one of those constructors. Sometimes we call
the constructor a *tag*, because it tags the data it carries as being from that
particular constructor.

Variant types are sometimes called *tagged unions*. Every value of the type is
from the set of values that is the union of all values from the underlying types
that the constructor carries. For example, with the `shape`

type, every value is
tagged with either `Point`

or `Circle`

or `Rect`

and carries a value from:

the set of all

`point`

values, unioned withthe set of all

`point * float`

values, unioned withthe set of all

`point * point`

values.

Another name for these variant types is an *algebraic data type*. “Algebra” here
refers to the fact that variant types contain both sum and product types, as
defined in the previous lecture. The sum types come from the fact that a value
of a variant is formed by *one of* the constructors. The product types come from
that fact that a constructor can carry tuples or records, whose values have a
sub-value from *each of* their component types.

Using variants, we can express a type that represents the union of several other
types, but in a type-safe way. Here, for example, is a type that represents
either a `string`

or an `int`

:

```
type string_or_int =
| String of string
| Int of int
```

```
type string_or_int = String of string | Int of int
```

If we wanted to, we could use this type to code up lists (e.g.) that contain either strings or ints:

```
type string_or_int_list = string_or_int list
let rec sum : string_or_int list -> int = function
| [] -> 0
| String s :: t -> int_of_string s + sum t
| Int i :: t -> i + sum t
let lst_sum = sum [String "1"; Int 2]
```

```
type string_or_int_list = string_or_int list
```

```
val sum : string_or_int list -> int = <fun>
```

```
val lst_sum : int = 3
```

Variants thus provide a type-safe way of doing something that might before have seemed impossible.

Variants also make it possible to discriminate which tag a value was constructed with, even if multiple constructors carry the same type. For example:

```
type t = Left of int | Right of int
let x = Left 1
let double_right = function
| Left i -> i
| Right i -> 2 * i
```

```
type t = Left of int | Right of int
```

```
val x : t = Left 1
```

```
val double_right : t -> int = <fun>
```

## 3.9.2. Syntax and Semantics#

**Syntax.**

To define a variant type:

```
type t = C1 [of t1] | ... | Cn [of tn]
```

The square brackets above denote that `of ti`

is optional. Every constructor may
individually either carry no data or carry data. We call constructors that carry
no data *constant*; and those that carry data, *non-constant*.

To write an expression that is a variant:

```
C e
```

Or:

```
C
```

depending on whether the constructor name `C`

is non-constant or constant.

**Dynamic semantics.**

If

`e==>v`

then`C e ==> C v`

, assuming`C`

is non-constant.`C`

is already a value, assuming`C`

is constant.

**Static semantics.**

If

`t = ... | C | ...`

then`C : t`

.If

`t = ... | C of t' | ...`

and if`e : t'`

then`C e : t`

.

**Pattern matching.**

We add the following new pattern form to the list of legal patterns:

`C p`

And we extend the definition of when a pattern matches a value and produces a binding as follows:

If

`p`

matches`v`

and produces bindings \(b\), then`C p`

matches`C v`

and produces bindings \(b\).

## 3.9.3. Catch-all Cases#

One thing to beware of when pattern matching against variants is what *Real
World OCaml* calls “catch-all cases”. Here’s a simple example of what can go
wrong. Let’s suppose you write this variant and function:

```
type color = Blue | Red
(* a thousand lines of code in between *)
let string_of_color = function
| Blue -> "blue"
| _ -> "red"
```

```
type color = Blue | Red
```

```
val string_of_color : color -> string = <fun>
```

Seems fine, right? But then one day you realize there are more colors in the world. You need to represent green. So you go back and add green to your variant:

```
type color = Blue | Red | Green
(* a thousand lines of code in between *)
let string_of_color = function
| Blue -> "blue"
| _ -> "red"
```

```
type color = Blue | Red | Green
```

```
val string_of_color : color -> string = <fun>
```

But because of the thousand lines of code in between, you forget that
`string_of_color`

needs updating. And now, all the sudden, you are
red-green color blind:

```
string_of_color Green
```

```
- : string = "red"
```

The problem is the *catch-all* case in the pattern match inside
`string_of_color`

: the final case that uses the wildcard pattern to match
anything. Such code is not robust against future changes to the variant type.

If, instead, you had originally coded the function as follows, life would be better:

```
let string_of_color = function
| Blue -> "blue"
| Red -> "red"
```

```
File "[9]", lines 1-3, characters 22-17:
```

```
1 | ......................function
```

```
2 | | Blue -> "blue"
```

```
3 | | Red -> "red"
```

```
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
```

```
Here is an example of a case that is not matched:
```

```
Green
```

```
val string_of_color : color -> string = <fun>
```

The OCaml type checker now alerts you that you haven’t yet updated
`string_of_color`

to account for the new constructor.

The moral of the story is: catch-all cases lead to buggy code. Avoid using them.

## 3.9.4. Recursive Variants#

Variant types may mention their own name inside their own body. For example,
here is a variant type that could be used to represent something similar to
`int list`

:

```
type intlist = Nil | Cons of int * intlist
let lst3 = Cons (3, Nil) (* similar to 3 :: [] or [3]*)
let lst123 = Cons(1, Cons(2, lst3)) (* similar to [1; 2; 3] *)
let rec sum (l : intlist) : int=
match l with
| Nil -> 0
| Cons (h, t) -> h + sum t
let rec length : intlist -> int = function
| Nil -> 0
| Cons (_, t) -> 1 + length t
let empty : intlist -> bool = function
| Nil -> true
| Cons _ -> false
```

```
type intlist = Nil | Cons of int * intlist
```

```
val lst3 : intlist = Cons (3, Nil)
```

```
val lst123 : intlist = Cons (1, Cons (2, Cons (3, Nil)))
```

```
val sum : intlist -> int = <fun>
```

```
val length : intlist -> int = <fun>
```

```
val empty : intlist -> bool = <fun>
```

Notice that in the definition of `intlist`

, we define the `Cons`

constructor to
carry a value that contains an `intlist`

. This makes the type `intlist`

be
*recursive*: it is defined in terms of itself.

Types may be mutually recursive if you use the `and`

keyword:

```
type node = {value : int; next : mylist}
and mylist = Nil | Node of node
```

```
type node = { value : int; next : mylist; }
and mylist = Nil | Node of node
```

Any such mutual recursion must involve at least one variant or record type that the recursion “goes through”. For example, the following is not allowed:

```
type t = u and u = t
```

```
File "[12]", line 1, characters 0-10:
1 | type t = u and u = t
^^^^^^^^^^
Error: The definition of t contains a cycle:
u
```

But this is:

```
type t = U of u and u = T of t
```

```
type t = U of u
and u = T of t
```

Record types may also be recursive:

```
type node = {value : int; next : node}
```

```
type node = { value : int; next : node; }
```

But plain old type synonyms may not be:

```
type t = t * t
```

```
File "[15]", line 1, characters 0-14:
1 | type t = t * t
^^^^^^^^^^^^^^
Error: The type abbreviation t is cyclic
```

Although `node`

is a legal type definition, there is no way to construct a value
of that type because of the circularity involved: to construct the very first
`node`

value in existence, you would already need a value of type `node`

to
exist. Later, when we cover imperative features, we’ll see a similar idea used
(but successfully) for mutable linked lists.

## 3.9.5. Parameterized Variants#

Variant types may be *parameterized* on other types. For example,
the `intlist`

type above could be generalized to provide lists (coded
up ourselves) over any type:

```
type 'a mylist = Nil | Cons of 'a * 'a mylist
let lst3 = Cons (3, Nil) (* similar to [3] *)
let lst_hi = Cons ("hi", Nil) (* similar to ["hi"] *)
```

```
type 'a mylist = Nil | Cons of 'a * 'a mylist
```

```
val lst3 : int mylist = Cons (3, Nil)
```

```
val lst_hi : string mylist = Cons ("hi", Nil)
```

Here, `mylist`

is a *type constructor* but not a type: there is no way to write
a value of type `mylist`

. But we can write value of type `int mylist`

(e.g.,
`lst3`

) and `string mylist`

(e.g., `lst_hi`

). Think of a type constructor as
being like a function, but one that maps types to types, rather than values to
value.

Here are some functions over `'a mylist`

:

```
let rec length : 'a mylist -> int = function
| Nil -> 0
| Cons (_, t) -> 1 + length t
let empty : 'a mylist -> bool = function
| Nil -> true
| Cons _ -> false
```

```
val length : 'a mylist -> int = <fun>
```

```
val empty : 'a mylist -> bool = <fun>
```

Notice that the body of each function is unchanged from its previous definition
for `intlist`

. All that we changed was the type annotation. And that could even
be omitted safely:

```
let rec length = function
| Nil -> 0
| Cons (_, t) -> 1 + length t
let empty = function
| Nil -> true
| Cons _ -> false
```

```
val length : 'a mylist -> int = <fun>
```

```
val empty : 'a mylist -> bool = <fun>
```

The functions we just wrote are an example of a language feature called
**parametric polymorphism**. The functions don’t care what the `'a`

is in
`'a mylist`

, hence they are perfectly happy to work on `int mylist`

or
`string mylist`

or any other `(whatever) mylist`

. The word “polymorphism” is
based on the Greek roots “poly” (many) and “morph” (form). A value of type
`'a mylist`

could have many forms, depending on the actual type `'a`

.

As soon, though, as you place a constraint on what the type `'a`

might be, you
give up some polymorphism. For example,

```
let rec sum = function
| Nil -> 0
| Cons (h, t) -> h + sum t
```

```
val sum : int mylist -> int = <fun>
```

The fact that we use the `( + )`

operator with the head of the list constrains
that head element to be an `int`

, hence all elements must be `int`

. That means
`sum`

must take in an `int mylist`

, not any other kind of `'a mylist`

.

It is also possible to have multiple type parameters for a parameterized type, in which case parentheses are needed:

```
type ('a, 'b) pair = {first : 'a; second : 'b}
let x = {first = 2; second = "hello"}
```

```
type ('a, 'b) pair = { first : 'a; second : 'b; }
```

```
val x : (int, string) pair = {first = 2; second = "hello"}
```

## 3.9.6. Polymorphic Variants#

Thus far, whenever you’ve wanted to define a variant type, you have had to give
it a name, such as `day`

, `shape`

, or `'a mylist`

:

```
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type shape =
| Point of point
| Circle of point * float
| Rect of point * point
type 'a mylist = Nil | Cons of 'a * 'a mylist
```

```
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
```

```
type shape = Point of point | Circle of point * float | Rect of point * point
```

```
type 'a mylist = Nil | Cons of 'a * 'a mylist
```

Occasionally, you might need a variant type only for the return value of a
single function. For example, here’s a function `f`

that can either return an
`int`

or \(\infty\); you are forced to define a variant type to represent that
result:

```
type fin_or_inf = Finite of int | Infinity
let f = function
| 0 -> Infinity
| 1 -> Finite 1
| n -> Finite (-n)
```

```
type fin_or_inf = Finite of int | Infinity
```

```
val f : int -> fin_or_inf = <fun>
```

The downside of this definition is that you were forced to defined
`fin_or_inf`

even though it won’t be used throughout much of your program.

There’s another kind of variant in OCaml that supports this kind of programming:
*polymorphic variants*. Polymorphic variants are just like variants, except:

You don’t have declare their type or constructors before using them.

There is no name for a polymorphic variant type. (So another name for this feature could have been “anonymous variants”.)

The constructors of a polymorphic variant start with a backquote character.

Using polymorphic variants, we can rewrite `f`

:

```
let f = function
| 0 -> `Infinity
| 1 -> `Finite 1
| n -> `Finite (-n)
```

```
val f : int -> [> `Finite of int | `Infinity ] = <fun>
```

This type says that `f`

either returns ` `Finite n`

for some `n : int`

or
` `Infinity`

. The square brackets do not denote a list, but rather a set of
possible constructors. The `>`

sign means that any code that pattern matches
against a value of that type must *at least* handle the constructors
` `Finite`

and ` `Infinity`

, and possibly more. For example, we could write:

```
match f 3 with
| `NegInfinity -> "negative infinity"
| `Finite n -> "finite"
| `Infinity -> "infinite"
```

```
- : string = "finite"
```

It’s perfectly fine for the pattern match to include constructors other than
` `Finite`

or ` `Infinity`

, because `f`

is guaranteed never to return any
constructors other than those.

There are other, more compelling uses for polymorphic variants that we’ll see later in the course. They are particularly useful in libraries. For now, we generally will steer you away from extensive use of polymorphic variants, because their types can become difficult to manage.

## 3.9.7. Built-in Variants#

OCaml’s built-in list data type is really a recursive, parameterized variant. It is defined as follows:

```
type 'a list = [] | ( :: ) of 'a * 'a list
```

So `list`

is really just a type constructor, with (value) constructors
`[]`

(which we pronounce “nil”) and `::`

(which we pronounce “cons”).

OCaml’s built-in option data type is also really a parameterized variant. It’s defined as follows:

```
type 'a option = None | Some of 'a
```

So `option`

is really just a type constructor, with (value) constructors
`None`

and `Some`

.

You can see both `list`

and `option`

defined in the core OCaml library.