3.9. Algebraic Data Types#

Thus far, we have seen variants simply as enumerating a set of constant values, such as:

type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat

type ptype = TNormal | TFire | TWater

type peff = ENormal | ENotVery | Esuper

But variants are far more powerful than this.

3.9.1. Variants that Carry Data#

As a running example, here is a variant type shape that does more than just enumerate values:

type point = float * float
type shape =
  | Point of point
  | Circle of point * float (* center and radius *)
  | Rect of point * point (* lower-left and upper-right corners *)
type point = float * float
type shape = Point of point | Circle of point * float | Rect of point * point

This type, shape, represents a shape that is either a point, a circle, or a rectangle. A point is represented by a constructor Point that carries some additional data, which is a value of type point. A circle is represented by a constructor Circle that carries two pieces of data: one of type point and the other of type float. Those data represent the center of the circle and its radius. A rectangle is represented by a constructor Rect that carries two more points.

Here are a couple functions that use the shape type:

let area = function
  | Point _ -> 0.0
  | Circle (_, r) -> Float.pi *. (r ** 2.0)
  | Rect ((x1, y1), (x2, y2)) ->
      let w = x2 -. x1 in
      let h = y2 -. y1 in
      w *. h

let center = function
  | Point p -> p
  | Circle (p, _) -> p
  | Rect ((x1, y1), (x2, y2)) -> ((x2 +. x1) /. 2.0, (y2 +. y1) /. 2.0)
val area : shape -> float = <fun>
val center : shape -> point = <fun>

The shape variant type is the same as those we’ve seen before in that it is defined in terms of a collection of constructors. What’s different than before is that those constructors carry additional data along with them. Every value of type shape is formed from exactly one of those constructors. Sometimes we call the constructor a tag, because it tags the data it carries as being from that particular constructor.

Variant types are sometimes called tagged unions. Every value of the type is from the set of values that is the union of all values from the underlying types that the constructor carries. For example, with the shape type, every value is tagged with either Point or Circle or Rect and carries a value from:

  • the set of all point values, unioned with

  • the set of all point * float values, unioned with

  • the set of all point * point values.

Another name for these variant types is an algebraic data type. “Algebra” here refers to the fact that variant types contain both sum and product types, as defined in the previous lecture. The sum types come from the fact that a value of a variant is formed by one of the constructors. The product types come from that fact that a constructor can carry tuples or records, whose values have a sub-value from each of their component types.

Using variants, we can express a type that represents the union of several other types, but in a type-safe way. Here, for example, is a type that represents either a string or an int:

type string_or_int =
  | String of string
  | Int of int
type string_or_int = String of string | Int of int

If we wanted to, we could use this type to code up lists (e.g.) that contain either strings or ints:

type string_or_int_list = string_or_int list

let rec sum : string_or_int list -> int = function
  | [] -> 0
  | String s :: t -> int_of_string s + sum t
  | Int i :: t -> i + sum t

let lst_sum = sum [String "1"; Int 2]
type string_or_int_list = string_or_int list
val sum : string_or_int list -> int = <fun>
val lst_sum : int = 3

Variants thus provide a type-safe way of doing something that might before have seemed impossible.

Variants also make it possible to discriminate which tag a value was constructed with, even if multiple constructors carry the same type. For example:

type t = Left of int | Right of int
let x = Left 1
let double_right = function
  | Left i -> i
  | Right i -> 2 * i
type t = Left of int | Right of int
val x : t = Left 1
val double_right : t -> int = <fun>

3.9.2. Syntax and Semantics#

Syntax.

To define a variant type:

type t = C1 [of t1] | ... | Cn [of tn]

The square brackets above denote that of ti is optional. Every constructor may individually either carry no data or carry data. We call constructors that carry no data constant; and those that carry data, non-constant.

To write an expression that is a variant:

C e

Or:

C

depending on whether the constructor name C is non-constant or constant.

Dynamic semantics.

  • If e ==> v then C e ==> C v, assuming C is non-constant.

  • C is already a value, assuming C is constant.

Static semantics.

  • If t = ... | C | ... then C : t.

  • If t = ... | C of t' | ... and if e : t' then C e : t.

Pattern matching.

We add the following new pattern form to the list of legal patterns:

  • C p

And we extend the definition of when a pattern matches a value and produces a binding as follows:

  • If p matches v and produces bindings \(b\), then C p matches C v and produces bindings \(b\).

3.9.3. Catch-all Cases#

One thing to beware of when pattern matching against variants is what Real World OCaml calls “catch-all cases”. Here’s a simple example of what can go wrong. Let’s suppose you write this variant and function:

type color = Blue | Red

(* a thousand lines of code in between *)

let string_of_color = function
  | Blue -> "blue"
  | _ -> "red"
type color = Blue | Red
val string_of_color : color -> string = <fun>

Seems fine, right? But then one day you realize there are more colors in the world. You need to represent green. So you go back and add green to your variant:

type color = Blue | Red | Green

(* a thousand lines of code in between *)

let string_of_color = function
  | Blue -> "blue"
  | _ -> "red"
type color = Blue | Red | Green
val string_of_color : color -> string = <fun>

But because of the thousand lines of code in between, you forget that string_of_color needs updating. And now, all the sudden, you are red-green color blind:

string_of_color Green
- : string = "red"

The problem is the catch-all case in the pattern match inside string_of_color: the final case that uses the wildcard pattern to match anything. Such code is not robust against future changes to the variant type.

If, instead, you had originally coded the function as follows, life would be better:

let string_of_color = function
  | Blue -> "blue"
  | Red  -> "red"
File "[9]", lines 1-3, characters 22-17:
1 | ......................function
2 |   | Blue -> "blue"
3 |   | Red  -> "red"
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
Green
val string_of_color : color -> string = <fun>

The OCaml type checker now alerts you that you haven’t yet updated string_of_color to account for the new constructor.

The moral of the story is: catch-all cases lead to buggy code. Avoid using them.

3.9.4. Recursive Variants#

Variant types may mention their own name inside their own body. For example, here is a variant type that could be used to represent something similar to int list:

type intlist = Nil | Cons of int * intlist

let lst3 = Cons (3, Nil)  (* similar to 3 :: [] or [3] *)
let lst123 = Cons(1, Cons(2, lst3)) (* similar to [1; 2; 3] *)

let rec sum (l : intlist) : int =
  match l with
  | Nil -> 0
  | Cons (h, t) -> h + sum t

let rec length : intlist -> int = function
  | Nil -> 0
  | Cons (_, t) -> 1 + length t

let empty : intlist -> bool = function
  | Nil -> true
  | Cons _ -> false
type intlist = Nil | Cons of int * intlist
val lst3 : intlist = Cons (3, Nil)
val lst123 : intlist = Cons (1, Cons (2, Cons (3, Nil)))
val sum : intlist -> int = <fun>
val length : intlist -> int = <fun>
val empty : intlist -> bool = <fun>

Notice that in the definition of intlist, we define the Cons constructor to carry a value that contains an intlist. This makes the type intlist be recursive: it is defined in terms of itself.

Types may be mutually recursive if you use the and keyword:

type node = {value : int; next : mylist}
and mylist = Nil | Node of node
type node = { value : int; next : mylist; }
and mylist = Nil | Node of node

Any such mutual recursion must involve at least one variant or record type that the recursion “goes through”. For example, the following is not allowed:

type t = u and u = t
File "[12]", line 1, characters 0-10:
1 | type t = u and u = t
    ^^^^^^^^^^
Error: The definition of t contains a cycle:
       u

But this is:

type t = U of u and u = T of t
type t = U of u
and u = T of t

Record types may also be recursive:

type node = {value : int; next : node}
type node = { value : int; next : node; }

But plain old type synonyms may not be:

type t = t * t
File "[15]", line 1, characters 0-14:
1 | type t = t * t
    ^^^^^^^^^^^^^^
Error: The type abbreviation t is cyclic

Although node is a legal type definition, there is no way to construct a value of that type because of the circularity involved: to construct the very first node value in existence, you would already need a value of type node to exist. Later, when we cover imperative features, we’ll see a similar idea used (but successfully) for mutable linked lists.

3.9.5. Parameterized Variants#

Variant types may be parameterized on other types. For example, the intlist type above could be generalized to provide lists (coded up ourselves) over any type:

type 'a mylist = Nil | Cons of 'a * 'a mylist

let lst3 = Cons (3, Nil)  (* similar to [3] *)
let lst_hi = Cons ("hi", Nil)  (* similar to ["hi"] *)
type 'a mylist = Nil | Cons of 'a * 'a mylist
val lst3 : int mylist = Cons (3, Nil)
val lst_hi : string mylist = Cons ("hi", Nil)

Here, mylist is a type constructor but not a type: there is no way to write a value of type mylist. But we can write value of type int mylist (e.g., lst3) and string mylist (e.g., lst_hi). Think of a type constructor as being like a function, but one that maps types to types, rather than values to value.

Here are some functions over 'a mylist:

let rec length : 'a mylist -> int = function
  | Nil -> 0
  | Cons (_, t) -> 1 + length t

let empty : 'a mylist -> bool = function
  | Nil -> true
  | Cons _ -> false
val length : 'a mylist -> int = <fun>
val empty : 'a mylist -> bool = <fun>

Notice that the body of each function is unchanged from its previous definition for intlist. All that we changed was the type annotation. And that could even be omitted safely:

let rec length = function
  | Nil -> 0
  | Cons (_, t) -> 1 + length t

let empty = function
  | Nil -> true
  | Cons _ -> false
val length : 'a mylist -> int = <fun>
val empty : 'a mylist -> bool = <fun>

The functions we just wrote are an example of a language feature called parametric polymorphism. The functions don’t care what the 'a is in 'a mylist, hence they are perfectly happy to work on int mylist or string mylist or any other (whatever) mylist. The word “polymorphism” is based on the Greek roots “poly” (many) and “morph” (form). A value of type 'a mylist could have many forms, depending on the actual type 'a.

As soon, though, as you place a constraint on what the type 'a might be, you give up some polymorphism. For example,

let rec sum = function
  | Nil -> 0
  | Cons (h, t) -> h + sum t
val sum : int mylist -> int = <fun>

The fact that we use the ( + ) operator with the head of the list constrains that head element to be an int, hence all elements must be int. That means sum must take in an int mylist, not any other kind of 'a mylist.

It is also possible to have multiple type parameters for a parameterized type, in which case parentheses are needed:

type ('a, 'b) pair = {first : 'a; second : 'b}
let x = {first = 2; second = "hello"}
type ('a, 'b) pair = { first : 'a; second : 'b; }
val x : (int, string) pair = {first = 2; second = "hello"}

3.9.6. Polymorphic Variants#

Thus far, whenever you’ve wanted to define a variant type, you have had to give it a name, such as day, shape, or 'a mylist:

type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat

type shape =
  | Point of point
  | Circle of point * float
  | Rect of point * point

type 'a mylist = Nil | Cons of 'a * 'a mylist
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type shape = Point of point | Circle of point * float | Rect of point * point
type 'a mylist = Nil | Cons of 'a * 'a mylist

Occasionally, you might need a variant type only for the return value of a single function. For example, here’s a function f that can either return an int or \(\infty\); you are forced to define a variant type to represent that result:

type fin_or_inf = Finite of int | Infinity

let f = function
  | 0 -> Infinity
  | 1 -> Finite 1
  | n -> Finite (-n)
type fin_or_inf = Finite of int | Infinity
val f : int -> fin_or_inf = <fun>

The downside of this definition is that you were forced to define fin_or_inf even though it won’t be used throughout much of your program.

There’s another kind of variant in OCaml that supports this kind of programming: polymorphic variants. Polymorphic variants are just like variants, except:

  1. You don’t have to declare their type or constructors before using them.

  2. There is no name for a polymorphic variant type. (So another name for this feature could have been “anonymous variants”.)

  3. The constructors of a polymorphic variant start with a backquote character.

Using polymorphic variants, we can rewrite f:

let f = function
  | 0 -> `Infinity
  | 1 -> `Finite 1
  | n -> `Finite (-n)
val f : int -> [> `Finite of int | `Infinity ] = <fun>

This type says that f either returns `Finite n for some n : int or `Infinity. The square brackets do not denote a list, but rather a set of possible constructors. The > sign means that any code that pattern matches against a value of that type must at least handle the constructors `Finite and `Infinity, and possibly more. For example, we could write:

match f 3 with
  | `NegInfinity -> "negative infinity"
  | `Finite n -> "finite"
  | `Infinity -> "infinite"
- : string = "finite"

It’s perfectly fine for the pattern match to include constructors other than `Finite or `Infinity, because f is guaranteed never to return any constructors other than those.

There are other, more compelling uses for polymorphic variants that we’ll see later in the course. They are particularly useful in libraries. For now, we generally will steer you away from extensive use of polymorphic variants, because their types can become difficult to manage.

3.9.7. Built-in Variants#

OCaml’s built-in list data type is really a recursive, parameterized variant. It is defined as follows:

type 'a list = [] | ( :: ) of 'a * 'a list

So list is really just a type constructor, with (value) constructors [] (which we pronounce “nil”) and :: (which we pronounce “cons”).

OCaml’s built-in option data type is also really a parameterized variant. It’s defined as follows:

type 'a option = None | Some of 'a

So option is really just a type constructor, with (value) constructors None and Some.

You can see both list and option defined in the core OCaml library.