3.9. Algebraic Data Types#
Thus far, we have seen variants simply as enumerating a set of constant values, such as:
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type ptype = TNormal | TFire | TWater
type peff = ENormal | ENotVery | Esuper
But variants are far more powerful than this.
3.9.1. Variants that Carry Data#
As a running example, here is a variant type shape
that does more than just
enumerate values:
type point = float * float
type shape =
| Point of point
| Circle of point * float (* center and radius *)
| Rect of point * point (* lower-left and upper-right corners *)
type point = float * float
type shape = Point of point | Circle of point * float | Rect of point * point
This type, shape
, represents a shape that is either a point, a circle, or a
rectangle. A point is represented by a constructor Point
that carries some
additional data, which is a value of type point
. A circle is represented by a
constructor Circle
that carries two pieces of data: one of type point
and
the other of type float
. Those data represent the center of the circle and its
radius. A rectangle is represented by a constructor Rect
that carries two more
points.
Here are a couple functions that use the shape
type:
let area = function
| Point _ -> 0.0
| Circle (_, r) -> Float.pi *. (r ** 2.0)
| Rect ((x1, y1), (x2, y2)) ->
let w = x2 -. x1 in
let h = y2 -. y1 in
w *. h
let center = function
| Point p -> p
| Circle (p, _) -> p
| Rect ((x1, y1), (x2, y2)) -> ((x2 +. x1) /. 2.0, (y2 +. y1) /. 2.0)
val area : shape -> float = <fun>
val center : shape -> point = <fun>
The shape
variant type is the same as those we’ve seen before in that it is
defined in terms of a collection of constructors. What’s different than before
is that those constructors carry additional data along with them. Every value of
type shape
is formed from exactly one of those constructors. Sometimes we call
the constructor a tag, because it tags the data it carries as being from that
particular constructor.
Variant types are sometimes called tagged unions. Every value of the type is
from the set of values that is the union of all values from the underlying types
that the constructor carries. For example, with the shape
type, every value is
tagged with either Point
or Circle
or Rect
and carries a value from:
the set of all
point
values, unioned withthe set of all
point * float
values, unioned withthe set of all
point * point
values.
Another name for these variant types is an algebraic data type. “Algebra” here refers to the fact that variant types contain both sum and product types, as defined in the previous lecture. The sum types come from the fact that a value of a variant is formed by one of the constructors. The product types come from that fact that a constructor can carry tuples or records, whose values have a sub-value from each of their component types.
Using variants, we can express a type that represents the union of several other
types, but in a type-safe way. Here, for example, is a type that represents
either a string
or an int
:
type string_or_int =
| String of string
| Int of int
type string_or_int = String of string | Int of int
If we wanted to, we could use this type to code up lists (e.g.) that contain either strings or ints:
type string_or_int_list = string_or_int list
let rec sum : string_or_int list -> int = function
| [] -> 0
| String s :: t -> int_of_string s + sum t
| Int i :: t -> i + sum t
let lst_sum = sum [String "1"; Int 2]
type string_or_int_list = string_or_int list
val sum : string_or_int list -> int = <fun>
val lst_sum : int = 3
Variants thus provide a type-safe way of doing something that might before have seemed impossible.
Variants also make it possible to discriminate which tag a value was constructed with, even if multiple constructors carry the same type. For example:
type t = Left of int | Right of int
let x = Left 1
let double_right = function
| Left i -> i
| Right i -> 2 * i
type t = Left of int | Right of int
val x : t = Left 1
val double_right : t -> int = <fun>
3.9.2. Syntax and Semantics#
Syntax.
To define a variant type:
type t = C1 [of t1] | ... | Cn [of tn]
The square brackets above denote that of ti
is optional. Every constructor may
individually either carry no data or carry data. We call constructors that carry
no data constant; and those that carry data, non-constant.
To write an expression that is a variant:
C e
Or:
C
depending on whether the constructor name C
is non-constant or constant.
Dynamic semantics.
If
e ==> v
thenC e ==> C v
, assumingC
is non-constant.C
is already a value, assumingC
is constant.
Static semantics.
If
t = ... | C | ...
thenC : t
.If
t = ... | C of t' | ...
and ife : t'
thenC e : t
.
Pattern matching.
We add the following new pattern form to the list of legal patterns:
C p
And we extend the definition of when a pattern matches a value and produces a binding as follows:
If
p
matchesv
and produces bindings \(b\), thenC p
matchesC v
and produces bindings \(b\).
3.9.3. Catch-all Cases#
One thing to beware of when pattern matching against variants is what Real World OCaml calls “catch-all cases”. Here’s a simple example of what can go wrong. Let’s suppose you write this variant and function:
type color = Blue | Red
(* a thousand lines of code in between *)
let string_of_color = function
| Blue -> "blue"
| _ -> "red"
type color = Blue | Red
val string_of_color : color -> string = <fun>
Seems fine, right? But then one day you realize there are more colors in the world. You need to represent green. So you go back and add green to your variant:
type color = Blue | Red | Green
(* a thousand lines of code in between *)
let string_of_color = function
| Blue -> "blue"
| _ -> "red"
type color = Blue | Red | Green
val string_of_color : color -> string = <fun>
But because of the thousand lines of code in between, you forget that
string_of_color
needs updating. And now, all the sudden, you are
red-green color blind:
string_of_color Green
- : string = "red"
The problem is the catch-all case in the pattern match inside
string_of_color
: the final case that uses the wildcard pattern to match
anything. Such code is not robust against future changes to the variant type.
If, instead, you had originally coded the function as follows, life would be better:
let string_of_color = function
| Blue -> "blue"
| Red -> "red"
File "[9]", lines 1-3, characters 22-17:
1 | ......................function
2 | | Blue -> "blue"
3 | | Red -> "red"
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
Green
val string_of_color : color -> string = <fun>
The OCaml type checker now alerts you that you haven’t yet updated
string_of_color
to account for the new constructor.
The moral of the story is: catch-all cases lead to buggy code. Avoid using them.
3.9.4. Recursive Variants#
Variant types may mention their own name inside their own body. For example,
here is a variant type that could be used to represent something similar to
int list
:
type intlist = Nil | Cons of int * intlist
let lst3 = Cons (3, Nil) (* similar to 3 :: [] or [3] *)
let lst123 = Cons(1, Cons(2, lst3)) (* similar to [1; 2; 3] *)
let rec sum (l : intlist) : int =
match l with
| Nil -> 0
| Cons (h, t) -> h + sum t
let rec length : intlist -> int = function
| Nil -> 0
| Cons (_, t) -> 1 + length t
let empty : intlist -> bool = function
| Nil -> true
| Cons _ -> false
type intlist = Nil | Cons of int * intlist
val lst3 : intlist = Cons (3, Nil)
val lst123 : intlist = Cons (1, Cons (2, Cons (3, Nil)))
val sum : intlist -> int = <fun>
val length : intlist -> int = <fun>
val empty : intlist -> bool = <fun>
Notice that in the definition of intlist
, we define the Cons
constructor to
carry a value that contains an intlist
. This makes the type intlist
be
recursive: it is defined in terms of itself.
Types may be mutually recursive if you use the and
keyword:
type node = {value : int; next : mylist}
and mylist = Nil | Node of node
type node = { value : int; next : mylist; }
and mylist = Nil | Node of node
Any such mutual recursion must involve at least one variant or record type that the recursion “goes through”. For example, the following is not allowed:
type t = u and u = t
File "[12]", line 1, characters 0-10:
1 | type t = u and u = t
^^^^^^^^^^
Error: The definition of t contains a cycle:
u
But this is:
type t = U of u and u = T of t
type t = U of u
and u = T of t
Record types may also be recursive:
type node = {value : int; next : node}
type node = { value : int; next : node; }
But plain old type synonyms may not be:
type t = t * t
File "[15]", line 1, characters 0-14:
1 | type t = t * t
^^^^^^^^^^^^^^
Error: The type abbreviation t is cyclic
Although node
is a legal type definition, there is no way to construct a value
of that type because of the circularity involved: to construct the very first
node
value in existence, you would already need a value of type node
to
exist. Later, when we cover imperative features, we’ll see a similar idea used
(but successfully) for mutable linked lists.
3.9.5. Parameterized Variants#
Variant types may be parameterized on other types. For example,
the intlist
type above could be generalized to provide lists (coded
up ourselves) over any type:
type 'a mylist = Nil | Cons of 'a * 'a mylist
let lst3 = Cons (3, Nil) (* similar to [3] *)
let lst_hi = Cons ("hi", Nil) (* similar to ["hi"] *)
type 'a mylist = Nil | Cons of 'a * 'a mylist
val lst3 : int mylist = Cons (3, Nil)
val lst_hi : string mylist = Cons ("hi", Nil)
Here, mylist
is a type constructor but not a type: there is no way to write
a value of type mylist
. But we can write value of type int mylist
(e.g.,
lst3
) and string mylist
(e.g., lst_hi
). Think of a type constructor as
being like a function, but one that maps types to types, rather than values to
value.
Here are some functions over 'a mylist
:
let rec length : 'a mylist -> int = function
| Nil -> 0
| Cons (_, t) -> 1 + length t
let empty : 'a mylist -> bool = function
| Nil -> true
| Cons _ -> false
val length : 'a mylist -> int = <fun>
val empty : 'a mylist -> bool = <fun>
Notice that the body of each function is unchanged from its previous definition
for intlist
. All that we changed was the type annotation. And that could even
be omitted safely:
let rec length = function
| Nil -> 0
| Cons (_, t) -> 1 + length t
let empty = function
| Nil -> true
| Cons _ -> false
val length : 'a mylist -> int = <fun>
val empty : 'a mylist -> bool = <fun>
The functions we just wrote are an example of a language feature called
parametric polymorphism. The functions don’t care what the 'a
is in
'a mylist
, hence they are perfectly happy to work on int mylist
or
string mylist
or any other (whatever) mylist
. The word “polymorphism” is
based on the Greek roots “poly” (many) and “morph” (form). A value of type
'a mylist
could have many forms, depending on the actual type 'a
.
As soon, though, as you place a constraint on what the type 'a
might be, you
give up some polymorphism. For example,
let rec sum = function
| Nil -> 0
| Cons (h, t) -> h + sum t
val sum : int mylist -> int = <fun>
The fact that we use the ( + )
operator with the head of the list constrains
that head element to be an int
, hence all elements must be int
. That means
sum
must take in an int mylist
, not any other kind of 'a mylist
.
It is also possible to have multiple type parameters for a parameterized type, in which case parentheses are needed:
type ('a, 'b) pair = {first : 'a; second : 'b}
let x = {first = 2; second = "hello"}
type ('a, 'b) pair = { first : 'a; second : 'b; }
val x : (int, string) pair = {first = 2; second = "hello"}
3.9.6. Polymorphic Variants#
Thus far, whenever you’ve wanted to define a variant type, you have had to give
it a name, such as day
, shape
, or 'a mylist
:
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type shape =
| Point of point
| Circle of point * float
| Rect of point * point
type 'a mylist = Nil | Cons of 'a * 'a mylist
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type shape = Point of point | Circle of point * float | Rect of point * point
type 'a mylist = Nil | Cons of 'a * 'a mylist
Occasionally, you might need a variant type only for the return value of a
single function. For example, here’s a function f
that can either return an
int
or \(\infty\); you are forced to define a variant type to represent that
result:
type fin_or_inf = Finite of int | Infinity
let f = function
| 0 -> Infinity
| 1 -> Finite 1
| n -> Finite (-n)
type fin_or_inf = Finite of int | Infinity
val f : int -> fin_or_inf = <fun>
The downside of this definition is that you were forced to define
fin_or_inf
even though it won’t be used throughout much of your program.
There’s another kind of variant in OCaml that supports this kind of programming: polymorphic variants. Polymorphic variants are just like variants, except:
You don’t have to declare their type or constructors before using them.
There is no name for a polymorphic variant type. (So another name for this feature could have been “anonymous variants”.)
The constructors of a polymorphic variant start with a backquote character.
Using polymorphic variants, we can rewrite f
:
let f = function
| 0 -> `Infinity
| 1 -> `Finite 1
| n -> `Finite (-n)
val f : int -> [> `Finite of int | `Infinity ] = <fun>
This type says that f
either returns `Finite n
for some n : int
or
`Infinity
. The square brackets do not denote a list, but rather a set of
possible constructors. The >
sign means that any code that pattern matches
against a value of that type must at least handle the constructors
`Finite
and `Infinity
, and possibly more. For example, we could write:
match f 3 with
| `NegInfinity -> "negative infinity"
| `Finite n -> "finite"
| `Infinity -> "infinite"
- : string = "finite"
It’s perfectly fine for the pattern match to include constructors other than
`Finite
or `Infinity
, because f
is guaranteed never to return any
constructors other than those.
There are other, more compelling uses for polymorphic variants that we’ll see later in the course. They are particularly useful in libraries. For now, we generally will steer you away from extensive use of polymorphic variants, because their types can become difficult to manage.
3.9.7. Built-in Variants#
OCaml’s built-in list data type is really a recursive, parameterized variant. It is defined as follows:
type 'a list = [] | ( :: ) of 'a * 'a list
So list
is really just a type constructor, with (value) constructors
[]
(which we pronounce “nil”) and ::
(which we pronounce “cons”).
OCaml’s built-in option data type is also really a parameterized variant. It’s defined as follows:
type 'a option = None | Some of 'a
So option
is really just a type constructor, with (value) constructors
None
and Some
.
You can see both list
and option
defined in the core OCaml library.