5.4. Encapsulation#

One of the main concerns of a module system is to provide encapsulation: the hiding of information about implementation behind an interface. OCaml’s module system makes this possible with a feature we’ve already seen: the opacity that module type annotations create. One special use of opacity is the declaration of abstract types. We’ll study both of those ideas in this section.

5.4.1. Opacity#

When implementing a module, you might sometimes have helper functions that you don’t want to expose to clients of the module. For example, maybe you’re implementing a math module that provides a tail-recursive factorial function:

module Math = struct
  (** [fact_aux n acc] is [n! * acc]. *)
  let rec fact_aux n acc =
    if n = 0 then acc else fact_aux (n - 1) (n * acc)

  (** [fact n] is [n!]. *)
  let fact n = fact_aux n 1
end
module Math : sig val fact_aux : int -> int -> int val fact : int -> int end

You’d like to make fact usable by clients of Math, but you’d also like to keep fact_aux hidden. But in the code above, you can see that fact_aux is visible in the signature inferred for Math. One way to hide it is simply to nest fact_aux:

module Math = struct
  (** [fact n] is [n!]. *)
  let fact n =
    (** [fact_aux n acc] is [n! * acc]. *)
    let rec fact_aux n acc =
      if n = 0 then acc else fact_aux (n - 1) (n * acc)
    in
    fact_aux n 1
end
module Math : sig val fact : int -> int end

Look at the signature, and notice how fact_aux is gone. But, that nesting makes fact just a little harder to read. It also means fact_aux is not available for any other functions inside Math to use. In this case that’s probably fine—there probably aren’t any other functions in Math that need fact_aux. But if there were, we couldn’t nest fact_aux.

So another way to hide fact_aux from clients of Math, while still leaving it available for implementers of Math, is to use a module type that exposes only those names that clients should see:

module type MATH = sig
  (** [fact n] is [n!]. *)
  val fact : int -> int
end

module Math : MATH = struct
  (** [fact_aux n acc] is [n! * acc]. *)
  let rec fact_aux n acc =
    if n = 0 then acc else fact_aux (n - 1) (n * acc)

  let fact n = fact_aux n 1
end
module type MATH = sig val fact : int -> int end
module Math : MATH

Now since MATH does not mention fact_aux, the module type annotation Math : MATH causes fact_aux to be hidden:

Math.fact_aux
File "[4]", line 1, characters 0-13:
1 | Math.fact_aux
    ^^^^^^^^^^^^^
Error: Unbound value Math.fact_aux

In that sense, module type annotations are opaque: they can prevent visibility of module items. We say that the module type seals the module, making any components not named in the module type be inaccessible.

Important

Remember that module type annotations are therefore not only about checking to see whether a module defines certain items. The annotations also hide items.

What if you did want to just check the definitions, but not hide anything? Then don’t supply the annotation at the time of module definition:

module type MATH = sig
  (** [fact n] is [n!]. *)
  val fact : int -> int
end

module Math = struct
  (** [fact_aux n acc] is [n! * acc]. *)
  let rec fact_aux n acc =
    if n = 0 then acc else fact_aux (n - 1) (n * acc)

  let fact n = fact_aux n 1
end

module MathCheck : MATH = Math
module type MATH = sig val fact : int -> int end
module Math : sig val fact_aux : int -> int -> int val fact : int -> int end
module MathCheck : MATH

Now Math.fact_aux is visible, but MathCheck.fact_aux is not:

Math.fact_aux
- : int -> int -> int = <fun>
MathCheck.fact_aux
File "[7]", line 1, characters 0-18:
1 | MathCheck.fact_aux
    ^^^^^^^^^^^^^^^^^^
Error: Unbound value MathCheck.fact_aux

You wouldn’t even have to give the “check” module a name since you probably never intend to access it; you could instead leave it anonymous:

module _ : MATH = Math

A Comparison to Visibility Modifiers. The use of sealing in OCaml is thus similar to the use of visibility modifiers such as private and public in Java. In fact one way to think about Java class definitions is that they simultaneously define multiple signatures.

For example, consider this Java class:

class C {
  private int x;
  public int y;
}

An analogy to it in OCaml would be the following modules and types:

module type C_PUBLIC = sig
  val y : int
end

module CPrivate = struct
  let x = 0
  let y = 0
end

module C : C_PUBLIC = CPrivate
module type C_PUBLIC = sig val y : int end
module CPrivate : sig val x : int val y : int end
module C : C_PUBLIC

With those definitions, any code that uses C will have access only to the names exposed in the C_PUBLIC module type.

That analogy can be extended to the other visibility modifiers, protected and default, as well. Which means that Java classes are effectively defining four related types, and the compiler is making sure the right type is used at each place in the code base C is named. No wonder it can be challenging to master visibility in OO languages at first.

5.4.2. Abstract Types#

In an earlier section we implemented stacks as lists with the following module and type:

module type LIST_STACK = sig
  (** [Empty] is raised when an operation cannot be applied
      to an empty stack. *)
  exception Empty

  (** [empty] is the empty stack. *)
  val empty : 'a list

  (** [is_empty s] is whether [s] is empty. *)
  val is_empty : 'a list -> bool

  (** [push x s] pushes [x] onto the top of [s]. *)
  val push : 'a -> 'a list -> 'a list

  (** [peek s] is the top element of [s].
      Raises [Empty] if [s] is empty. *)
  val peek : 'a list -> 'a

  (** [pop s] is all but the top element of [s].
      Raises [Empty] if [s] is empty. *)
  val pop : 'a list -> 'a list
end

module ListStack : LIST_STACK = struct
  exception Empty
  let empty = []
  let is_empty = function [] -> true | _ -> false
  let push x s = x :: s
  let peek = function [] -> raise Empty | x :: _ -> x
  let pop = function [] -> raise Empty | _ :: s -> s
end
Hide code cell output
module type LIST_STACK =
  sig
    exception Empty
    val empty : 'a list
    val is_empty : 'a list -> bool
    val push : 'a -> 'a list -> 'a list
    val peek : 'a list -> 'a
    val pop : 'a list -> 'a list
  end
module ListStack : LIST_STACK

What if we wanted to modify that data structure to add an operation for the size of the stack? The easy way would be to implement it using List.length:

module type LIST_STACK = sig
  ...
  (** [size s] is the number of elements on the stack. *)
  val size : 'a list -> int
end

module ListStack : LIST_STACK = struct
  ...
  let size = List.length
end

That results in a linear-time implementation of size. What if we wanted a faster, constant-time implementation? At the cost of a little space, we could cache the size of the stack. Let’s now represent the stack as a pair, where the first component of the pair is the same list as before, and the second component of the pair is the size of the stack:

module ListStackCachedSize = struct
  exception Empty
  let empty = ([], 0)
  let is_empty = function ([], _) -> true | _ -> false
  let push x (stack, size) = (x :: stack, size + 1)
  let peek = function ([], _) -> raise Empty | (x :: _, _) -> x
  let pop = function
    | ([], _) -> raise Empty
    | (_ :: stack, size) -> (stack, size - 1)
end
Hide code cell output
module ListStackCachedSize :
  sig
    exception Empty
    val empty : 'a list * int
    val is_empty : 'a list * 'b -> bool
    val push : 'a -> 'a list * int -> 'a list * int
    val peek : 'a list * 'b -> 'a
    val pop : 'a list * int -> 'a list * int
  end

We have a big problem. ListStackCachedSize does not implement the LIST_STACK module type, because that module type specifies 'a list throughout it to represent the stack—not 'a list * int.

module CheckListStackCachedSize : LIST_STACK = ListStackCachedSize
File "[12]", line 1, characters 47-66:
1 | module CheckListStackCachedSize : LIST_STACK = ListStackCachedSize
                                                   ^^^^^^^^^^^^^^^^^^^
Error: Signature mismatch:
       ...
       Values do not match:
         val empty : 'a list * int
       is not included in
         val empty : 'a list
       The type 'a list * int is not compatible with the type 'b list
       File "[10]", line 7, characters 2-21: Expected declaration
       File "[11]", line 3, characters 6-11: Actual declaration

Moreover, any code we previously wrote using ListStack now has to be modified to deal with the pair, which could mean revising pattern matches, function types, and so forth.

As you no doubt learned in earlier programming courses, the problem we are encountering here is a lack of encapsulation. We should have kept the type that implements ListStack hidden from clients. In Java, for example, we might have written:

class ListStack<T> {
  private List<T> stack;
  private int size;
  ...
}

That way clients of ListStack would be unaware of stack or size. In fact, they wouldn’t be able to name those fields at all. Instead, they would just use ListStack as the type of the stack:

ListStack<Integer> s = new ListStack<>();
s.push(1);

So in OCaml, how can we keep the representation type of the stack hidden? What we learned about opacity and sealing thus far does not suffice. The problem is that the type 'a list * int literally appears in the signature of ListStackCachedSize, e.g., in push:

ListStackCachedSize.push
- : 'a -> 'a list * int -> 'a list * int = <fun>

A module type annotation could hide one of the values defined in ListStackCachedSize, e.g., push itself, but that doesn’t solve the problem: we need to hide the type 'a list * int while exposing the operation push. So OCaml has a feature for doing exactly that: abstract types. Let’s see an example of this feature.

We begin by modifying LIST_STACK, replacing 'a list with a new type 'a stack everywhere. We won’t repeat the specification comments here, so as to keep the example shorter. And while we’re at it, let’s add the size operation.

module type LIST_STACK = sig
  type 'a stack
  exception Empty
  val empty : 'a stack
  val is_empty : 'a stack -> bool
  val push : 'a -> 'a stack -> 'a stack
  val peek : 'a stack -> 'a
  val pop : 'a stack -> 'a stack
  val size : 'a stack -> int
end
Hide code cell output
module type LIST_STACK =
  sig
    type 'a stack
    exception Empty
    val empty : 'a stack
    val is_empty : 'a stack -> bool
    val push : 'a -> 'a stack -> 'a stack
    val peek : 'a stack -> 'a
    val pop : 'a stack -> 'a stack
    val size : 'a stack -> int
  end

Note how 'a stack is not actually defined in that signature. We haven’t said anything about what it is. It might be 'a list, or 'a list * int, or {stack : 'a list; size : int}, or anything else. That is what makes it an abstract type: we’ve declared its name but not specified its definition.

Now ListStackCachedSize can implement that module type with the addition of just one line of code: the first line of the structure, which defines 'a stack:

module ListStackCachedSize : LIST_STACK = struct
  type 'a stack = 'a list * int
  exception Empty
  let empty = ([], 0)
  let is_empty = function ([], _) -> true | _ -> false
  let push x (stack, size) = (x :: stack, size + 1)
  let peek = function ([], _) -> raise Empty | (x :: _, _) -> x
  let pop = function
    | ([], _) -> raise Empty
    | (_ :: stack, size) -> (stack, size - 1)
  let size = snd
end
module ListStackCachedSize : LIST_STACK

Take a careful look at the output: nowhere does 'a list show up in it. In fact, only LIST_STACK does. And LIST_STACK mentions only 'a stack. So no one’s going to know that internally a list is used. (Ok, they’re going to know: the name suggests it. But the point is they can’t take advantage of that, because the type is abstract.)

Likewise, our original implementation with linear-time size satisfies the module type. We just have to add a line to define 'a stack:

module ListStack : LIST_STACK = struct
  type 'a stack = 'a list
  exception Empty
  let empty = []
  let is_empty = function [] -> true | _ -> false
  let push x s = x :: s
  let peek = function [] -> raise Empty | x :: _ -> x
  let pop = function [] -> raise Empty | _ :: s -> s
  let size = List.length
end
module ListStack : LIST_STACK

Note that omitting that added line would result in an error, just as if we had failed to define push or any of the other operations from the module type:

module ListStack : LIST_STACK = struct
  (* type 'a stack = 'a list *)
  exception Empty
  let empty = []
  let is_empty = function [] -> true | _ -> false
  let push x s = x :: s
  let peek = function [] -> raise Empty | x :: _ -> x
  let pop = function [] -> raise Empty | _ :: s -> s
  let size = List.length
end
File "[17]", lines 1-10, characters 32-3:
 1 | ................................struct
 2 |   (* type 'a stack = 'a list *)
 3 |   exception Empty
 4 |   let empty = []
 5 |   let is_empty = function [] -> true | _ -> false
 6 |   let push x s = x :: s
 7 |   let peek = function [] -> raise Empty | x :: _ -> x
 8 |   let pop = function [] -> raise Empty | _ :: s -> s
 9 |   let size = List.length
10 | end
Error: Signature mismatch:
       ...
       The type `stack' is required but not provided
       File "[14]", line 2, characters 2-15: Expected declaration

Here is a third, custom implementation of LIST_STACK. This one is deliberately overly-complicated, in part to illustrate how the abstract type can hide implementation details that are better not revealed to clients:

module CustomStack : LIST_STACK = struct
  type 'a entry = {top : 'a; rest : 'a stack; size : int}
  and 'a stack = S of 'a entry option
  exception Empty
  let empty = S None
  let is_empty = function S None -> true | _ -> false
  let size = function S None -> 0 | S (Some {size}) -> size
  let push x s = S (Some {top = x; rest = s; size = size s + 1})
  let peek = function S None -> raise Empty | S (Some {top}) -> top
  let pop = function S None -> raise Empty | S (Some {rest}) -> rest
end
module CustomStack : LIST_STACK

Is that really a “list” stack? It satisfies the module type LIST_STACK. But upon reflection, that module type never really had anything to do with lists once we made the type 'a stack abstract. There’s really no need to call it LIST_STACK. We’d be better off using just STACK, since it can be implemented with list or without. At that point, we could just go with Stack as its name, since there is no module named Stack we’ve written that would be confused with it. That avoids the all-caps look of our code shouting at us.

module type Stack = sig
  type 'a stack
  exception Empty
  val empty : 'a stack
  val is_empty : 'a stack -> bool
  val push : 'a -> 'a stack -> 'a stack
  val peek : 'a stack -> 'a
  val pop : 'a stack -> 'a stack
  val size : 'a stack -> int
end

module ListStack : Stack = struct
  type 'a stack = 'a list
  exception Empty
  let empty = []
  let is_empty = function [] -> true | _ -> false
  let push x s = x :: s
  let peek = function [] -> raise Empty | x :: _ -> x
  let pop = function [] -> raise Empty | _ :: s -> s
  let size = List.length
end
Hide code cell output
module type Stack =
  sig
    type 'a stack
    exception Empty
    val empty : 'a stack
    val is_empty : 'a stack -> bool
    val push : 'a -> 'a stack -> 'a stack
    val peek : 'a stack -> 'a
    val pop : 'a stack -> 'a stack
    val size : 'a stack -> int
  end
module ListStack : Stack

There’s one further naming improvement we could make. Notice the type of ListStack.empty (and don’t worry about the abstr part for now; we’ll come back to it):

ListStack.empty
- : 'a ListStack.stack = <abstr>

That type, 'a ListStack.stack, is rather unwieldy, because it conveys the word “stack” twice: once in the name of the module, and again in the name of the representation type inside that module. In places like this, OCaml programmers idiomatically use a standard name, t, in place of a longer representation type name:

module type Stack = sig
  type 'a t
  exception Empty
  val empty : 'a t
  val is_empty : 'a t -> bool
  val push : 'a -> 'a t -> 'a t
  val peek : 'a t -> 'a
  val pop : 'a t -> 'a t
  val size : 'a t -> int
end

module ListStack : Stack = struct
  type 'a t = 'a list
  exception Empty
  let empty = []
  let is_empty = function [] -> true | _ -> false
  let push x s = x :: s
  let peek = function [] -> raise Empty | x :: _ -> x
  let pop = function [] -> raise Empty | _ :: s -> s
  let size = List.length
end

module CustomStack : Stack = struct
  type 'a entry = {top : 'a; rest : 'a t; size : int}
  and 'a t = S of 'a entry option
  exception Empty
  let empty = S None
  let is_empty = function S None -> true | _ -> false
  let size = function S None -> 0 | S (Some {size}) -> size
  let push x s = S (Some {top = x; rest = s; size = size s + 1})
  let peek = function S None -> raise Empty | S (Some {top}) -> top
  let pop = function S None -> raise Empty | S (Some {rest}) -> rest
end
Hide code cell output
module type Stack =
  sig
    type 'a t
    exception Empty
    val empty : 'a t
    val is_empty : 'a t -> bool
    val push : 'a -> 'a t -> 'a t
    val peek : 'a t -> 'a
    val pop : 'a t -> 'a t
    val size : 'a t -> int
  end
module ListStack : Stack
module CustomStack : Stack

Now the type of stacks is simpler:

ListStack.empty;;
CustomStack.empty;;
- : 'a ListStack.t = <abstr>
- : 'a CustomStack.t = <abstr>

That idiom is fairly common when there’s a single representation type exposed by an interface to a data structure. You’ll see it used throughout the standard library.

In informal conversation we would usually pronounce those types without the “dot t” part. For example, we might say “alpha ListStack”, simply ignoring the t—though it does technically have to be there to be legal OCaml code.

Finally, abstract types are really just a special case of opacity. You actually can expose the definition of a type in a signature if you want to:

module type T = sig
  type t = int
  val x : t
end

module M : T = struct
  type t = int
  let x = 42
end

let a : int = M.x
module type T = sig type t = int val x : t end
module M : T
val a : int = 42

Note how we’re able to use M.x at its type of int. That works because the equality of types t and int has been exposed in the module type. But if we kept t abstract, the same usage would fail:

module type T = sig
  type t (* = int *)
  val x : t
end

module M : T = struct
  type t = int
  let x = 42
end

let a : int = M.x
module type T = sig type t val x : t end
module M : T
File "[24]", line 11, characters 14-17:
11 | let a : int = M.x
                   ^^^
Error: This expression has type M.t but an expression was expected of type
         int

We’re not allowed to use M.x at type int outside of M, because its type M.t is abstract. This is encapsulation at work, keeping that implementation detail hidden.

5.4.3. Pretty Printing#

In some output above, we observed something curious: the toplevel prints <abstr> in place of the actual contents of a value whose type is abstract:

ListStack.empty;;
ListStack.(empty |> push 1 |> push 2);;
- : 'a ListStack.t = <abstr>
- : int ListStack.t = <abstr>

Recall that the toplevel uses this angle-bracket convention to indicate an unprintable value. We’ve encountered that before with functions and <fun>:

fun x -> x
- : 'a -> 'a = <fun>

On the one hand, it’s reasonable for the toplevel to behave this way. Once a type is abstract, its implementation isn’t meant to be revealed to clients. So actually printing out the list [] or [2; 1] as responses to the above inputs would be revealing more than is intended.

On the other hand, it’s also reasonable for implementers to provide clients with a friendly way to view a value of an abstract type. Java programmers, for example, will often write toString() methods so that objects can be printed as output in the terminal or in JShell. To support that, the OCaml toplevel has a directive #install_printer, which registers a function to print values. Here’s how it works.

  • You write a pretty printing function of type Format.formatter -> t -> unit, for whatever type t you like. Let’s suppose for sake of example that you name that function pp.

  • You invoke #install_printer pp in the toplevel.

  • From now on, anytime the toplevel wants to print a value of type t it uses your function pp to do so.

It probably makes sense the pretty printing function needs to take in a value of type t (because that’s what it needs to print) and returns unit (as other printing functions do). But why does it take the Format.formatter argument? It’s because of a fairly high-powered feature that OCaml is attempting to provide here: automatic line breaking and indentation in the middle of very large outputs.

Consider the output from this expression, which creates nested lists:

List.init 15 (fun n -> List.init n (Fun.const n))
- : int list list =
[[]; [1]; [2; 2]; [3; 3; 3]; [4; 4; 4; 4]; [5; 5; 5; 5; 5];
 [6; 6; 6; 6; 6; 6]; [7; 7; 7; 7; 7; 7; 7]; [8; 8; 8; 8; 8; 8; 8; 8];
 [9; 9; 9; 9; 9; 9; 9; 9; 9]; [10; 10; 10; 10; 10; 10; 10; 10; 10; 10];
 [11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11];
 [12; 12; 12; 12; 12; 12; 12; 12; 12; 12; 12; 12];
 [13; 13; 13; 13; 13; 13; 13; 13; 13; 13; 13; 13; 13];
 [14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14]]

Each inner list contains n copies of the number n. Note how the indentation and line breaks are somewhat sophisticated. All the inner lists are indented one space from the left-hand margin. Line breaks have been inserted to avoid splitting inner lists over multiple lines.

The Format module is what provides this functionality, and Format.formatter is an abstract type in it. You could think of a formatter as being a place to send output, like a file, and have it be automatically formatted along the way. The typical use of a formatter is as argument to a function such as Format.fprintf, which like Printf uses format specifiers.

For example, suppose you wanted to change how strings are printed by the toplevel and add ” kupo” to the end of each string. Here’s code that would do it:

let kupo_pp fmt s = Format.fprintf fmt "%s kupo" s;;
#install_printer kupo_pp;;
val kupo_pp : Format.formatter -> string -> unit = <fun>

Now you can see that the toplevel adds ” kupo” to each string while printing it, even though it’s not actual a part of the original string:

let h = "Hello"
let s = String.length h
val h : string = Hello kupo
val s : int = 5

To keep ourselves from getting confused about strings in the rest of this section, let’s uninstall that pretty printer before going on:

#remove_printer kupo_pp;;

As a bigger example, let’s add pretty printing to ListStack:

module type Stack = sig
  type 'a t
  exception Empty
  val empty : 'a t
  val is_empty : 'a t -> bool
  val push : 'a -> 'a t -> 'a t
  val peek : 'a t -> 'a
  val pop : 'a t -> 'a t
  val size : 'a t -> int
  val pp :
    (Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit
end
Hide code cell output
module type Stack =
  sig
    type 'a t
    exception Empty
    val empty : 'a t
    val is_empty : 'a t -> bool
    val push : 'a -> 'a t -> 'a t
    val peek : 'a t -> 'a
    val pop : 'a t -> 'a t
    val size : 'a t -> int
    val pp :
      (Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit
  end

First, notice that we have to expose pp as part of the module type. Otherwise it would be encapsulated, hence we wouldn’t be able to install it. Second, notice that the type of pp now takes an extra first argument of type Format.formatter -> 'a -> unit. That is itself a pretty printer for type 'a, on which t is parameterized. We need that argument in order to be able to pretty print the values of type 'a.

module ListStack : Stack = struct
  type 'a t = 'a list
  exception Empty
  let empty = []
  let is_empty = function [] -> true | _ -> false
  let push x s = x :: s
  let peek = function [] -> raise Empty | x :: _ -> x
  let pop = function [] -> raise Empty | _ :: s -> s
  let size = List.length
  let pp pp_val fmt s =
    let open Format in
    let pp_break fmt () = fprintf fmt "@," in
    fprintf fmt "@[<v 0>top of stack";
    if s <> [] then fprintf fmt "@,";
    pp_print_list ~pp_sep:pp_break pp_val fmt s;
    fprintf fmt "@,bottom of stack@]"
end
Hide code cell output
module ListStack : Stack

In ListStack.pp, we use some of the advanced features of the Format module. Function Format.pp_print_list does the heavy lifting to print all the elements of the stack. The rest of the code handles the indentation and line breaks. Here’s the result:

#install_printer ListStack.pp
ListStack.empty
- : 'a ListStack.t = top of stack
                     bottom of stack
ListStack.(empty |> push 1 |> push 2)
- : int ListStack.t = top of stack
                      2
                      1
                      bottom of stack

For more information, see the toplevel manual (search for #install_printer), the Format module, and this OCaml GitHub issue. The latter seems to be the only place that documents the use of extra arguments, as in pp_val above, to print values of polymorphic types.