5.4. Encapsulation#
One of the main concerns of a module system is to provide encapsulation: the hiding of information about implementation behind an interface. OCaml’s module system makes this possible with a feature we’ve already seen: the opacity that module type annotations create. One special use of opacity is the declaration of abstract types. We’ll study both of those ideas in this section.
5.4.1. Opacity#
When implementing a module, you might sometimes have helper functions that you don’t want to expose to clients of the module. For example, maybe you’re implementing a math module that provides a tail-recursive factorial function:
module Math = struct
(** [fact_aux n acc] is [n! * acc]. *)
let rec fact_aux n acc =
if n = 0 then acc else fact_aux (n - 1) (n * acc)
(** [fact n] is [n!]. *)
let fact n = fact_aux n 1
end
module Math : sig val fact_aux : int -> int -> int val fact : int -> int end
You’d like to make fact
usable by clients of Math
, but you’d also like to
keep fact_aux
hidden. But in the code above, you can see that fact_aux
is
visible in the signature inferred for Math
. One way to hide it is simply to
nest fact_aux
:
module Math = struct
(** [fact n] is [n!]. *)
let fact n =
(** [fact_aux n acc] is [n! * acc]. *)
let rec fact_aux n acc =
if n = 0 then acc else fact_aux (n - 1) (n * acc)
in
fact_aux n 1
end
module Math : sig val fact : int -> int end
Look at the signature, and notice how fact_aux
is gone. But, that nesting
makes fact
just a little harder to read. It also means fact_aux
is not
available for any other functions inside Math
to use. In this case that’s
probably fine—there probably aren’t any other functions in Math
that
need fact_aux
. But if there were, we couldn’t nest fact_aux
.
So another way to hide fact_aux
from clients of Math
, while still leaving it
available for implementers of Math
, is to use a module type that exposes only
those names that clients should see:
module type MATH = sig
(** [fact n] is [n!]. *)
val fact : int -> int
end
module Math : MATH = struct
(** [fact_aux n acc] is [n! * acc]. *)
let rec fact_aux n acc =
if n = 0 then acc else fact_aux (n - 1) (n * acc)
let fact n = fact_aux n 1
end
module type MATH = sig val fact : int -> int end
module Math : MATH
Now since MATH
does not mention fact_aux
, the module type annotation
Math : MATH
causes fact_aux
to be hidden:
Math.fact_aux
File "[4]", line 1, characters 0-13:
1 | Math.fact_aux
^^^^^^^^^^^^^
Error: Unbound value Math.fact_aux
In that sense, module type annotations are opaque: they can prevent visibility of module items. We say that the module type seals the module, making any components not named in the module type be inaccessible.
Important
Remember that module type annotations are therefore not only about checking to see whether a module defines certain items. The annotations also hide items.
What if you did want to just check the definitions, but not hide anything? Then don’t supply the annotation at the time of module definition:
module type MATH = sig
(** [fact n] is [n!]. *)
val fact : int -> int
end
module Math = struct
(** [fact_aux n acc] is [n! * acc]. *)
let rec fact_aux n acc =
if n = 0 then acc else fact_aux (n - 1) (n * acc)
let fact n = fact_aux n 1
end
module MathCheck : MATH = Math
module type MATH = sig val fact : int -> int end
module Math : sig val fact_aux : int -> int -> int val fact : int -> int end
module MathCheck : MATH
Now Math.fact_aux
is visible, but MathCheck.fact_aux
is not:
Math.fact_aux
- : int -> int -> int = <fun>
MathCheck.fact_aux
File "[7]", line 1, characters 0-18:
1 | MathCheck.fact_aux
^^^^^^^^^^^^^^^^^^
Error: Unbound value MathCheck.fact_aux
You wouldn’t even have to give the “check” module a name since you probably never intend to access it; you could instead leave it anonymous:
module _ : MATH = Math
A Comparison to Visibility Modifiers. The use of sealing in OCaml is thus
similar to the use of visibility modifiers such as private
and public
in
Java. In fact one way to think about Java class definitions is that they
simultaneously define multiple signatures.
For example, consider this Java class:
class C {
private int x;
public int y;
}
An analogy to it in OCaml would be the following modules and types:
module type C_PUBLIC = sig
val y : int
end
module CPrivate = struct
let x = 0
let y = 0
end
module C : C_PUBLIC = CPrivate
module type C_PUBLIC = sig val y : int end
module CPrivate : sig val x : int val y : int end
module C : C_PUBLIC
With those definitions, any code that uses C
will have access only to the
names exposed in the C_PUBLIC
module type.
That analogy can be extended to the other visibility modifiers, protected
and
default, as well. Which means that Java classes are effectively defining four
related types, and the compiler is making sure the right type is used at each
place in the code base C
is named. No wonder it can be challenging to master
visibility in OO languages at first.
5.4.2. Abstract Types#
In an earlier section we implemented stacks as lists with the following module and type:
module type LIST_STACK = sig
(** [Empty] is raised when an operation cannot be applied
to an empty stack. *)
exception Empty
(** [empty] is the empty stack. *)
val empty : 'a list
(** [is_empty s] is whether [s] is empty. *)
val is_empty : 'a list -> bool
(** [push x s] pushes [x] onto the top of [s]. *)
val push : 'a -> 'a list -> 'a list
(** [peek s] is the top element of [s].
Raises [Empty] if [s] is empty. *)
val peek : 'a list -> 'a
(** [pop s] is all but the top element of [s].
Raises [Empty] if [s] is empty. *)
val pop : 'a list -> 'a list
end
module ListStack : LIST_STACK = struct
exception Empty
let empty = []
let is_empty = function [] -> true | _ -> false
let push x s = x :: s
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
end
Show code cell output
module type LIST_STACK =
sig
exception Empty
val empty : 'a list
val is_empty : 'a list -> bool
val push : 'a -> 'a list -> 'a list
val peek : 'a list -> 'a
val pop : 'a list -> 'a list
end
module ListStack : LIST_STACK
What if we wanted to modify that data structure to add an operation for
the size of the stack? The easy way would be to implement it using
List.length
:
module type LIST_STACK = sig
...
(** [size s] is the number of elements on the stack. *)
val size : 'a list -> int
end
module ListStack : LIST_STACK = struct
...
let size = List.length
end
That results in a linear-time implementation of size
. What if we wanted a
faster, constant-time implementation? At the cost of a little space, we could
cache the size of the stack. Let’s now represent the stack as a pair, where the
first component of the pair is the same list as before, and the second component
of the pair is the size of the stack:
module ListStackCachedSize = struct
exception Empty
let empty = ([], 0)
let is_empty = function ([], _) -> true | _ -> false
let push x (stack, size) = (x :: stack, size + 1)
let peek = function ([], _) -> raise Empty | (x :: _, _) -> x
let pop = function
| ([], _) -> raise Empty
| (_ :: stack, size) -> (stack, size - 1)
end
Show code cell output
module ListStackCachedSize :
sig
exception Empty
val empty : 'a list * int
val is_empty : 'a list * 'b -> bool
val push : 'a -> 'a list * int -> 'a list * int
val peek : 'a list * 'b -> 'a
val pop : 'a list * int -> 'a list * int
end
We have a big problem. ListStackCachedSize
does not implement the
LIST_STACK
module type, because that module type specifies 'a list
throughout it to represent the stack—not 'a list * int
.
module CheckListStackCachedSize : LIST_STACK = ListStackCachedSize
File "[12]", line 1, characters 47-66:
1 | module CheckListStackCachedSize : LIST_STACK = ListStackCachedSize
^^^^^^^^^^^^^^^^^^^
Error: Signature mismatch:
...
Values do not match:
val empty : 'a list * int
is not included in
val empty : 'a list
The type 'a list * int is not compatible with the type 'b list
File "[10]", line 7, characters 2-21: Expected declaration
File "[11]", line 3, characters 6-11: Actual declaration
Moreover, any code we previously wrote using ListStack
now has to be modified
to deal with the pair, which could mean revising pattern matches, function
types, and so forth.
As you no doubt learned in earlier programming courses, the problem we are
encountering here is a lack of encapsulation. We should have kept the type that
implements ListStack
hidden from clients. In Java, for example, we might have
written:
class ListStack<T> {
private List<T> stack;
private int size;
...
}
That way clients of ListStack
would be unaware of stack
or size
. In fact,
they wouldn’t be able to name those fields at all. Instead, they would
just use ListStack
as the type of the stack:
ListStack<Integer> s = new ListStack<>();
s.push(1);
So in OCaml, how can we keep the representation type of the stack hidden? What
we learned about opacity and sealing thus far does not suffice. The problem is
that the type 'a list * int
literally appears in the signature of
ListStackCachedSize
, e.g., in push
:
ListStackCachedSize.push
- : 'a -> 'a list * int -> 'a list * int = <fun>
A module type annotation could hide one of the values defined in
ListStackCachedSize
, e.g., push
itself, but that doesn’t solve the problem:
we need to hide the type 'a list * int
while exposing the operation
push
. So OCaml has a feature for doing exactly that: abstract types.
Let’s see an example of this feature.
We begin by modifying LIST_STACK
, replacing 'a list
with a new type
'a stack
everywhere. We won’t repeat the specification comments here, so as to
keep the example shorter. And while we’re at it, let’s add the size
operation.
module type LIST_STACK = sig
type 'a stack
exception Empty
val empty : 'a stack
val is_empty : 'a stack -> bool
val push : 'a -> 'a stack -> 'a stack
val peek : 'a stack -> 'a
val pop : 'a stack -> 'a stack
val size : 'a stack -> int
end
Show code cell output
module type LIST_STACK =
sig
type 'a stack
exception Empty
val empty : 'a stack
val is_empty : 'a stack -> bool
val push : 'a -> 'a stack -> 'a stack
val peek : 'a stack -> 'a
val pop : 'a stack -> 'a stack
val size : 'a stack -> int
end
Note how 'a stack
is not actually defined in that signature. We haven’t said
anything about what it is. It might be 'a list
, or 'a list * int
, or
{stack : 'a list; size : int}
, or anything else. That is what makes it an
abstract type: we’ve declared its name but not specified its definition.
Now ListStackCachedSize
can implement that module type with the addition
of just one line of code: the first line of the structure, which defines
'a stack
:
module ListStackCachedSize : LIST_STACK = struct
type 'a stack = 'a list * int
exception Empty
let empty = ([], 0)
let is_empty = function ([], _) -> true | _ -> false
let push x (stack, size) = (x :: stack, size + 1)
let peek = function ([], _) -> raise Empty | (x :: _, _) -> x
let pop = function
| ([], _) -> raise Empty
| (_ :: stack, size) -> (stack, size - 1)
let size = snd
end
module ListStackCachedSize : LIST_STACK
Take a careful look at the output: nowhere does 'a list
show up in it. In
fact, only LIST_STACK
does. And LIST_STACK
mentions only 'a stack
. So no
one’s going to know that internally a list is used. (Ok, they’re going to know:
the name suggests it. But the point is they can’t take advantage of that,
because the type is abstract.)
Likewise, our original implementation with linear-time size
satisfies
the module type. We just have to add a line to define 'a stack
:
module ListStack : LIST_STACK = struct
type 'a stack = 'a list
exception Empty
let empty = []
let is_empty = function [] -> true | _ -> false
let push x s = x :: s
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
let size = List.length
end
module ListStack : LIST_STACK
Note that omitting that added line would result in an error, just as if
we had failed to define push
or any of the other operations from
the module type:
module ListStack : LIST_STACK = struct
(* type 'a stack = 'a list *)
exception Empty
let empty = []
let is_empty = function [] -> true | _ -> false
let push x s = x :: s
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
let size = List.length
end
File "[17]", lines 1-10, characters 32-3:
1 | ................................struct
2 | (* type 'a stack = 'a list *)
3 | exception Empty
4 | let empty = []
5 | let is_empty = function [] -> true | _ -> false
6 | let push x s = x :: s
7 | let peek = function [] -> raise Empty | x :: _ -> x
8 | let pop = function [] -> raise Empty | _ :: s -> s
9 | let size = List.length
10 | end
Error: Signature mismatch:
...
The type `stack' is required but not provided
File "[14]", line 2, characters 2-15: Expected declaration
Here is a third, custom implementation of LIST_STACK
. This one is deliberately
overly-complicated, in part to illustrate how the abstract type can hide
implementation details that are better not revealed to clients:
module CustomStack : LIST_STACK = struct
type 'a entry = {top : 'a; rest : 'a stack; size : int}
and 'a stack = S of 'a entry option
exception Empty
let empty = S None
let is_empty = function S None -> true | _ -> false
let size = function S None -> 0 | S (Some {size}) -> size
let push x s = S (Some {top = x; rest = s; size = size s + 1})
let peek = function S None -> raise Empty | S (Some {top}) -> top
let pop = function S None -> raise Empty | S (Some {rest}) -> rest
end
module CustomStack : LIST_STACK
Is that really a “list” stack? It satisfies the module type LIST_STACK
. But
upon reflection, that module type never really had anything to do with lists
once we made the type 'a stack
abstract. There’s really no need to call it
LIST_STACK
. We’d be better off using just STACK
, since it can be implemented
with list
or without. At that point, we could just go with Stack
as its
name, since there is no module named Stack
we’ve written that would be
confused with it. That avoids the all-caps look of our code shouting at us.
module type Stack = sig
type 'a stack
exception Empty
val empty : 'a stack
val is_empty : 'a stack -> bool
val push : 'a -> 'a stack -> 'a stack
val peek : 'a stack -> 'a
val pop : 'a stack -> 'a stack
val size : 'a stack -> int
end
module ListStack : Stack = struct
type 'a stack = 'a list
exception Empty
let empty = []
let is_empty = function [] -> true | _ -> false
let push x s = x :: s
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
let size = List.length
end
Show code cell output
module type Stack =
sig
type 'a stack
exception Empty
val empty : 'a stack
val is_empty : 'a stack -> bool
val push : 'a -> 'a stack -> 'a stack
val peek : 'a stack -> 'a
val pop : 'a stack -> 'a stack
val size : 'a stack -> int
end
module ListStack : Stack
There’s one further naming improvement we could make. Notice the type of
ListStack.empty
(and don’t worry about the abstr
part for now; we’ll come
back to it):
ListStack.empty
- : 'a ListStack.stack = <abstr>
That type, 'a ListStack.stack
, is rather unwieldy, because it conveys the word
“stack” twice: once in the name of the module, and again in the name of the
representation type inside that module. In places like this, OCaml programmers
idiomatically use a standard name, t
, in place of a longer representation type
name:
module type Stack = sig
type 'a t
exception Empty
val empty : 'a t
val is_empty : 'a t -> bool
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
val pop : 'a t -> 'a t
val size : 'a t -> int
end
module ListStack : Stack = struct
type 'a t = 'a list
exception Empty
let empty = []
let is_empty = function [] -> true | _ -> false
let push x s = x :: s
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
let size = List.length
end
module CustomStack : Stack = struct
type 'a entry = {top : 'a; rest : 'a t; size : int}
and 'a t = S of 'a entry option
exception Empty
let empty = S None
let is_empty = function S None -> true | _ -> false
let size = function S None -> 0 | S (Some {size}) -> size
let push x s = S (Some {top = x; rest = s; size = size s + 1})
let peek = function S None -> raise Empty | S (Some {top}) -> top
let pop = function S None -> raise Empty | S (Some {rest}) -> rest
end
Show code cell output
module type Stack =
sig
type 'a t
exception Empty
val empty : 'a t
val is_empty : 'a t -> bool
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
val pop : 'a t -> 'a t
val size : 'a t -> int
end
module ListStack : Stack
module CustomStack : Stack
Now the type of stacks is simpler:
ListStack.empty;;
CustomStack.empty;;
- : 'a ListStack.t = <abstr>
- : 'a CustomStack.t = <abstr>
That idiom is fairly common when there’s a single representation type exposed by an interface to a data structure. You’ll see it used throughout the standard library.
In informal conversation we would usually pronounce those types without the “dot
t” part. For example, we might say “alpha ListStack”, simply ignoring the
t
—though it does technically have to be there to be legal OCaml code.
Finally, abstract types are really just a special case of opacity. You actually can expose the definition of a type in a signature if you want to:
module type T = sig
type t = int
val x : t
end
module M : T = struct
type t = int
let x = 42
end
let a : int = M.x
module type T = sig type t = int val x : t end
module M : T
val a : int = 42
Note how we’re able to use M.x
at its type of int
. That works because
the equality of types t
and int
has been exposed in the module type.
But if we kept t
abstract, the same usage would fail:
module type T = sig
type t (* = int *)
val x : t
end
module M : T = struct
type t = int
let x = 42
end
let a : int = M.x
module type T = sig type t val x : t end
module M : T
File "[24]", line 11, characters 14-17:
11 | let a : int = M.x
^^^
Error: This expression has type M.t but an expression was expected of type
int
We’re not allowed to use M.x
at type int
outside of M
, because its type
M.t
is abstract. This is encapsulation at work, keeping that implementation
detail hidden.
5.4.3. Pretty Printing#
In some output above, we observed something curious: the toplevel prints
<abstr>
in place of the actual contents of a value whose type is abstract:
ListStack.empty;;
ListStack.(empty |> push 1 |> push 2);;
- : 'a ListStack.t = <abstr>
- : int ListStack.t = <abstr>
Recall that the toplevel uses this angle-bracket convention to indicate an
unprintable value. We’ve encountered that before with functions and <fun>
:
fun x -> x
- : 'a -> 'a = <fun>
On the one hand, it’s reasonable for the toplevel to behave this way. Once a
type is abstract, its implementation isn’t meant to be revealed to clients. So
actually printing out the list []
or [2; 1]
as responses to the above inputs
would be revealing more than is intended.
On the other hand, it’s also reasonable for implementers to provide clients with
a friendly way to view a value of an abstract type. Java programmers, for
example, will often write toString()
methods so that objects can be printed
as output in the terminal or in JShell. To support that, the OCaml toplevel
has a directive #install_printer
, which registers a function to print values.
Here’s how it works.
You write a pretty printing function of type
Format.formatter -> t -> unit
, for whatever typet
you like. Let’s suppose for sake of example that you name that functionpp
.You invoke
#install_printer pp
in the toplevel.From now on, anytime the toplevel wants to print a value of type
t
it uses your functionpp
to do so.
It probably makes sense the pretty printing function needs to take in a value of
type t
(because that’s what it needs to print) and returns unit
(as other
printing functions do). But why does it take the Format.formatter
argument?
It’s because of a fairly high-powered feature that OCaml is attempting to
provide here: automatic line breaking and indentation in the middle of very
large outputs.
Consider the output from this expression, which creates nested lists:
List.init 15 (fun n -> List.init n (Fun.const n))
- : int list list =
[[]; [1]; [2; 2]; [3; 3; 3]; [4; 4; 4; 4]; [5; 5; 5; 5; 5];
[6; 6; 6; 6; 6; 6]; [7; 7; 7; 7; 7; 7; 7]; [8; 8; 8; 8; 8; 8; 8; 8];
[9; 9; 9; 9; 9; 9; 9; 9; 9]; [10; 10; 10; 10; 10; 10; 10; 10; 10; 10];
[11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11];
[12; 12; 12; 12; 12; 12; 12; 12; 12; 12; 12; 12];
[13; 13; 13; 13; 13; 13; 13; 13; 13; 13; 13; 13; 13];
[14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14; 14]]
Each inner list contains n
copies of the number n
. Note how the indentation
and line breaks are somewhat sophisticated. All the inner lists are indented one
space from the left-hand margin. Line breaks have been inserted to avoid
splitting inner lists over multiple lines.
The Format
module is what provides this functionality, and Format.formatter
is an abstract type in it. You could think of a formatter as being a place to
send output, like a file, and have it be automatically formatted along the way.
The typical use of a formatter is as argument to a function such as
Format.fprintf
, which like Printf
uses format specifiers.
For example, suppose you wanted to change how strings are printed by the toplevel and add ” kupo” to the end of each string. Here’s code that would do it:
let kupo_pp fmt s = Format.fprintf fmt "%s kupo" s;;
#install_printer kupo_pp;;
val kupo_pp : Format.formatter -> string -> unit = <fun>
Now you can see that the toplevel adds ” kupo” to each string while printing it, even though it’s not actual a part of the original string:
let h = "Hello"
let s = String.length h
val h : string = Hello kupo
val s : int = 5
To keep ourselves from getting confused about strings in the rest of this section, let’s uninstall that pretty printer before going on:
#remove_printer kupo_pp;;
As a bigger example, let’s add pretty printing to ListStack
:
module type Stack = sig
type 'a t
exception Empty
val empty : 'a t
val is_empty : 'a t -> bool
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
val pop : 'a t -> 'a t
val size : 'a t -> int
val pp :
(Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit
end
Show code cell output
module type Stack =
sig
type 'a t
exception Empty
val empty : 'a t
val is_empty : 'a t -> bool
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
val pop : 'a t -> 'a t
val size : 'a t -> int
val pp :
(Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit
end
First, notice that we have to expose pp
as part of the module type.
Otherwise it would be encapsulated, hence we wouldn’t be able to install it.
Second, notice that the type of pp
now takes an extra first argument
of type Format.formatter -> 'a -> unit
. That is itself a pretty printer
for type 'a
, on which t
is parameterized. We need that argument in order
to be able to pretty print the values of type 'a
.
module ListStack : Stack = struct
type 'a t = 'a list
exception Empty
let empty = []
let is_empty = function [] -> true | _ -> false
let push x s = x :: s
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
let size = List.length
let pp pp_val fmt s =
let open Format in
let pp_break fmt () = fprintf fmt "@," in
fprintf fmt "@[<v 0>top of stack";
if s <> [] then fprintf fmt "@,";
pp_print_list ~pp_sep:pp_break pp_val fmt s;
fprintf fmt "@,bottom of stack@]"
end
Show code cell output
module ListStack : Stack
In ListStack.pp
, we use some of the advanced features of the Format
module.
Function Format.pp_print_list
does the heavy lifting to print all the elements
of the stack. The rest of the code handles the indentation and line breaks.
Here’s the result:
#install_printer ListStack.pp
ListStack.empty
- : 'a ListStack.t = top of stack
bottom of stack
ListStack.(empty |> push 1 |> push 2)
- : int ListStack.t = top of stack
2
1
bottom of stack
For more information, see the toplevel manual (search for
#install_printer
), the Format module, and this
OCaml GitHub issue. The latter seems to be the only place that
documents the use of extra arguments, as in pp_val
above, to print values of
polymorphic types.