5.8. Includes

Copying and pasting code is almost always a bad idea. Duplication of code causes duplication and proliferation of errors. So why are we so prone to making this mistake? Maybe because it always seems like the easier option — easier and quicker than applying the Abstraction Principle as we should to factor out common code.

The OCaml module system provides a neat feature called includes that is like a principled copy-and-paste that is quick and easy to use, but avoids actual duplication. It can be used to solve some of the same problems as inheritance in object-oriented languages.

Let’s start with an example. Recall this implementation of sets as lists:

module type Set = sig
  type 'a t
  val empty : 'a t
  val mem : 'a -> 'a t -> bool
  val add : 'a -> 'a t -> 'a t
  val elements : 'a t -> 'a list
end

module ListSet : Set = struct
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add = List.cons
  let elements s = List.sort_uniq Stdlib.compare s
end
module type Set =
  sig
    type 'a t
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elements : 'a t -> 'a list
  end
module ListSet : Set

Suppose we wanted to add a function of_list : 'a list -> 'a t that could construct a set out of a list. If we had access to the source code of both ListSet and Set, and if we were permitted to modify it, this wouldn’t be hard. But what if they were third-party libraries for which we didn’t have source code?

In Java, we might use inheritance to solve this problem:

interface Set<T> { ... }
class ListSet<T> implements Set<T> { ... }
class ListSetExtended<T> extends ListSet<T> {
  Set<T> ofList(List<T> lst) { ... }
}

That helps us to reuse code, because the subclass inherits all the methods of its superclass.

OCaml includes are similar. They enable a module to include all the items defined by another module, or a module type to include all the specifications of another module type.

Here’s how we can use includes to solve the problem of adding of_list to ListSet:

module ListSetExtended = struct
  include ListSet
  let of_list lst = List.fold_right add lst empty
end
module ListSetExtended :
  sig
    type 'a t = 'a ListSet.t
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elements : 'a t -> 'a list
    val of_list : 'a list -> 'a t
  end

This code says that ListSetExtended is a module that includes all the definitions of the ListSet module, as well as a definition of of_list. We don’t have to know the source code implementing ListSet to make this happen.

Note

You might wonder we why can’t simply implement of_list as the identity function. See the section below on encapsulation for the answer.

5.8.1. Semantics of Includes

Includes can be used inside of structures and signatures. When we include inside a signature, we must be including another signature. And when we include inside a structure, we must be including another structure.

Including a structure is effectively just syntactic sugar for writing a local definition for each name defined in the module. Writing include ListSet as we did above, for example, has an effect similar to writing the following:

module ListSetExtended = struct
  (* BEGIN all the includes *)
  type 'a t = 'a ListSet.t
  let empty = ListSet.empty
  let mem = ListSet.mem
  let add = ListSet.add
  let elements = ListSet.elements
  (* END all the includes *)
  let of_list lst = List.fold_right add lst empty
end
module ListSetExtended :
  sig
    type 'a t = 'a ListSet.t
    val empty : 'a ListSet.t
    val mem : 'a -> 'a ListSet.t -> bool
    val add : 'a -> 'a ListSet.t -> 'a ListSet.t
    val elements : 'a ListSet.t -> 'a list
    val of_list : 'a list -> 'a ListSet.t
  end

None of that is actually copying the source code of ListSet. Rather, the include just creates a new definition in ListSetExtended with the same name as each definition in ListSet. But if the set of names defined inside ListSet ever changed, the include would reflect that change, whereas a copy-paste job would not.

Including a signature is much the same. For example, we could write:

module type SetExtended = sig
  include Set
  val of_list : 'a list -> 'a t
end
module type SetExtended =
  sig
    type 'a t
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elements : 'a t -> 'a list
    val of_list : 'a list -> 'a t
  end

Which would have an effect similar to writing the following:

module type SetExtended = sig
  (* BEGIN all the includes *)
  type 'a t
  val empty : 'a t
  val mem : 'a -> 'a t -> bool
  val add : 'a -> 'a t -> 'a t
  val elements  : 'a t -> 'a list
  (* END all the includes *)
  val of_list : 'a list -> 'a t
end
module type SetExtended =
  sig
    type 'a t
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elements : 'a t -> 'a list
    val of_list : 'a list -> 'a t
  end

That module type would be suitable for ListSetExtended:

module ListSetExtended : SetExtended = struct
  include ListSet
  let of_list lst = List.fold_right add lst empty
end
module ListSetExtended : SetExtended

5.8.2. Encapsulation and Includes

We mentioned above that you might wonder why we didn’t write this simpler definition of of_list:

module ListSetExtended : SetExtended = struct
  include ListSet
  let of_list lst = lst
end
File "[7]", lines 1-4, characters 39-3:
1 | .......................................struct
2 |   include ListSet
3 |   let of_list lst = lst
4 | end
Error: Signature mismatch:
       ...
       Values do not match:
         val of_list : 'a -> 'a
       is not included in
         val of_list : 'a list -> 'a t
       File "[5]", line 9, characters 2-31: Expected declaration
       File "[7]", line 3, characters 6-13: Actual declaration

Check out that error message. It looks like of_list doesn’t have the right type. What if we try adding some type annotations?

module ListSetExtended : SetExtended = struct
  include ListSet
  let of_list (lst : 'a list) : 'a t = lst
end
File "[8]", line 3, characters 39-42:
3 |   let of_list (lst : 'a list) : 'a t = lst
                                           ^^^
Error: This expression has type 'a list
       but an expression was expected of type 'a t = 'a ListSet.t

Ah, now the problem is clearer: in the body of of_list, the equality of 'a t and 'a list isn’t known. In ListSetExtended, we do know that 'a t = 'a ListSet.t, because that’s what the include gave us. But the fact that 'a ListSet.t = 'a list was hidden when ListSet was sealed at module type Set. So, includes must obey encapsulation, just like the rest of the module system.

One workaround is to rewrite the definitions as follows:

module ListSetImpl = struct
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add = List.cons
  let elements s = List.sort_uniq Stdlib.compare s
end

module ListSet : Set = ListSetImpl

module type SetExtended = sig
  include Set
  val of_list : 'a list -> 'a t
end

module ListSetExtendedImpl = struct
  include ListSetImpl
  let of_list lst = lst
end

module ListSetExtended : SetExtended = ListSetExtendedImpl
module ListSetImpl :
  sig
    type 'a t = 'a list
    val empty : 'a list
    val mem : 'a -> 'a list -> bool
    val add : 'a -> 'a list -> 'a list
    val elements : 'a list -> 'a list
  end
module ListSet : Set
module type SetExtended =
  sig
    type 'a t
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elements : 'a t -> 'a list
    val of_list : 'a list -> 'a t
  end
module ListSetExtendedImpl :
  sig
    type 'a t = 'a list
    val empty : 'a list
    val mem : 'a -> 'a list -> bool
    val add : 'a -> 'a list -> 'a list
    val elements : 'a list -> 'a list
    val of_list : 'a -> 'a
  end
module ListSetExtended : SetExtended

The important change is that ListSetImpl is not sealed, so its type 'a t is not abstract. When we include it in ListSetExtended, we can therefore exploit the fact that it’s a synonym for 'a list.

What we just did is effectively the same as what Java does to handle the visibility modifiers public, private, etc. The “private version” of a class is like the Impl version above: anyone who can see that version gets to see all the exposed items (fields in Java, types in OCaml), without any encapsulation. The “public version” of a class is like the sealed version above: anyone who can see that version is forced to treat the items as abstract, hence encapsulated.

With that technique, if we want to provide a new implementation of one of the included functions we could do that too:

module ListSetExtendedImpl = struct
  include ListSetImpl
  let of_list lst = List.fold_right add lst empty
  let rec elements = function
    | [] -> []
    | h :: t -> if mem h t then elements t else h :: elements t
end
module ListSetExtendedImpl :
  sig
    type 'a t = 'a list
    val empty : 'a list
    val mem : 'a -> 'a list -> bool
    val add : 'a -> 'a list -> 'a list
    val of_list : 'a list -> 'a list
    val elements : 'a list -> 'a list
  end

But that’s a bad idea. First, it’s actually a quadratic implementation of elements instead of linearithmic. Second, it does not replace the original implementation of elements. Remember the semantics of modules: all definitions are evaluated from top to bottom, in order. So the new definition of elements above won’t come into use until the very end of evaluation. If any earlier functions had happened to use elements as a helper, they would use the original linearithmic version, not the new quadratic version.

Warning

This differs from what you might expect from Java, which uses a language feature called dynamic dispatch to figure out which method implementation to invoke. Dynamic dispatch is arguably the defining feature of object-oriented languages. OCaml functions are not methods, and they do not use dynamic dispatch.

5.8.3. Include vs. Open

The include and open statements are quite similar, but they have a subtly different effect on a structure. Consider this code:

module M = struct
  let x = 0
end

module N = struct
  include M
  let y = x + 1
end

module O = struct
  open M
  let y = x + 1
end
module M : sig val x : int end
module N : sig val x : int val y : int end
module O : sig val y : int end

Look closely at the values contained in each structure. N has both an x and y, whereas O has only a y. The reason is that include M causes all the definitions of M to also be included in N, so the definition of x from M is present in N. But open M only made those definitions available in the scope of O; it doesn’t actually make them part of the structure. So O does not contain a definition of x, even though x is in scope during the evaluation of O’s definition of y.

A metaphor for understanding this difference might be: open M imports definitions from M and makes them available for local consumption, but they aren’t exported to the outside world. Whereas include M imports definitions from M, makes them available for local consumption, and additionally exports them to the outside world.

5.8.4. Including Code in Multiple Modules

Recall that we also had an implementation of sets that made sure every element of the underlying list was unique:

module UniqListSet : Set = struct
  (** All values in the list must be unique. *)
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add x s = if mem x s then s else x :: s
  let elements = Fun.id
end
module UniqListSet : Set

Suppose we wanted add of_list to that module too. One possibility would be to copy and paste that function from ListSet into UniqListSet. But that’s poor software engineering. So let’s rule that out right away as a non-solution.

Instead, suppose we try to define the function outside of either module:

let of_list lst = List.fold_right add lst empty
File "[13]", line 1, characters 34-37:
1 | let of_list lst = List.fold_right add lst empty
                                      ^^^
Error: Unbound value add

The problem is we either need to choose which module’s add and empty we want. But as soon as we do, the function becomes useful only with that one module:

let of_list lst = List.fold_right ListSet.add lst ListSet.empty
val of_list : 'a list -> 'a ListSet.t = <fun>

We could make add and empty be parameters instead:

let of_list' add empty lst = List.fold_right add lst empty

let of_list lst = of_list' ListSet.add ListSet.empty lst
let of_list_uniq lst = of_list' UniqListSet.add UniqListSet.empty lst
val of_list' : ('a -> 'b -> 'b) -> 'b -> 'a list -> 'b = <fun>
val of_list : 'a list -> 'a ListSet.t = <fun>
val of_list_uniq : 'a list -> 'a UniqListSet.t = <fun>

But this is annoying in a couple of ways. First, we have to remember which function name to call, whereas all the other operations that are part of those modules have the same name, regardless of which module they’re in. Second, the of_list functions live outside either module, so clients who open one of the modules won’t automatically get the ability to name those functions.

Let’s try to use includes to solve this problem. First, we write a module that contains the parameterized implementation:

module SetOfList = struct
  let of_list' add empty lst = List.fold_right add lst empty
end
module SetOfList :
  sig val of_list' : ('a -> 'b -> 'b) -> 'b -> 'a list -> 'b end

Then we include that module to get the helper function:

module UniqListSetExtended : SetExtended = struct
  include UniqListSet
  include SetOfList
  let of_list lst = of_list' add empty lst
end

module ListSetExtended : SetExtended = struct
  include ListSet
  include SetOfList
  let of_list lst = of_list' add empty lst
end
module UniqListSetExtended : SetExtended
module ListSetExtended : SetExtended

That works, but we’ve only partially succeeded in achieving code reuse:

  • On the positive side, the code that implements of_list' has been factored out into a single location and reused in the two structures.

  • But on the negative side, we still had to write an implementation of of_list in both modules. Worse yet, those implementations are identical. So there’s still code duplication occurring.

Could we do better? Yes. And that leads us to functors, next.