2.3. Expressions#
The primary piece of OCaml syntax is the expression. Just like programs in
imperative languages are primarily built out of commands, programs in
functional languages are primarily built out of expressions. Examples of
expressions include 2+2
and increment 21
.
The OCaml manual has a complete definition of all the expressions in the language. Though that page starts with a rather cryptic overview, if you scroll down, you’ll come to some English explanations. Don’t worry about studying that page now; just know that it’s available for reference.
The primary task of computation in a functional language is to evaluate an
expression to a value. A value is an expression for which there is no
computation remaining to be performed. So, all values are expressions, but not
all expressions are values. Examples of values include 2
, true
, and
"yay!"
.
The OCaml manual also has a definition of all the values, though again, that page is mostly useful for reference rather than study.
Sometimes an expression might fail to evaluate to a value. There are two reasons that might happen:
Evaluation of the expression raises an exception.
Evaluation of the expression never terminates (e.g., it enters an “infinite loop”).
2.3.1. Primitive Types and Values#
The primitive types are the built-in and most basic types: integers, floating-point numbers, characters, strings, and booleans. They will be recognizable as similar to primitive types from other programming languages.
Type int
: Integers. OCaml integers are written as usual: 1
, 2
, etc.
The usual operators are available: +
, -
, *
, /
, and mod
. The latter
two are integer division and modulus:
65 / 60
- : int = 1
65 mod 60
- : int = 5
65 / 0
Exception: Division_by_zero.
Raised by primitive operation at unknown location
Called from Stdlib__Fun.protect in file "fun.ml", line 33, characters 8-15
Re-raised at Stdlib__Fun.protect in file "fun.ml", line 38, characters 6-52
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-150
OCaml integers range from \(-2^{62}\) to \(2^{62}-1\) on modern platforms. They are
implemented with 64-bit machine words, which is the size of a register on
64-bit processor. But one of those bits is “stolen” by the OCaml implementation,
leading to a 63-bit representation. That bit is used at run time to distinguish
integers from pointers. For applications that need true 64-bit integers, there
is an Int64
module in the standard library. And for applications that
need arbitrary-precision integers, there is a separate Zarith
library. But for most purposes, the built-in int
type suffices and offers the
best performance.
Type float
: Floating-point numbers. OCaml floats are IEEE 754
double-precision floating-point numbers. Syntactically, they must
always contain a dot—for example, 3.14
or 3.0
or even 3.
. The last
is a float
; if you write it as 3
, it is instead an int
:
3.
- : float = 3.
3
- : int = 3
OCaml deliberately does not support operator overloading, Arithmetic operations
on floats are written with a dot after them. For example, floating-point
multiplication is written *.
not *
:
3.14 *. 2.
- : float = 6.28
3.14 * 2.
File "[7]", line 1, characters 0-4:
1 | 3.14 * 2.
^^^^
Error: This expression has type float but an expression was expected of type
int
OCaml will not automatically convert between int
and float
. If you want to
convert, there are two built-in functions for that purpose: int_of_float
and
float_of_int
.
3.14 *. (float_of_int 2)
- : float = 6.28
As in any language, the floating-point representation is approximate. That can lead to rounding errors:
0.1 +. 0.2
- : float = 0.300000000000000044
The same behavior can be observed in Python and Java, too. If you haven’t encountered this phenomenon before, here’s a basic guide to floating-point representation that you might enjoy reading.
Type bool
: Booleans. The boolean values are written true
and false
.
The usual short-circuit conjunction &&
and disjunction ||
operators are
available.
Type char
: Characters. Characters are written with single quotes, such as
'a'
, 'b'
, and 'c'
. They are represented as bytes —that is, 8-bit
integers— in the ISO 8859-1 “Latin-1” encoding. The first half of the
characters in that range are the standard ASCII characters. You can convert
characters to and from integers with char_of_int
and int_of_char
.
Type string
: Strings. Strings are sequences of characters. They are
written with double quotes, such as "abc"
. The string concatenation operator
is ^
:
"abc" ^ "def"
- : string = "abcdef"
Object-oriented languages often provide an overridable method for converting
objects to strings, such as toString()
in Java or __str__()
in Python. But
most OCaml values are not objects, so another means is required to convert to
strings. For three of the primitive types, there are built-in functions:
string_of_int
, string_of_float
, string_of_bool
. Strangely,
there is no string_of_char
, but the library function String.make
can
be used to accomplish the same goal.
string_of_int 42
- : string = "42"
String.make 1 'z'
- : string = "z"
Likewise, for the same three primitive types, there are built-in functions to
convert from a string if possible: int_of_string
, float_of_string
, and
bool_of_string
.
int_of_string "123"
- : int = 123
int_of_string "not an int"
Exception: Failure "int_of_string".
Raised by primitive operation at unknown location
Called from Stdlib__Fun.protect in file "fun.ml", line 33, characters 8-15
Re-raised at Stdlib__Fun.protect in file "fun.ml", line 38, characters 6-52
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-150
There is no char_of_string
, but the individual characters of a string can be
accessed by a 0-based index. The indexing operator is written with a dot and
square brackets:
"abc".[0]
- : char = 'a'
"abc".[1]
- : char = 'b'
"abc".[3]
Exception: Invalid_argument "index out of bounds".
Raised by primitive operation at unknown location
Called from Stdlib__Fun.protect in file "fun.ml", line 33, characters 8-15
Re-raised at Stdlib__Fun.protect in file "fun.ml", line 38, characters 6-52
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-150
2.3.2. More Operators#
We’ve covered most of the built-in operators above, but there are a few more that you can see in the OCaml manual.
There are two equality operators in OCaml, =
and ==
, with corresponding
inequality operators <>
and !=
. Operators =
and <>
examine structural
equality whereas ==
and !=
examine physical equality. Until we’ve studied
the imperative features of OCaml, the difference between them will be tricky to
explain. See the documentation of Stdlib.(==)
if you’re curious now.
Important
Start training yourself now to use =
and not to use ==
. This will be
difficult if you’re coming from a language like Java where ==
is the usual
equality operator.
2.3.3. Assertions#
The expression assert e
evaluates e
. If the result is true
, nothing more
happens, and the entire expression evaluates to a special value called unit.
The unit value is written ()
and its type is unit
. But if the result is
false
, an exception is raised.
One way to test a function f
is to write a series of assertions like this:
let () = assert (f input1 = output1)
let () = assert (f input2 = output2)
let () = assert (f input3 = output3)
Those assert that f input1
should be output1
, and so forth. The
let () = ...
part of those is used to handle the unit value returned by each
assertion.
2.3.4. If Expressions#
The expression if e1 then e2 else e3
evaluates to e2
if e1
evaluates to
true
, and to e3
otherwise. We call e1
the guard of the if
expression.
if 3 + 5 > 2 then "yay!" else "boo!"
- : string = "yay!"
Unlike if-then-else
statements that you may have used in imperative
languages, if-then-else
expressions in OCaml are just like any other
expression; they can be put anywhere an expression can go. That makes them
similar to the ternary operator ? :
that you might have used in other
languages.
4 + (if 'a' = 'b' then 1 else 2)
- : int = 6
If
expressions can be nested in a pleasant way:
if e1 then e2
else if e3 then e4
else if e5 then e6
...
else en
You should regard the final else
as mandatory, regardless of whether you are
writing a single if
expression or a highly nested if
expression. If you
omit it you’ll likely get an error message that, for now, is inscrutable:
if 2 > 3 then 5
File "[20]", line 1, characters 14-15:
1 | if 2 > 3 then 5
^
Error: This expression has type int but an expression was expected of type
unit
because it is in the result of a conditional with no else branch
Syntax. The syntax of an if
expression:
if e1 then e2 else e3
The letter e
is used here to represent any other OCaml expression; it’s an
example of a syntactic variable aka metavariable, which is not actually a
variable in the OCaml language itself, but instead a name for a certain
syntactic construct. The numbers after the letter e
are being used to
distinguish the three different occurrences of it.
Dynamic semantics. The dynamic semantics of an if
expression:
If
e1
evaluates totrue
, and ife2
evaluates to a valuev
, thenif e1 then e2 else e3
evaluates tov
If
e1
evaluates tofalse
, and ife3
evaluates to a valuev
, thenif e1 then e2 else e3
evaluates tov
.
We call these evaluation rules: they define how to evaluate expressions. Note
how it takes two rules to describe the evaluation of an if
expression, one for
when the guard is true, and one for when the guard is false. The letter v
is
used here to represent any OCaml value; it’s another example of a metavariable.
Later we will develop a more mathematical way of expressing dynamic semantics,
but for now we’ll stick with this more informal style of explanation.
Static semantics. The static semantics of an if
expression:
If
e1
has typebool
ande2
has typet
ande3
has typet
thenif e1 then e2 else e3
has typet
We call this a typing rule: it describes how to type check an expression. Note
how it only takes one rule to describe the type checking of an if
expression.
At compile time, when type checking is done, it makes no difference whether the
guard is true or false; in fact, there’s no way for the compiler to know what
value the guard will have at run time. The letter t
here is used to represent
any OCaml type; the OCaml manual also has definition of all types
(which curiously does not name the base types of the language like int
and
bool
).
We’re going to be writing “has type” a lot, so let’s introduce a more compact
notation for it. Whenever we would write “e
has type t
”, let’s instead write
e : t
. The colon is pronounced “has type”. This usage of colon is consistent
with how the toplevel responds after it evaluates an expression that you enter:
let x = 42
val x : int = 42
In the above example, variable x
has type int
, which is what the colon
indicates.
2.3.5. Let Expressions#
In our use of the word let
thus far, we’ve been making definitions in the
toplevel and in .ml
files. For example,
let x = 42;;
val x : int = 42
defines x
to be 42, after which we can use x
in future definitions at the
toplevel. We’ll call this use of let
a let definition.
There’s another use of let
which is as an expression:
let x = 42 in x + 1
- : int = 43
Here we’re binding a value to the name x
then using that binding inside
another expression, x+1
. We’ll call this use of let
a let expression.
Since it’s an expression, it evaluates to a value. That’s different than
definitions, which themselves do not evaluate to any value. You can see that if
you try putting a let definition in place of where an expression is expected:
(let x = 42) + 1
File "[24]", line 1, characters 11-12:
1 | (let x = 42) + 1
^
Error: Syntax error
Syntactically, a let
definition is not permitted on the left-hand side of the
+
operator, because a value is needed there, and definitions do not evaluate
to values. On the other hand, a let
expression would work fine:
(let x = 42 in x) + 1
- : int = 43
Another way to understand let definitions at the toplevel is that they are like let expression where we just haven’t provided the body expression yet. Implicitly, that body expression is whatever else we type in the future. For example,
# let a = "big";;
# let b = "red";;
# let c = a ^ b;;
# ...
is understood by OCaml in the same way as
let a = "big" in
let b = "red" in
let c = a ^ b in
...
That latter series of let
bindings is idiomatically how several variables
can be bound inside a given block of code.
Syntax.
let x = e1 in e2
As usual, x
is an identifier. These identifiers must begin with lower-case,
not upper, and idiomatically are written with snake_case
not camelCase
. We
call e1
the binding expression, because it’s what’s being bound to x
; and
we call e2
the body expression, because that’s the body of code in which the
binding will be in scope.
Dynamic semantics.
To evaluate let x = e1 in e2
:
Evaluate
e1
to a valuev1
.Substitute
v1
forx
ine2
, yielding a new expressione2'
.Evaluate
e2'
to a valuev2
.The result of evaluating the let expression is
v2
.
Here’s an example:
let x = 1 + 4 in x * 3
--> (evaluate e1 to a value v1)
let x = 5 in x * 3
--> (substitute v1 for x in e2, yielding e2')
5 * 3
--> (evaluate e2' to v2)
15
(result of evaluation is v2)
Static semantics.
If
e1 : t1
and if under the assumption thatx : t1
it holds thate2 : t2
, then(let x = e1 in e2) : t2
.
We use the parentheses above just for clarity. As usual, the compiler’s type inferencer determines what the type of the variable is, or the programmer could explicitly annotate it with this syntax:
let x : t = e1 in e2
2.3.6. Scope#
Let
bindings are in effect only in the block of code in which they occur. This
is exactly what you’re used to from nearly any modern programming language. For
example:
let x = 42 in
(* y is not meaningful here *)
x + (let y = "3110" in
(* y is meaningful here *)
int_of_string y)
The scope of a variable is where its name is meaningful. Variable y
is in
scope only inside of the let
expression that binds it above.
It’s possible to have overlapping bindings of the same name. For example:
let x = 5 in
((let x = 6 in x) + x)
But this is darn confusing, and for that reason, it is strongly discouraged style—much like ambiguous pronouns are discouraged in natural language. Nonetheless, let’s consider what that code means.
To what value does that code evaluate? The answer comes down to how x
is
replaced by a value each time it occurs. Here are a few possibilities for such
substitution:
(* possibility 1 *)
let x = 5 in
((let x = 6 in 6) + 5)
(* possibility 2 *)
let x = 5 in
((let x = 6 in 5) + 5)
(* possibility 3 *)
let x = 5 in
((let x = 6 in 6) + 6)
The first one is what nearly any reasonable language would do. And most likely it’s what you would guess But, why?
The answer is something we’ll call the Principle of Name Irrelevance: the name of a variable shouldn’t intrinsically matter. You’re used to this from math. For example, the following two functions are the same:
It doesn’t intrinsically matter whether we call the argument to the function \(x\) or \(y\); either way, it’s still the squaring function. Therefore, in programs, these two functions should be identical:
let f x = x * x
let f y = y * y
This principle is more commonly known as alpha equivalence: the two functions are equivalent up to renaming of variables, which is also called alpha conversion for historical reasons that are unimportant here.
According to the Principle of Name Irrelevance, these two expressions should be identical:
let x = 6 in x
let y = 6 in y
Therefore, the following two expressions, which have the above expressions embedded in them, should also be identical:
let x = 5 in (let x = 6 in x) + x
let x = 5 in (let y = 6 in y) + x
But for those to be identical, we must choose the first of the three possibilities above. It is the only one that makes the name of the variable be irrelevant.
There is a term commonly used for this phenomenon: a new binding of a variable shadows any old binding of the variable name. Metaphorically, it’s as if the new binding temporarily casts a shadow over the old binding. But eventually the old binding could reappear as the shadow recedes.
Shadowing is not mutable assignment. For example, both of the following expressions evaluate to 11:
let x = 5 in ((let x = 6 in x) + x)
let x = 5 in (x + (let x = 6 in x))
Likewise, the following utop transcript is not mutable assignment, though at first it could seem like it is:
# let x = 42;;
val x : int = 42
# let x = 22;;
val x : int = 22
Recall that every let
definition in the toplevel is effectively a nested let
expression. So the above is effectively the following:
let x = 42 in
let x = 22 in
... (* whatever else is typed in the toplevel *)
The right way to think about this is that the second let
binds an entirely new
variable that just happens to have the same name as the first let
.
Here is another utop transcript that is well worth studying:
# let x = 42;;
val x : int = 42
# let f y = x + y;;
val f : int -> int = <fun>
# f 0;;
: int = 42
# let x = 22;;
val x : int = 22
# f 0;;
- : int = 42 (* x did not mutate! *)
To summarize, each let definition binds an entirely new variable. If that new
variable happens to have the same name as an old variable, the new variable
temporarily shadows the old one. But the old variable is still around, and its
value is immutable: it never, ever changes. So even though let
expressions
might superficially look like assignment statements from imperative languages,
they are actually quite different.
2.3.7. Type Annotations#
OCaml automatically infers the type of every expression, with no need for the programmer to write it manually. Nonetheless, it can sometimes be useful to manually specify the desired type of an expression. A type annotation does that:
(5 : int)
- : int = 5
An incorrect annotation will produce a compile-time error:
(5 : float)
File "[27]", line 1, characters 1-2:
1 | (5 : float)
^
Error: This expression has type int but an expression was expected of type
float
Hint: Did you mean `5.'?
And that example shows why you might use manual type annotations during
debugging. Perhaps you had forgotten that 5
cannot be treated as a float
,
and you tried to write:
5 +. 1.1
You might try manually specifying that 5
was supposed to be a float
:
(5 : float) +. 1.1
File "[28]", line 1, characters 1-2:
1 | (5 : float) +. 1.1
^
Error: This expression has type int but an expression was expected of type
float
Hint: Did you mean `5.'?
It’s clear that the type annotation has failed. Although that might seem silly for this tiny program, you might find this technique to be effective as programs get larger.
Important
Type annotations are not type casts, such as might be found in C or Java. They do not indicate a conversion from one type to another. Rather they indicate a check that the expression really does have the given type.
Syntax. The syntax of a type annotation:
(e : t)
Note that the parentheses are required.
Dynamic semantics. There is no run-time meaning for a type annotation.
It goes away during compilation, because it indicates a compile-time check.
There is no run-time conversion.
So, if (e : t)
compiled successfully, then at run-time it is simply e
,
and it evaluates as e
would.
Static semantics. If e
has type t
then (e : t)
has type t
.