Extension points (also known as “-ppx syntax extensions”) are the new API for syntactic
extensions in OCaml. The old API, known as camlp4, is very flexible, but also
huge, practically undocumented, lagging behind syntax newly introduced in the compiler,
and overall confusing to those attempting to use it.
Extension points are an excellent and very simple replacement introduced by Alain Frisch. In this article, I will explain how to amend OCaml’s syntax using the extension points API.
Extension points were first released in OCaml 4.02. You will need to switch to 4.02 or
a newer compiler, preferably using
What is Camlp4?
At its core, camlp4 (P4 stands for Pre-Processor-Pretty-Printer) is a parsing library which provides extensible grammars. That is, it makes it possible to define a parser and then, later, make a derived parser by adding a few rules to the original one. The OCaml syntax (two OCaml syntaxes, in fact: the original one and a revised one introduced specifically for camlp4) is just a special case.
When using camlp4 syntax extensions with OCaml, you write your program in a syntax which
is not compatible with OCaml’s (neither the original nor the revised one). Then, the OCaml compiler
(when invoked with the -pp switch) passes the original source to the preprocessor as text;
when the preprocessor has finished its work, it prints back valid OCaml code.
There are a lot of problems with this approach:
- It is confusing to users. Camlp4 preprocessors can define almost any imaginable syntax, so unless one is familiar with all the preprocessors used, it is not in general possible to understand the source.
- Writing camlp4 extensions is hard. It requires learning a new (revised) syntax and a complex, scarcely documented API (try module M = Camlp4;; in utop: the signature is 16255 lines long. Yes, sixteen thousand.)
- It is not well-suited for type-driven code generation, which is probably the most common use case for syntax extensions, because it is hard to make different camlp4 extensions cooperate; type_conv was required to enable this functionality.
- Last but not least, using camlp4 prevents the OCaml compiler from printing useful suggestions in error messages, like
File "ifdef.ml", line 17: This '(' might be unmatched. Personally, I find that very annoying.
What is the extension points API?
The extension points API is much simpler:
A syntax extension is now a function that maps an OCaml AST to an OCaml AST. Correspondingly, it is no longer possible to extend syntax in arbitrary ways.
To make syntax extensions useful for type-driven code generation (like type_conv), the OCaml syntax is enriched with attributes.
Attributes can be attached to pretty much any interesting syntactic construct: expressions, types, variant constructors, fields, modules, etc. By default, attributes are ignored by the OCaml compiler.
Attributes can contain a structure, expression or pattern as their payload, allowing a very wide range of behavior.
For example, one could implement a syntax extension that would accept type declarations of form
type t = A [@id 1] | B [@id 4] of int [@@id_of] and generate a function mapping a value of type
t to its integer representation.
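To make the idea concrete, here is a hand-written sketch of what such an extension might generate. The extension itself is hypothetical, and so are the function name id_of and the exact shape of the output; the attributed declaration appears only in a comment so that the snippet compiles without any extension installed.

```ocaml
(* Input the hypothetical extension would see:
     type t = A [@id 1] | B [@id 4] of int [@@id_of]
   Attributes are ignored by the compiler, so the plain type below
   describes the same values. *)
type t = A | B of int

(* What the extension might generate: one match arm per constructor,
   returning the integer taken from that constructor's id attribute. *)
let id_of : t -> int = function
  | A -> 1
  | B _ -> 4

let () = Printf.printf "%d %d\n" (id_of A) (id_of (B 42))
```

The point is that the attribute payloads carry the data (1 and 4) while the declaration stays valid OCaml with or without the extension.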
To make syntax extensions useful for implementing custom syntactic constructs, especially for control flow (like pa_lwt), the OCaml syntax is enriched with extension nodes.
Extension nodes designate a custom, incompatible variant of an existing syntactic construct. They’re only available for expression constructs:
if and so on. When the OCaml compiler encounters an extension node, it signals an error.
Extension nodes have the same payloads as attributes.
For example, one could implement a syntax extension that would accept a let binding of form
let%lwt (x, y) = f in x + y and translate it to
Lwt.bind f (fun (x, y) -> x + y).
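The shape of this rewrite can be tried out without Lwt. The sketch below uses a toy Promise module (hypothetical, standing in for Lwt) and writes out by hand the code a let%-style extension might emit. The let%promise spelling is an assumption made for illustration, and the body is wrapped in return so that the result of the continuation has the promise type that bind expects.

```ocaml
(* Toy stand-in for Lwt: a "promise" that is already resolved. *)
module Promise = struct
  type 'a t = Resolved of 'a
  let return (x : 'a) : 'a t = Resolved x
  let bind (Resolved x : 'a t) (k : 'a -> 'b t) : 'b t = k x
  let run (Resolved x : 'a t) : 'a = x
end

let f : (int * int) Promise.t = Promise.return (1, 2)

(* Source, as it would be written with a hypothetical extension:
     let%promise (x, y) = f in Promise.return (x + y)
   Result of the rewrite, written out by hand: *)
let sum : int Promise.t =
  Promise.bind f (fun (x, y) -> Promise.return (x + y))

let () = assert (Promise.run sum = 3)
```

The pattern of the let binding becomes the parameter of the continuation, and the body of the let becomes the body of the continuation.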
To make it possible to insert fragments of code written in entirely unrelated syntax into OCaml code, the OCaml syntax is enriched with quoted strings.
Quoted strings are simply strings delimited with {<delim>| and |<delim>}, where
<delim> is a (possibly empty) sequence of lowercase letters. They behave just like regular OCaml strings, except that syntactic extensions may extract the delimiter.
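A short example of quoted strings as the compiler itself sees them, with no extension involved. The delimiter sql is an arbitrary choice for illustration; to the compiler, the value is an ordinary string. Note that no escape processing happens inside a quoted string.

```ocaml
(* Empty delimiter: the backslash and the double quotes are kept literally. *)
let raw = {|a "quoted" \n string|}

(* Delimiter sql: the value is just the text between the bars. *)
let query = {sql|SELECT name FROM users WHERE id = 1|sql}

let () =
  assert (raw = "a \"quoted\" \\n string");
  assert (query = "SELECT name FROM users WHERE id = 1");
  print_endline query
```

A syntax extension, unlike the compiler, can look at the delimiter and decide to treat the contents as, say, a SQL fragment.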
Using the extension points API
On a concrete level, a syntax extension is an executable that receives a marshalled
OCaml AST and emits a marshalled OCaml AST. The OCaml compiler now also accepts a
-ppx option, which specifies one or more extensions to preprocess the code with.
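The "marshalled AST in, marshalled AST out" idea can be sketched with the standard Marshal module and a toy AST. This is only a model: the expr type and the filter function are hypothetical, and the real exchange between the compiler and an extension involves more than this (it is handled for you by the helpers described below).

```ocaml
(* Toy AST, standing in for the real Parsetree. *)
type expr = Int of int | Add of expr * expr

(* An AST-to-AST transformation: negate every integer literal. *)
let rec negate = function
  | Int n -> Int (-n)
  | Add (a, b) -> Add (negate a, negate b)

(* The "extension": unmarshal the input, transform it, marshal the result. *)
let filter (transform : expr -> expr) (input : string) : string =
  let ast : expr = Marshal.from_string input 0 in
  Marshal.to_string (transform ast) []

let () =
  let input = Marshal.to_string (Add (Int 1, Int 2)) [] in
  let output : expr = Marshal.from_string (filter negate input) 0 in
  assert (output = Add (Int (-1), Int (-2)))
```

Because the data exchanged is a typed tree rather than text, the extension never has to parse or print OCaml source itself.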
To aid this, the internals of the OCaml compiler are now exported as the standard
compiler-libs package. Among other things, this package contains
the interface defining the OCaml AST (modules Asttypes and Parsetree) and
a set of helpers for writing the syntax extensions (modules Ast_mapper and Ast_helper).
I won’t describe the API in detail; it’s well-documented and nearly trivial (especially when compared with camlp4). Rather, I will describe all the necessary plumbing one needs around an AST-mapping function to turn it into a conveniently packaged extension.
It is possible, but extremely inconvenient, to pattern-match and construct the OCaml AST manually. The extension points API makes it much easier:
- It provides an Ast_mapper.mapper type and an Ast_mapper.default_mapper value.
default_mapper is a “deep identity” mapper, i.e. it traverses every
node of the AST, but changes nothing.
Together, they provide an easy way to use open recursion, i.e. to only handle the parts of AST which are interesting to you.
- It provides a set of helpers in the
Ast_helper module which simplify constructing the AST. (Unlike camlp4, the extension points API does not provide code quasiquotation, at least for now.)
Exp.tuple [Exp.constant (Const_int 1); Exp.constant (Const_int 2)] would construct the AST for
(1, 2). While unwieldy, this is much better than elaborating the AST directly.
- Finally, it provides an
Ast_mapper.run_main function, which handles the command line arguments and I/O.
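The open-recursion idea behind default_mapper can be illustrated with a toy model. Everything below (the expr type, the two-field mapper record, doubling_mapper) is hypothetical and far smaller than the real Ast_mapper.mapper; it only shows why overriding one field is enough to change the whole traversal.

```ocaml
(* Toy AST, standing in for the real Parsetree. *)
type expr =
  | Const of int
  | Add of expr * expr
  | Neg of expr

(* A record of functions, one per syntactic category. Every function
   receives the mapper itself, so all recursion goes through the record. *)
type mapper = {
  expr : mapper -> expr -> expr;
  const : mapper -> int -> int;
}

(* "Deep identity": visits every node, changes nothing. *)
let default_mapper = {
  expr = (fun self e ->
    match e with
    | Const n -> Const (self.const self n)
    | Add (a, b) -> Add (self.expr self a, self.expr self b)
    | Neg a -> Neg (self.expr self a));
  const = (fun _self n -> n);
}

(* Override only the interesting part; the traversal is inherited. *)
let doubling_mapper =
  { default_mapper with const = (fun _self n -> 2 * n) }

let () =
  let ast = Add (Const 1, Neg (Const 2)) in
  assert (default_mapper.expr default_mapper ast = ast);
  assert (doubling_mapper.expr doubling_mapper ast
          = Add (Const 2, Neg (Const 4)))
```

Since the inherited expr function calls self.const rather than default_mapper.const, the override is picked up at every depth of the tree.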
It is not very convenient to construct and deconstruct ASTs directly. To avoid this, the ppx_tools library provides AST quasiquotation: it allows embedding AST fragments as literals inside the source code.
For example, it is possible to construct an expression using
[%expr 2 + 2], inject
a sub-AST from a variable into an expression with
[%expr 2 + [%e number]], and
even match over ASTs using
match expr with [%expr [%e? lhs] + [%e? rhs]] -> lhs, rhs.
ppx_tools also provides a rewriter tool that lets you test your syntax extension by feeding it source code fragments, without using the somewhat awkward debugging options that the OCaml compiler provides.
See the ppx_tools README for further information.
Let’s put it all together to make a simple extension that replaces
[%getenv "<var>"] with the compile-time contents of the variable <var>.
First, let’s take a look at the AST that
[%getenv "<var>"] would parse to. To do this,
invoke the OCaml compiler as
ocamlc -dparsetree foo.ml:
As you can see, the grammar category we need is “expression”, so we need to
override the expr field of the default_mapper.
The sample code also demonstrates how to report errors from the extension.
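The logic of such a rewrite can be modelled on a toy AST. This is a sketch under stated assumptions, not the extension itself: the expr type, the Rewrite_error exception, the error message, the injected lookup function and the choice of an empty string for unset variables are all hypothetical. A real extension would match on Parsetree nodes instead.

```ocaml
(* Toy AST: just enough to model a getenv node. NOT the compiler-libs types. *)
type expr =
  | String of string                 (* string literal *)
  | Extension of string * expr list  (* extension node: name and payload *)
  | Tuple of expr list

exception Rewrite_error of string

(* Rewrite every getenv node. The lookup function is injected so that the
   sketch can be tested without touching the real environment. *)
let rec rewrite (lookup : string -> string option) (e : expr) : expr =
  match e with
  | Extension ("getenv", [String var]) ->
    (* Assumption: an unset variable becomes the empty string. *)
    String (match lookup var with Some v -> v | None -> "")
  | Extension ("getenv", _) ->
    (* Any other payload is reported as an error. *)
    raise (Rewrite_error "[%getenv] accepts a single string literal")
  | Extension (name, payload) ->
    Extension (name, List.map (rewrite lookup) payload)
  | Tuple es -> Tuple (List.map (rewrite lookup) es)
  | String _ -> e

(* Lookup backed by the environment of the process doing the rewrite. *)
let env_lookup var = try Some (Sys.getenv var) with Not_found -> None

let () =
  let fake = function "USER" -> Some "alice" | _ -> None in
  let input = Tuple [Extension ("getenv", [String "USER"]); String "x"] in
  assert (rewrite fake input = Tuple [String "alice"; String "x"])
```

The value is read when the rewrite runs, that is, at compile time, which is why the result is a plain string literal in the output tree.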
This syntax extension can be easily compiled e.g. with
ocamlbuild -package compiler-libs.common ppx_getenv.native.
You can verify that this produces the desired result by asking OCaml to pretty-print
the transformed source with
ocamlc -dsource -ppx ./ppx_getenv.native foo.ml, or, if
ppx_tools is installed, with
ocamlfind ppx_tools/rewriter ./ppx_getenv.native foo.ml.
The OASIS configuration I suggest is as follows:
Findlib (ocamlfind) also supports ppx syntax extensions in version 1.5.2
or newer. To use it, add a file called
To use the syntax extension in other OCaml projects, simply require the
ppx_getenv package, e.g. as
ocamlfind ocamlc -package ppx_getenv.
This will pass all necessary options to the compiler.
The OPAM documentation nicely explains how to create a package, with instructions fully suitable for OASIS.
Note that ideally, a build system should install a ppx extension under
lib/ppx_getenv and use
ppx = "./ppx_getenv" in the
This avoids polluting the global executable namespace with
package-specific executables, and also avoids name conflicts.
However, OASIS does not make this easy, so in this example the executable
is installed under
The extension points API is ready to be used in applications and is much nicer than camlp4.
If you are writing an extension, you’ll find this material useful:
- Asttypes and Parsetree modules for writing matchers over the AST;
- Ast_helper for generating code;
- Ast_mapper for hooking into the mapper;
- extension_points.txt for a more thorough high-level description of the newly introduced syntax;
- experimental/frisch directory in general for a set of useful examples. Do note that not all of them are always updated to the latest extension points API;
- ocaml-ppx_getenv repository contains example code from this article.