Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 30 Sep 2015

Project suggestion: an observable OCaml, using liballocs

It's the season to suggest student projects, and this post is about one which follows up a series of blog posts (1, 2, 3, 4). Feel free to see also my suggestions from last year, another from last year and other standing suggestions

This post is about a suggested project which will produce a working implementation of the OCaml programming language, using an existing front-end but a very different back-end.

Unlike the mainstream implementation, this implementation will be optimized for debuggability, or more generally “observability”. The compilation strategy will be to translate to C, hence obtaining the debuggability (and, to some as-yet-unknown extent, the performance) of the C compilation toolchain.

This means at least three key differences. Firstly, all locally-bound names will map to C local variables. Secondly, OCaml's lexical closures will be implemented in a highly compatible way, perhaps using the technique of Breuel (which is already implemented in the GNU C compiler, but with stack allocation, whose lifetime semantics are not appropriate for OCaml). Thirdly, all allocations will have associated type metadata at run time, using liballocs This will allow parametric polymorphism to be “seen through” by a debugger—so it can tell you, for example, that some stack frame of a function whose type is 'a -> 'a is actually activated with 'a as int, say. (This may or may not be the right formulation of “seeing through”—stronger and weaker versions are possible; ask me!)

Key ingredients are the following:

Optional “extension”-style challenges might include the following:

Evaluation of the system is mostly via performance—the goal would be to get within a reasonable factor of the OCaml native-code compiler. We can also do some experiments to measure observability of our system. One simple experiment might interrupt both versions of a program at randomly chosen call sites and count how much of the local state we could recover (number of frames in the backtrace, number of locals in those frames; perhaps even doing a diff on their values). Call sites are a good choice because all but one frame of the stack is always stopped at one, and instrumenting the native code so that we can do lock-step breakpointing, by counting call invocations, should not be too difficult (though it might mean disabling inlining, which does affect what is being measured). Usually there'll be no contest, since the original OCaml native compiler doesn't let you observe locally bound values at all. There is, however, a development branch which does, and comparing with the bytecode debugger (ocamldebug) is also a good bet.

[/research] permanent link contact


Powered by blosxom

validate this page