Rambles around computer science

Diverting trains of thought, wasting precious time

Thu, 19 Feb 2009

Initialization order

I just spent an age tracking down a bug in a very simple tool I'm writing using the GNU BFD. It turns out that when opening a file for reading with BFD, various API calls are not valid until bfd_check_format_matches has been called and has returned successfully. This doesn't appear to be documented in the current texinfo manual. In fact, even the need to call bfd_init is not very clearly documented---but at least that one is fairly predictable. Most libraries, particularly those which maintain a lot of state behind the scenes, have an init function. Additional initialization steps, like the one I ran afoul of, are not at all guessable and should really be clearly signposted in the documentation. It was only by careful examination of what objcopy does that I could track down the bug.

It's well-known in the research world that interfaces come with protocol constraints, and it'd be nice to check client code against these just as we already do with function signatures. Initialization constraints are by far the most common example of protocol constraint. But rather than just checking satisfaction of these constraints, why should we even have to write the code at all? Why can't our toolchain insert initialization calls in the right places automatically? In fact at least one research linkage tool (Knit) can already do this.

How far can we generalise this constraint-solving? Usually protocol specifications use a finite-state model of component state, which are clearly good for a wide range of protocols but not all (consider a stack). Of course, we want our checking/solving to be decidable, and more complex models quickly lose this property, although I'm not sure whether a pushdown automaton model would necessarily be undecidable. More practically, Knit essentially supports a single bit of state (initialized or not) for each component (or actually a tri-state, because it supports finalizers too). If we wanted to generalise this even to a general finite-state model, we'd get ambiguity: say we had to call function A one or more times before calling function B, how many calls would we insert? Maybe “the smallest string” would be a sensible policy, but that would be ambiguous in some cases, and it's not clear it's always a good choice anyway. In protocol adaptation, there is invariably some programmer-provided “specification” to answer these questions, usually by constraining the generated automaton.

The BFD example is unusual in another way, in that it requires a successful call to the init function, not just any call. So our protocol checker/solver has to understand return values (and of course arguments) as well as function names. It's about time I re-examined the research literature on protocol adaptation and worked out how much of it has actually been turned into practical tools that address these issues. In due course I'm hoping to build a practical subset (or superset) of protocol adaptation functionality into Cake. One thing at a time though... first I must master that pesky BFD.

[/devel] permanent link contact


Powered by blosxom

validate this page