Rambles around computer science

Diverting trains of thought, wasting precious time

Thu, 23 Jul 2020

Building a simple toolchain extension, the subversive way

I recently ran into a nasty problem, arguably a bug, in GCC. I decided to write a little tool to help me detect when I'm triggering this bug. This post is about how I did that, using some simple but handy helper scripts I've been growing for a while.

The bug is to do with “constructor” functions in GNU C (not to be confused with C++ constructors!). I'll quote from my bug report, slightly edited.

I was surprised to find that my __attribute__((constructor(101))) on a function was
not giving it priority over another constructor whose attribute used a higher number.
It turns out this was owing to an additional prototype that did not mention the
function as being a constructor.

This caused a really subtle initialization bug in my code, and was tough to track
down. At the least, I would expect a warning when the information is discarded.

$ cat test.c
// comment these for correct behaviour, i.e. 1 then 2
static void do_init1(void);
static void do_init2(void);

static void do_init2(void) __attribute__((constructor(102)));
static void do_init2(void)
        printf("Init 2!\n");

static void do_init1(void) __attribute__((constructor(101)));
static void do_init1(void)
        printf("Init 1!\n");

int main(void)
        return 0;

$ cc -save-temps    test.c   -o test
$ ./test
Init 2!
Init 1!
$ grep -B2 init_array test.s
        .size   do_init2, .-do_init2
        .section        .init_array,"aw"
        .size   do_init1, .-do_init1
        .section        .init_array

The attribute's priority argument is being discarded. It should be visible in the
section name (".init_array.0101" etc.). If the extra prototypes are removed, the
programmer-intended behaviour is restored.

I can see how this would happen. Normally, constructor functions are not called explicitly, so don't need to be prototyped. In my liballocs project, however, sometimes my code can be called during very early initialization of the program. This is mostly because it preloads a wrapper for malloc(), which is called by the dynamic linker before it even gets around to running constructors. To deal with these incoming calls at such an early time, the library can be initialized on demand, earlier than the usual initialization sequence. To allow this, several constructors are also protoyped functions that can be called explicitly across module boundaries.

Although I could do a one-time pass over my code to fix the problem, this wouldn't prevent the same problem recurring in the future. Getting a diagnostic into GCC would be a long, slow process, and wouldn't help if I switched to another compiler that take some other questionable interpretation of multiply prototyped constructors. So instead, I thought I'd write a simple and fairly standalone tool that can detect the problem during build. In short, I want to check that if a priority is given in a prototype, then that priority is reflected in the assembly code; if not, an error is raised.

My toolsub repository, although in somewhat rough-and-ready shape, provides a collection of scripts that let us interfere with the compilation and linking toolchain in useful ways. Most of the tools it provides take mechanisms provided by existing interfaces, such as the GCC driver's -wrapper option for supplying a wrapper script, but make them easier to use. It's tricky to use -wrapper because you have to parse various command lines, and some of these are compiler-internal commands like cc1 rather than the public cc, cpp and so on.

Firstly I want to do a pass over the preprocesssed C code, outputting a summary of any constructors that are declared with a priority. This priority information is what the compiler is discarding; I want to catch when this happens. To fish out these priorities, I wrote a short OCaml program using CIL. This post isn't about CIL, but you can see that tool here; it's about 80 lines of commented OCaml. (To build it, you'll need to use my branch of CIL, because it uses one API feature that I haven't yet contributed back to mainline.)

This tool calculates what I'll call a “generous” summary of the constructor priorities declared in the file—if you gave a priority on any prototype, it will always include a priority in its output. (Currently it doesn't help much with the case where multiple distinct priorities are used. There is also a bug about that.) What GCC does is the “mean” version: unless you're consistent, it will silently drop the priority; this is the problem we want to catch.

We could just report an error if we see inconsistent prototypes, which is the case that provokes the bad behaviour in GCC. However, I wanted to be a bit more “ground truth” than that: can we test what the compiler's output actually says about the constructor's priority? To do that, we need to know how the constructor priority mechanism works, and indeed about how the constructor mechanism works at all. It's roughly as follows.

For each constructor, the compiler generates assembly code to output a pointer to that function in the .init_array section. At run time, initialization code will walk this section and call each pointed-to function in order (this is done by the dynamic linker, if you're linking dynamically). To give constructors a priority, the compiler tweaks the name of this section so that it is something like .init_array.0123, for a constructor with priority 123. It is then the linker's job to sort these into order. The linker script rule that generates the final .init_array section is something like this.

  .init_array     :  { *(SORT_BY_NAME(.init_array.*)) *(.init_array) }

The real script is actually a bit messier than that, but the point is that for init-array sections named with a suffix, we link them in sorted order of their name, so lower-numbered sections will go first. Section named without a suffix come last.

To check that the compiler didn't discard the constructor priority, we need to find out what section name is used in the assembly code. That's pretty easy if we can run our tool immediately after the compilation step proper, i.e. the step that inputs preprocessed source and outputs assembly code. Doing that is pretty easy with the wrapper feature of toolsub. The following shell script fragment does the work; dumpconstr is the OCaml tool I mentioned, which gives the “generous” summary.

# Our check works by observing a single cc1 invocation...
# already-preprocessed source goes in, and assembly comes out.
# We cross-check each. We pass the cc1 preprocessed input
# to a CIL tool that dumps the with-priority constructors.
# Then we check that the assembly output contains the expected
# incantations in each case. If it doesn't, we flag an warning.
my_cc1 () {
    # drop the literal "cc1" argument
    # run the compiler
    # Now do our check... our caller has told us what infiles and outfile are
    for f in "${infiles[@]}"; do
        case "$f" in
                # use the CIL tool to find out what priorities are at source level
                constrs="$( "$(dirname "$0")"/dumpconstr "$f" )"
                # for each one, check the assembly agrees
                while read func constr prio; do
                    if [[ -n "$prio" ]]; then
                        # the symbol should be referenced in a section
                        # named ".init_array.0*%04d"
                        # where %04d is the priority, zero-padded
                        section="$( sed 's/^[[:blank:]]*\.section[[:blank:]]*/\f/g' "$outfile" | \
                            tr '\n' '\v' | tr '\f' '\n' | \
                            sed -rn "/$regex/ p" )"
                        actual_prio="$( echo "$section" | sed -nr "/${regex}.*/ {s//\2/;p}" )"
                        if ! [ ${actual_prio:-0} -eq $prio ]; then
                            echo "Error: inconsistent use of constructor attributes on function $func" 1>&2
                            exit 1 
                done <<<"$constrs"
                # we process at most one input file
            (*) # skip files that don't look like preprocessed C/C++

# delegate to the generic wrapper -- we've set WRAPPER so it won't re-source the funcs
. ${TOOLSUB}/wrapper/bin/wrapper

It works by defining a shell function, here called my_cc1 which is declared to the main wrapper logic via the CC1WRAP variable. The main wrapper calls this whenever the compiler wants to compile one file to assembly code, after preprocessing. The shell function's arguments are literally the command line that the compiler wants to run, and we run exactly that command. But immediately after that, we also run our OCaml tool on the input file (preprocessed C), then scrape the relevant section names from the output file (assembly code), and check the priorities match.

The tool works! I can drop it into the Makefile of the project I'm using it in, by adding

CFLAGS += -no-integrated-cpp -wrapper /path/to/wrapper

and hey presto, my compilation job complains that my priorities are getting dropped, forcing me to fix the bug. I'm providing the compiler a wrapper script, but most of the wrapper logic is generic stuff provided by toolsub; I only had to write the shell function shown above, and the OCaml program I mentioned, to implement my tool.

So, toolsub's core wrapper logic is pretty useful, but there are at least a couple of things I'd like to fix with it. One is that wrappers should not be dealing in terms of the cc1 command, which is a compiler-internal thing. It'd be better if the core wrapper could translate such commands into “documented” interfaces like the compiler's top-level cc command line. It already does this for preprocessing, translating from a cc1 to a cpp command line (er, mostly). For the assembler it's also handled, trivially since the driver runs this directly. It's not yet done for linking, nor for the compiler proper... the shell function above still gets given a cc1 command, albeit assisted by some other logic that has pre-extracted the input and output filenames. That logic itself remains a bit rough.

The other thing I could fix is that arguably, this wrapper logic should not be using shell scripts. They're neither fast nor pleasant. But actually... it's hard to beat them for simple things... I hate how clunky subprocesses quickly become in Python, for example. In the above example, we call out to an OCaml tool to do the analysis of C source. Scraping the assembler file using regex hackery doesn't seem so bad, since the assembler is line-oriented.

My goal with toolsub is to have it become an umbrella project for small utilities, like the wrapper being used here, that help people modify their toolchain's behaviour in ways that are easy to “drop in” to existing builds. Think “prototyping and personal use”, not production; research prototypes are my particular motivation. The scripts are there to help avoid getting bogged down in compiler-specific things, such as compiler internals or details of the command line. Another of the tools in there is cilpp which again uses CIL, this time packaged to enable CIL passes to be invoked as “preprocessor plugins” (using something like -Wp,-fplugin,/path/to/cil/pass.cmxs). Injecting these as arguments is easier than using an alternative driver like cilly.

A final tool I want to mention is cccppp, unfortunately just a proof-of-concept for now, written in the very fun five days I spent at the Recurse Center in April last year. It was motivated by the lack of anything comparable to CIL for C++. I want to prototype tools that instrument C++, but I don't want my prototypes to get caught up in the fast churn of LLVM and Clang internals. So the idea is to provide a simple, fairly stable tool that can “virtualize” C++ code by lifting all the built-in operators, like pointer dereference or array subscripting, into templates that the tool invoker controls. By default, these templates just do the normal thing. But, for example, to instrument array accesses, the invoker can supply a custom definition for the built-in array access operator (as a template specialization); I'm using this to provide bounds checking. The key idea is to work using only source-level rewrites of the C++ code... a relatively stable feature of the Clang tooling API. And it's a source-to-source pass, so as long as the tool can parse your code, you can run any compiler on what comes out. Expressing these source-to-source rewrites using Clang's API is, however, pretty nasty. It still needs more work before it can work on large programs; I'm hoping it won't need loads more.

What's the lesson here? One could still argue that I should just get this problem fixed in the compilers I care about, rather than writing hacky wrapper scripts. But I tend to think the opposite: during development, every systems codebase should be compiled via a hacky wrapper script! It's a good way to add custom sanity checks for properties that your project cares about, even if most others don't. It suits properties that are quick to check, are reasonably syntactic or local to specific code (at least to specific files or functions), justify a “fail fast” error report (e.g. because it's critical to correctness), and therefore are better expressed as a static check than in a test case. Link-time invariants are another thing that liballocs could use some checking of—though sadly I haven't implemented wrapper support for the linker yet. Some other properties, like naming conventions or checking for ABI stability, arguably fit this category, depending on how strict you want to be. Anyway, we'll see if I end up using this facility more often.

After subverting your toolchain, you may also want to subvert your runtime. I'll write more about that in future posts.

[/devel] permanent link contact

Powered by blosxom

validate this page