Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 23 Nov 2009

Back in the USSR

My work aims to show that object code is more tractable than you think---tractable enough that a reasonable number of programming tasks could be abstracted from this level rather than returning to source level. In particular, rewiring references in object files is a powerful technique for interposing new logic to adapt and compose existing code in interesting ways. I recently discovered this excellent talk by Kevin Pulo (slides linked from this page) motivating some very practical uses of LD_PRELOAD.

One weakness of LD_PRELOAD is that libraries' internal references are out of bounds. For example, suppose your C library implements execv and friends as wrappers around execve(), and you want to change the behaviour of the whole family of exec functions. Wrapping execve() is probably not enough, because the references to execve() are internal to the .text section in libc.so and so are not subject to dynamic linking. Here's the objdump -Rd to prove it.

009bb2e0 <execv>:
  9bb2e0:       53                      push   %ebx
  9bb2e1:       e8 ea 2e f8 ff          call   93e1d0 <__i686.get_pc_thunk.bx>
  9bb2e6:       81 c3 0e 1d 0c 00       add    $0xc1d0e,%ebx
  9bb2ec:       83 ec 0c                sub    $0xc,%esp
  9bb2ef:       8b 83 9c ff ff ff       mov    0xffffff9c(%ebx),%eax
  9bb2f5:       8b 00                   mov    (%eax),%eax
  9bb2f7:       89 44 24 08             mov    %eax,0x8(%esp)
  9bb2fb:       8b 44 24 18             mov    0x18(%esp),%eax
  9bb2ff:       89 44 24 04             mov    %eax,0x4(%esp)
  9bb303:       8b 44 24 14             mov    0x14(%esp),%eax
  9bb307:       89 04 24                mov    %eax,(%esp)
  9bb30a:       e8 71 fe ff ff          call   9bb180 <execve>
  9bb30f:       83 c4 0c                add    $0xc,%esp
  9bb312:       5b                      pop    %ebx
  9bb313:       c3                      ret    

Notice that the call to execve has no relocation record---it's already bound.
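To make the "already bound" point concrete, here is a small sketch that decodes the call at 9bb30a by hand. It assumes the standard x86 encoding of a near call (opcode e8 followed by a signed 32-bit little-endian displacement, measured from the next instruction); the function name call_target and the load offset are made up for illustration.

```python
import struct

def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the destination of an x86 `call rel32` (opcode 0xE8)."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32 instruction")
    # The 4 bytes after the opcode are a signed little-endian displacement.
    (disp,) = struct.unpack("<i", insn_bytes[1:5])
    # The displacement is relative to the address of the *next* instruction.
    return (insn_addr + 5 + disp) & 0xFFFFFFFF

# The call at 9bb30a in the listing above: e8 71 fe ff ff
call_bytes = bytes.fromhex("e871feffff")
print(hex(call_target(0x9BB30A, call_bytes)))  # prints 0x9bb180, i.e. execve

# Loading the library elsewhere moves caller and callee together, so the
# same bytes still reach execve with no fix-up (hypothetical load offset).
base_shift = 0x10000000
assert call_target(0x9BB30A + base_shift, call_bytes) == 0x9BB180 + base_shift
```

Because the encoded bytes never need patching wherever the library lands, the linker has no reason to leave a relocation record behind for this call.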

What if we want to interpose on these pre-bound references? It's quite easy to identify these references, as the disassembler has done above, when they're in code. But suppose we want to interpose on some data--code or data--data references? Say we want to add a new element to a statically-defined pointer structure (maybe a linked list or something). Given some bytes in a data section, it's intractable to deduce what the interpretation of those bytes will be at run time (in contrast to code sections), so we can't work out which ones are pointers that we might want to interpose on and which are just some bytes that happen to look like an address.

But wait! Actually we can do it. The reference shown above can be bound before dynamic link time only because PC-relative branches are available... and they are available only in text sections. In data sections, even internal references need relocation records, because pointers stored there are always represented as absolute addresses. This means that it's only code--code references that we need to unpick without the aid of relocation records---and we can do this by disassembling the instructions.
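The data-section half of that argument can be sketched as a toy simulation. It assumes a "relative" style of relocation, as used on i386, where the loader adds the load base to the word stored at each recorded offset. The 8-byte data section, the function name apply_relative_relocs and the load base are all hypothetical; only the address 0x9bb180 comes from the listing.

```python
import struct

def apply_relative_relocs(data: bytearray, reloc_offsets, load_base: int) -> None:
    """Add load_base to each 32-bit little-endian word named by a relocation."""
    for off in reloc_offsets:
        (addend,) = struct.unpack_from("<I", data, off)
        struct.pack_into("<I", data, off, (addend + load_base) & 0xFFFFFFFF)

# Hypothetical data section: a pointer to execve (link-time address 0x9bb180)
# followed by a plain integer that happens to have exactly the same bits.
data = bytearray(struct.pack("<II", 0x9BB180, 0x9BB180))
relocs = [0]  # only the first word has a relocation record

apply_relative_relocs(data, relocs, load_base=0x10000000)
pointer, integer = struct.unpack("<II", data)
print(hex(pointer), hex(integer))  # prints 0x109bb180 0x9bb180
```

The two words are indistinguishable as raw bytes, but the relocation list says which one is a pointer, and that is exactly the information needed to interpose on it.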

In my existing work I have some extensions and modifications to GNU binutils which can put an object file into what I call “unbound static single reference form” (although I'm working on a name that contracts to something less embarrassing)---a form where every reference in the file (internal or external) has its own relocation record with a unique symbol. This is neat because these symbols can then be bound independently using a simple linker invocation (ld -r --defsym name=value), allowing each reference in the file's linkage to be independently controlled. The discovery that the same can probably be done for shared libraries is heartening. Watch this space, or contact me to find out more.
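As a purely illustrative sketch of the per-reference idea (not the actual binutils modifications), the following gives each reference its own symbol and then builds an ld -r --defsym command line that rebinds one reference while leaving the others pointing at their original target. The __ref_N_target naming scheme, the file names, the my_execve wrapper and the assumption that execvp calls execve are all hypothetical, and the command is only constructed, never run.

```python
def unique_symbols(references):
    """Map each (site, target) reference to its own hypothetical symbol name."""
    return {f"__ref_{i}_{target}": (site, target)
            for i, (site, target) in enumerate(references)}

def ld_command(obj_in, obj_out, symbols, overrides):
    """Build an `ld -r` command that binds every per-reference symbol."""
    argv = ["ld", "-r"]
    for sym, (_site, target) in symbols.items():
        # Default to the original target unless this reference is overridden.
        argv += ["--defsym", f"{sym}={overrides.get(sym, target)}"]
    return argv + ["-o", obj_out, obj_in]

# Two references to execve, from two different (assumed) call sites.
symbols = unique_symbols([("execv", "execve"), ("execvp", "execve")])

# Redirect only the reference inside execv to a hypothetical wrapper.
print(" ".join(ld_command("in.o", "out.o", symbols,
                          {"__ref_0_execve": "my_execve"})))
```

The point of the form is visible in the output: two references to the same function become two separately bindable names, so one can be redirected without touching the other.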
