Inline Jump Tables on ARM & Function Detection Using The .eh_frame Section

Our recent USENIX Sec’16 paper on x86/x64 disassembly has been getting a fair amount of attention on Twitter and Reddit, which is great to see! I’ve also talked to a few people who had interesting additional insights that are not included in the paper, and I thought it would be worth sharing them here.

Inline jump tables on ARM

One of our main findings is that on x86/x64, both gcc v5.1 and clang v3.6 are extremely well-behaved when it comes to jump tables. Rather than placing these inline in the .text section, both compilers place jump tables in .rodata. They emit no inline data at all, which means that linear disassembly produces 100% correct results.

It seems things are not quite so convenient on ARM. Apparently, arm-linux-gcc does produce inline jump tables (just like Visual Studio does for x86/x64). A quick check confirms this, as illustrated in the listing below, which shows a snippet of objdump output for lighttpd cross-compiled with arm-linux-gcc.

Indeed, we do see some inline data (the .word lines), and it looks like a jump table. You can see that it contains an array of valid addresses, presumably pointing to the case blocks of a switch. The DWARF information tells us the inline data is produced from somewhere near line 95 in server.c, shown in the following listing.

Clearly, this is indeed a switch statement, and the inline data is a jump table containing the addresses of the case blocks. Fortunately, if the binary is not stripped, objdump can use symbols to differentiate the data from the code, which is why objdump, in our test, accurately marks the data as .word lines. If the binary is stripped, however, this is no longer possible, and the inline data will cause disassembly errors.
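To make the "array of valid addresses" observation concrete, here is a minimal Python sketch of a heuristic that flags candidate inline jump tables in a raw code section. This is not something we used in the paper. The function name find_pointer_runs and the min_run threshold are made up for illustration, and the sketch assumes 32-bit little-endian ARM code in ARM mode (word-aligned targets, no Thumb). It can produce both false positives and false negatives, so treat it as a starting point only.

```python
import struct

def find_pointer_runs(text, base, min_run=3):
    """Return (start_address, entry_count) for each run of at least min_run
    aligned 32-bit words that all point back into [base, base + len(text))."""
    end = base + len(text)
    runs = []
    run_start = None
    count = 0
    # Walk the section one aligned little-endian 32-bit word at a time.
    for off in range(0, len(text) - 3, 4):
        (word,) = struct.unpack_from("<I", text, off)
        # ASSUMPTION: a jump table entry is a word-aligned address in the section.
        if base <= word < end and word % 4 == 0:
            if run_start is None:
                run_start = base + off
                count = 0
            count += 1
        else:
            if run_start is not None and count >= min_run:
                runs.append((run_start, count))
            run_start = None
    # Flush a run that extends to the end of the section.
    if run_start is not None and count >= min_run:
        runs.append((run_start, count))
    return runs

# Made-up section: two instructions, three in-section pointers, one instruction.
section = struct.pack("<6I", 0xE1A00000, 0xE12FFF1E,
                      0x10000, 0x10004, 0x10014, 0xE1A00000)
candidates = find_pointer_runs(section, base=0x10000)
```

In this toy input, the three words starting at offset 8 are reported as a single candidate table, while the surrounding instruction encodings fall outside the section's address range and are ignored.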

(Thanks to Ammar Ben Khadra from Uni Kaiserslautern for bringing this to my attention.)

Detecting functions using the .eh_frame section

We also show in our paper that function detection (i.e., accurately identifying the start address and size of each function in the binary) is currently the most problematic primitive for disassemblers. False positive and false negative rates in excess of 20% are not uncommon, even though function detection is one of the most widely used and important primitives in virtually all areas of binary analysis.

It seems there is an interesting way to get around this problem, based on the .eh_frame section. This section contains information needed for DWARF-based stack unwinding. It’s primarily used for C++ exception handling, but also for various other applications such as backtrace(), and gcc intrinsics such as __attribute__((__cleanup__(f))) and __builtin_return_address(n) (more information in this StackOverflow post). Due to its many uses, .eh_frame is present by default not only in C++ binaries that use exception handling, but in all binaries produced by gcc, including plain C binaries.

The point of all this is that .eh_frame contains function boundary information that identifies all functions, and can thus be used to circumvent the function detection problem entirely. Here’s what a dump of the section looks like for 470.lbm (one of the SPEC CPU2006 benchmarks) compiled with gcc v5.1 at optimization level O0 for x64.

As far as I know, this method was first described here by Ryan O’Neill (a.k.a. ElfMaster). He also provides code to parse the .eh_frame section into a set of function addresses and sizes.
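As a rough illustration of the idea (this is not Ryan O’Neill’s code), the Python sketch below extracts function start addresses and sizes from the textual output of readelf --debug-dump=frames, rather than parsing the section bytes directly. The helper name functions_from_frames_dump is made up, the sample dump is illustrative rather than taken from 470.lbm, and the sketch assumes each FDE line has the form "FDE cie=... pc=START..END". It also treats every FDE range as one function, which is a simplification.

```python
import re

# Matches the summary line readelf prints for each FDE, for example:
#   00000018 000000000000001c 0000001c FDE cie=00000000 pc=0000000000400a40..0000000000400b2c
FDE_RE = re.compile(r"FDE cie=[0-9a-f]+ pc=([0-9a-f]+)\.\.([0-9a-f]+)")

def functions_from_frames_dump(dump):
    """Return a sorted list of (start_address, size) pairs, one per FDE."""
    funcs = []
    for line in dump.splitlines():
        m = FDE_RE.search(line)
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)
            funcs.append((start, end - start))
    return sorted(funcs)

# Illustrative input only; real output contains many more lines per entry.
sample_dump = """\
00000000 0000000000000014 00000000 CIE
  Version:               1
00000018 000000000000001c 0000001c FDE cie=00000000 pc=0000000000400a40..0000000000400b2c
00000038 000000000000001c 0000003c FDE cie=00000000 pc=0000000000400b30..0000000000400b5a
"""
functions = functions_from_frames_dump(sample_dump)
```

To try this on a real binary, you could capture the readelf output with subprocess and pass it to the same function.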

Note that the strip command will not strip the .eh_frame section. If you want to get rid of it (for anti-reversing or binary size reasons), you need to prevent it from being generated in the first place by passing -fno-asynchronous-unwind-tables to gcc.

(Thanks to Mariano Graziano from Cisco for telling me about this.)

Measuring Disassembly

We recently published a paper which is devoted entirely to exploring several aspects of x86/x64 disassembly. Among other things, we measured the prevalence of complex corner cases generated by modern compilers, and the precision with which disassemblers handle these cases. We released our complete data set, in part because there are too many results to fit in the paper, and also to allow others to compare their own results to ours.

Since we’ve received several questions asking for details on how to implement such a comparison, this post walks through an example. Assuming that you’ve already downloaded our data set and generated the ground truth (as detailed in ~/disasm/README in the provided VM), getting results for a new disassembler requires two steps.

  1. Write a script that parses the output of the disassembler you want to evaluate, and puts it into a format useful for further processing.
  2. Compare the disassembler output to the ground truth, using another script for the specific primitive you want to evaluate.

We give examples of both steps. Though at first it may look like lots of work to fit these scripts to your own evaluation requirements, this should actually be quite straightforward, since you can reuse much of the code verbatim regardless of the specific test setup.

Parsing disassembler output

Since every disassembler is different, we need to make a specifically tailored script that parses the output of the disassembler we want to test, and puts it into a normalized format that we can process further. To keep things simple, the example presented here is based on objdump, but to create a script for another disassembler you can use the exact same basic idea. Without further ado, here is the bash script we used in our paper to parse the instructions output by objdump for our SPEC CPU2006 test suite (the scripts for our other tests are nearly identical).

Lines 3-4 are simply lists of all the SPEC CPU2006 C and C++ test cases, which we later iterate over to disassemble each test. On lines 40-66, we call the main disassembly function (described next) with various parameters, for each of the compiler configurations we test.

The important bit is the disasm function declared on line 8. It starts by reading its parameters into named variables and making the directories where we will output our results. Then, on line 19, we begin a loop over all test cases for the given configuration.

For each test case, we loop over all the optimization levels (line 23), and determine the name of the binary for the current test case/optimization level, skipping an iteration and yielding a warning if the file does not exist (lines 25-33). Note that we assume a particular format for the directory and binary names. For instance, we assume that all the stripped C++ test binaries as compiled with gcc 5.1/64-bit are located in a directory called truth/gcc510-64/bin/stripped/C++, and that binaries generated with Visual Studio have the .exe extension. If you are using the ground truth provided by us, these requirements are all met.

So far, the entire script has been disassembler-agnostic; you can reuse those parts for any disassembler you want to test. Lines 34-35 are the only lines that need to be tailored to the specific disassembler that is being tested. These are the lines where the actual disassembler is run, and its output parsed and dumped to file. Moreover, both these lines are identical except that line 34 disassembles a binary with symbols, while line 35 disassembles a stripped binary. For our example, in both cases we simply run objdump, grep for all the disassembled addresses, give each address a 0x prefix, and write the results to an output file for the specific test case/configuration. We store instruction addresses instead of mnemonics because the addresses are much easier to compare to our ground truth (as discussed below).
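For readers who prefer Python over a grep pipeline, the following sketch shows the same address extraction on objdump -d output. It is not the script from the paper. The function name instruction_addresses is made up, and the choice to skip continuation lines (lines that carry only the overflow bytes of a long instruction, with no mnemonic) is an assumption about what you want to count as an instruction address.

```python
import re

# The first tab-separated field of an instruction line looks like "  400a40:".
ADDR_RE = re.compile(r"^\s*([0-9a-f]+):$")

def instruction_addresses(objdump_text):
    """Return 0x-prefixed instruction addresses found in objdump -d output."""
    addrs = []
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        # Instruction lines have address, raw bytes, and mnemonic fields.
        # ASSUMPTION: lines without a mnemonic are byte continuations; skip them.
        if len(fields) < 3 or not fields[2].strip():
            continue
        m = ADDR_RE.match(fields[0])
        if m:
            addrs.append("0x" + m.group(1))
    return addrs

# Illustrative input, including a symbol label and a continuation line.
sample = (
    "0000000000400a40 <main>:\n"
    "  400a40:\t55                   \tpush   %rbp\n"
    "  400a41:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "  400a44:\t48 b8 01 02 03 04 05 \tmovabs $0x807060504030201,%rax\n"
    "  400a4b:\t06 07 08 \n"
    "  400a4e:\tc3                   \tretq\n"
)
addresses = instruction_addresses(sample)
```

Writing one address per line to the output file then gives the same kind of normalized format that the comparison step expects.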

As you can see, the script generalizes to other disassemblers in a very straightforward way. Some disassemblers, such as IDA Pro, have a more complicated user interface that we cannot just parse with grep. In such cases, we require that the disassembler is scriptable, and can be run in an automated way. For instance, for IDA Pro we created a simple IDA Python script that dumps all the primitives we are interested in to file, and then ran the script in the above loop using IDA Pro’s “autonomous mode” (requiring no user interaction). In our objdump example, we save only instruction output, but for disassemblers which support other primitives, these can be parsed and written to file in an analogous way.

Comparing to the ground truth

So far, we have created a bash script which uses our chosen disassembler (objdump) to disassemble all our test cases and save the instruction addresses to file. Now, we want to compare these addresses to the ground truth provided in our data set. For this, we use a Python script that takes as input the ground truth file for a single test case (one of the * files provided in our data set), and a disassembler output file as generated by the disassembler-specific bash script described above.

For instance, here is a result you might get when calling this script from the command line for a particular test case (the files in ins/ are generated by the disassembly script we created above).

The script compares instruction addresses (as found by the disassembler) to the ground truth. To create scripts for other primitives, please refer to the README file provided in our data set. It completely describes our ground truth format, which is designed to be easily parseable by both humans and machines. The README file also describes the output format of our comparison scripts.

Let’s take a look at the main function, at line 93. It consists of three phases.

  1. Read the instruction-level ground truth into the bounds dictionary (lines 98-113), using instruction addresses as key, and mapping them to a descriptor of the instruction type (as described in the ground truth format section in the README file).
  2. Load all the instruction addresses found by the disassembler into the ins dictionary (lines 118-127).
  3. Compare the ground truth (bounds) to the disassembled instructions (ins), counting true positives, false positives and false negatives and then printing out the statistics (lines 129-160).
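The counting in the third phase can be sketched in a few lines of Python. This is a deliberately simplified version, not the script from our data set: it treats the ground truth as a plain set of instruction addresses and ignores the instruction descriptors, so it does not distinguish certain code from certain data. The function names load_addresses and compare_addresses are made up for illustration.

```python
def load_addresses(lines):
    """Parse one hex address per line (for example "0x400a40") into a set."""
    return {int(line, 16) for line in lines if line.strip()}

def compare_addresses(truth, found):
    """Return (true_positives, false_positives, false_negatives)."""
    truth = set(truth)
    found = set(found)
    tp = len(truth & found)   # addresses reported and present in the truth
    fp = len(found - truth)   # addresses reported but absent from the truth
    fn = len(truth - found)   # true addresses the disassembler missed
    return tp, fp, fn

# Made-up example: one spurious address (0x400a42), one missed (0x400a41).
truth = load_addresses(["0x400a40", "0x400a41", "0x400a44"])
found = load_addresses(["0x400a40", "0x400a42", "0x400a44"])
tp, fp, fn = compare_addresses(truth, found)
```

Using sets keeps the comparison independent of the order in which the disassembler emits addresses, and duplicate addresses are counted only once.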

The certain_code and certain_data functions are used to parse a ground truth instruction descriptor, and find out if a particular address is code or data. To this end, both of these functions rely on insmap_byte, which is just a utility function that returns the type of a particular byte in the descriptor. (Each descriptor describes a single instruction, which may consist of multiple bytes.)

As an example of how to evaluate a primitive other than instructions, suppose that we instead want to measure the correctness of function information. In that case, you would fill the bounds dictionary in a similar way, but this time loading the function-level ground truth instead of the instruction-level ground truth. This simply means that instead of loading the lines that start with an '@' symbol (instruction descriptors), you would load the lines that start with 'F ' (an F followed by a space), and then compare the ground truth addresses to those found by the disassembler (in this case you won’t even need the certain_code/certain_data functions, but can just compare addresses directly). To get an intuitive feeling of how to parse for each kind of primitive, it is a good idea to open up one of the * files and skim/grep through it.
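A sketch of the function-level variant might look as follows. Only the 'F ' prefix comes from the description above. The assumption that the first field after the prefix is the function's start address in hexadecimal is ours for the sake of the example, as are the function name function_starts and the sample lines, so check the README for the actual field layout before relying on it.

```python
def function_starts(truth_lines):
    """Collect function start addresses from ground truth lines."""
    starts = set()
    for line in truth_lines:
        # Function-level entries start with an F followed by a space.
        if not line.startswith("F "):
            continue
        fields = line.split()
        # ASSUMPTION: the first field after "F" is a hex start address.
        starts.add(int(fields[1], 16))
    return starts

# Made-up lines; the text after each address is a placeholder.
sample_lines = [
    "@0x400a40 placeholder",
    "F 0x400a40 placeholder",
    "F 0x400b30 placeholder",
]
truth_funcs = function_starts(sample_lines)
found_funcs = {0x400a40, 0x400c00}      # hypothetical disassembler output
missed = truth_funcs - found_funcs      # false negatives
spurious = found_funcs - truth_funcs    # false positives
```

Because function starts are compared directly as addresses, plain set operations are enough here.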

Now that we can compare ground truth and disassembler output for one test case at a time, it would be convenient to automate the process of doing this for all test cases. For this, we use one last bash script, which is similar in structure to the script used for disassembly.

In essence, the output files created by this script combine the outputs of the comparison script for all test cases given a particular compiler/architecture configuration, one test case per line. As before, we have a loop over all test cases and optimization levels. This time, we have an additional loop at line 38, which goes over an array containing all disassemblers we want to evaluate. This way, we don’t have to manually run the comparison script for each disassembler. Note that the disassembler names, as specified in the array, need to match those used in the output file names generated by our disassembly script.

The script first resets all output files (lines 24-29), and then begins its main loop. The main loop simply calls the comparison script for each possible configuration, and saves the statistics to file, printing warnings for any test cases or ground truth files which cannot be found. After the script completes, you will find a collection of combined statistics files in the ins directory, with one file per combination of compiler/architecture/language/disassembler. The file contents should look something like this (truncated for brevity).