Our recent USENIX Sec’16 paper on x86/x64 disassembly has been getting a fair amount of attention on Twitter and Reddit, which is great to see! I’ve also talked to a few people who had some interesting additional insights that are not included in the paper. I thought it might be interesting to share them here.
Inline jump tables on ARM
One of our main findings is that on x86/x64, both gcc v5.1 and clang v3.6 are extremely well-behaved when it comes to jump tables. Rather than placing these inline in the .text section, both compilers place jump tables in .rodata. They emit no inline data at all, which means that linear disassembly produces 100% correct results.
It seems things are not quite so convenient on ARM. Apparently, arm-linux-gcc does produce inline jump tables (just like Visual Studio does for x86/x64). A quick check confirms this, as illustrated in the listing below, which shows a snippet of objdump output for lighttpd cross-compiled with arm-linux-gcc.
/home/dennis/servers/lighttpd-1.4.39/src/server.c:95
   14f1c:  e51b3008   ldr     r3, [fp, #-8]
   14f20:  e2433001   sub     r3, r3, #1
   14f24:  e3530010   cmp     r3, #16
   14f28:  979ff103   ldrls   pc, [pc, r3, lsl #2]
   14f2c:  ea000046   b       1504c <sigaction_handler+0x15c>
   14f30:  00014ffc   .word   0x00014ffc
   14f34:  00014fa0   .word   0x00014fa0
   14f38:  0001504c   .word   0x0001504c
   14f3c:  0001504c   .word   0x0001504c
   14f40:  0001504c   .word   0x0001504c
   14f44:  0001504c   .word   0x0001504c
   14f48:  0001504c   .word   0x0001504c
   14f4c:  0001504c   .word   0x0001504c
   14f50:  0001504c   .word   0x0001504c
   14f54:  0001504c   .word   0x0001504c
   14f58:  0001504c   .word   0x0001504c
   14f5c:  0001504c   .word   0x0001504c
   14f60:  0001504c   .word   0x0001504c
   14f64:  00014fec   .word   0x00014fec
   14f68:  00014f74   .word   0x00014f74
   14f6c:  0001504c   .word   0x0001504c
   14f70:  00015048   .word   0x00015048
/home/dennis/servers/lighttpd-1.4.39/src/server.c:97
   14f74:  e59f30e0   ldr     r3, [pc, #224]   ; 1505c <sigaction_handler+0x16c>
   14f78:  e3a02001   mov     r2, #1
   14f7c:  e5832000   str     r2, [r3]
/home/dennis/servers/lighttpd-1.4.39/src/server.c:98
   14f80:  e59f20d8   ldr     r2, [pc, #216]   ; 15060 <sigaction_handler+0x170>
   14f84:  e51b300c   ldr     r3, [fp, #-12]
   14f88:  e1a00002   mov     r0, r2
   14f8c:  e1a01003   mov     r1, r3
   14f90:  e3a03080   mov     r3, #128         ; 0x80
   14f94:  e1a02003   mov     r2, r3
   14f98:  ebfffe0a   bl      147c8 <memcpy@plt>
/home/dennis/servers/lighttpd-1.4.39/src/server.c:99
   14f9c:  ea00002a   b       1504c <sigaction_handler+0x15c>
/home/dennis/servers/lighttpd-1.4.39/src/server.c:101
   14fa0:  e59f30bc   ldr     r3, [pc, #188]   ; 15064 <sigaction_handler+0x174>
   14fa4:  e5933000   ldr     r3, [r3]
   14fa8:  e3530000   cmp     r3, #0
   14fac:  0a000003   beq     14fc0 <sigaction_handler+0xd0>
Indeed, we do see some inline data (the .word lines), and it looks like a jump table: an array of valid addresses, presumably pointing to the case blocks of a switch, which the preceding ldrls pc, [pc, r3, lsl #2] instruction indexes with r3. The DWARF line information tells us the inline data is produced from somewhere near line 95 in server.c, shown in the following listing.
 89  static void sigaction_handler(int sig, siginfo_t *si, void *context) {
 90      static siginfo_t empty_siginfo;
 91      UNUSED(context);
 92
 93      if (!si) si = &empty_siginfo;
 94
 95      switch (sig) {
 96      case SIGTERM:
 97          srv_shutdown = 1;
 98          last_sigterm_info = *si;
 99          break;
100      case SIGINT:
101          if (graceful_shutdown) {
102              srv_shutdown = 1;
103          } else {
104              graceful_shutdown = 1;
105          }
106          last_sigterm_info = *si;
107
108          break;
109      case SIGALRM:
110          handle_sig_alarm = 1;
111          break;
112      case SIGHUP:
113          /**
114           * we send the SIGHUP to all procs in the process-group
115           * this includes ourself
116           *
117           * make sure we only send it once and don't create a
118           * infinite loop
119           */
120          if (!forwarded_sig_hup) {
121              handle_sig_hup = 1;
122              last_sighup_info = *si;
123          } else {
124              forwarded_sig_hup = 0;
125          }
126          break;
127      case SIGCHLD:
128          break;
129      }
130  }
This is indeed a switch statement, and the inline data is a jump table containing the addresses of the case blocks. Fortunately, if the binary is not stripped, objdump can use symbols to tell the data apart from the code (on ARM, the ELF symbol table contains mapping symbols such as $d, which marks the start of a data region, and $a, which marks the start of ARM code). This is why objdump, in our test, is able to accurately mark the data as .word lines. However, if the binary is stripped, this is no longer possible, and the inline data will cause disassembly errors.
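The dispatch itself can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions: the table words are copied by hand from the objdump listing above, the helper names (table_layout, dispatch) are hypothetical, and it assumes Linux signal numbers (SIGHUP = 1, SIGINT = 2, SIGALRM = 14, SIGTERM = 15, SIGCHLD = 17), which is consistent with the table contents. It relies on one ARM detail: in ARM state, reading pc as an operand yields the address of the current instruction plus 8, so the ldrls at 0x14f28 indexes a table that begins at 0x14f30.

```python
# Minimal model of the ARM jump-table dispatch in sigaction_handler (sketch).
# Table words copied by hand from the objdump listing (0x14f30-0x14f70).
TABLE_WORDS = [
    0x14FFC, 0x14FA0, 0x1504C, 0x1504C, 0x1504C, 0x1504C, 0x1504C,
    0x1504C, 0x1504C, 0x1504C, 0x1504C, 0x1504C, 0x1504C, 0x14FEC,
    0x14F74, 0x1504C, 0x15048,
]
LDRLS_ADDR = 0x14F28       # "ldrls pc, [pc, r3, lsl #2]"
CMP_IMM = 16               # "cmp r3, #16"
DEFAULT_TARGET = 0x1504C   # "b 1504c", taken when the index is out of range


def table_layout(ldr_addr, cmp_imm):
    """Return (base, entry_count) of the table read by an ldrls-pc dispatch."""
    # Reading pc in ARM state gives the current instruction's address + 8, so
    # the table starts right after the out-of-range branch that follows ldrls.
    base = ldr_addr + 8
    # The "ls" condition (unsigned <=) makes indices 0..cmp_imm valid.
    return base, cmp_imm + 1


def dispatch(sig, table, default):
    """Return the address control reaches for signal number sig."""
    index = (sig - 1) & 0xFFFFFFFF     # "sub r3, r3, #1" as a 32-bit unsigned value
    if index >= len(table):            # ldrls not executed: fall through to "b"
        return default
    return table[index]                # pc <- word at base + index * 4


if __name__ == "__main__":
    base, count = table_layout(LDRLS_ADDR, CMP_IMM)
    print(f"table: {base:#x}..{base + 4 * count:#x} ({count} entries)")
    # Assumption: Linux signal numbering, as used on ARM and x86.
    for name, sig in [("SIGHUP", 1), ("SIGINT", 2), ("SIGALRM", 14),
                      ("SIGTERM", 15), ("SIGCHLD", 17)]:
        print(f"{name:8} -> {dispatch(sig, TABLE_WORDS, DEFAULT_TARGET):#x}")
```

Run as a script, it reports that the table covers 0x14f30 up to (but not including) 0x14f74, which is exactly where the code for server.c:97 starts, and it maps SIGTERM to 0x14f74 and SIGINT to 0x14fa0 (server.c:101), matching the DWARF line annotations. Note the unsigned comparison: a signal number of 0 wraps to index 0xffffffff and takes the default branch.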
(Thanks to Ammar Ben Khadra from Uni Kaiserslautern for bringing this to my attention.)
Detecting functions using the .eh_frame section
We also show in our paper that function detection (i.e., accurately identifying the start address and size of each function in the binary) is currently the most problematic primitive for disassemblers. False positive and false negative rates in excess of 20% are not an uncommon sight, despite the fact that function detection is one of the most used and important primitives for virtually all areas of binary analysis.
It seems there is an interesting way to get around this problem, based on the .eh_frame section. This section contains information needed for DWARF-based stack unwinding. It’s primarily used for C++ exception handling, but also for various other applications such as backtrace() and gcc extensions such as __attribute__((__cleanup__(f))) and __builtin_return_address(n) (more information in this StackOverflow post). Due to its many uses, .eh_frame is present by default not only in C++ binaries that use exception handling, but in all binaries produced by gcc, including plain C binaries.
The point of all this is that .eh_frame contains function boundary information: each Frame Description Entry (FDE) in the section records the start address (pc_begin) and size (pc_range) of a function. Together, the FDEs identify all functions, so .eh_frame can be used to circumvent the function detection problem entirely. Here’s what a dump of the section looks like for 470.lbm (one of the SPEC CPU2006 benchmarks) compiled with gcc v5.1 at optimization level O0 for x64.
$ objdump --wide --section=.eh_frame -sz 470.lbm.O0

470.lbm.O0:     file format elf64-x86-64

Contents of section .eh_frame:
 407e48 14000000 00000000 017a5200 01781001  .........zR..x..
 407e58 1b0c0708 90010710 14000000 1c000000  ................
 407e68 188bffff 2a000000 00000000 00000000  ....*...........
 407e78 14000000 00000000 017a5200 01781001  .........zR..x..
 407e88 1b0c0708 90010000 24000000 1c000000  ........$.......
 407e98 b889ffff 30010000 000e1046 0e184a0f  ....0......F..J.
 407ea8 0b770880 003f1a3b 2a332422 00000000  .w...?.;*3$"....
 407eb8 1c000000 44000000 b68bffff b6000000  ....D...........
 407ec8 00410e10 8602430d 0602b10c 07080000  .A....C.........
 407ed8 1c000000 64000000 4c8cffff 3f000000  ....d...L...?...
 407ee8 00410e10 8602430d 067a0c07 08000000  .A....C..z......
 407ef8 1c000000 84000000 6b8cffff e3020000  ........k.......
 407f08 00410e10 8602430d 0603de02 0c070800  .A....C.........
 407f18 1c000000 a4000000 2e8fffff 33000000  ............3...
 407f28 00410e10 8602430d 066e0c07 08000000  .A....C..n......
 407f38 1c000000 c4000000 418fffff ee000000  ........A.......
 407f48 00410e10 8602430d 0602e90c 07080000  .A....C.........
 407f58 1c000000 e4000000 0f90ffff 40010000  ............@...
 407f68 00410e10 8602430d 06033b01 0c070800  .A....C...;.....
 407f78 1c000000 04010000 2f91ffff 5d010000  ......../...]...
 407f88 00410e10 8602430d 06035801 0c070800  .A....C...X.....
 407f98 1c000000 24010000 6c92ffff d0160000  ....$...l.......
 407fa8 00410e10 8602430d 0603cb16 0c070800  .A....C.........
 407fb8 1c000000 44010000 1ca9ffff be260000  ....D........&..
 407fc8 00410e10 8602430d 0603b926 0c070800  .A....C....&....
 407fd8 1c000000 64010000 bacfffff d4070000  ....d...........
 407fe8 00410e10 8602430d 0603cf07 0c070800  .A....C.........
 407ff8 1c000000 84010000 6ed7ffff c1000000  ........n.......
 408008 00410e10 8602430d 0602bc0c 07080000  .A....C.........
 408018 1c000000 a4010000 0fd8ffff c5000000  ................
 408028 00410e10 8602430d 0602c00c 07080000  .A....C.........
 408038 1c000000 c4010000 b4d8ffff a20c0000  ................
 408048 00410e10 8602430d 06039d0c 0c070800  .A....C.........
 408058 1c000000 e4010000 36e5ffff 4c0d0000  ........6...L...
 408068 00410e10 8602430d 0603470d 0c070800  .A....C...G.....
 408078 1c000000 04020000 62f2ffff f3000000  ........b.......
 408088 00410e10 8602430d 0602ee0c 07080000  .A....C.........
 408098 1c000000 24020000 35f3ffff f5010000  ....$...5.......
 4080a8 00410e10 8602430d 0603f001 0c070800  .A....C.........
 4080b8 1c000000 44020000 0af5ffff e0010000  ....D...........
 4080c8 00410e10 8602430d 0603db01 0c070800  .A....C.........
 4080d8 1c000000 64020000 caf6ffff db000000  ....d...........
 4080e8 00410e10 8602430d 0602d60c 07080000  .A....C.........
 4080f8 1c000000 84020000 85f7ffff 88000000  ................
 408108 00410e10 8602430d 0602830c 07080000  .A....C.........
 408118 44000000 a4020000 f0f7ffff 65000000  D...........e...
 408128 00420e10 8f02420e 188e0345 0e208d04  .B....B....E. ..
 408138 420e288c 05480e30 8606480e 3883074f  B.(..H.0..H.8..O
 408148 0e40700e 38410e30 410e2842 0e20420e  .@p.8A.0A.(B. B.
 408158 18420e10 420e0800 14000000 ec020000  .B..B...........
 408168 18f8ffff 02000000 00000000 00000000  ................
 408178 14000000 04030000 10f8ffff 10000000  ................
 408188 00000000 00000000 00000000           ............
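As a worked example of reading the dump, take the first FDE, which starts at 0x407e60. All fields are little-endian. The CIE it refers to declares the FDE pointer encoding 0x1b (pc-relative, signed 4-byte), visible as the first byte of the row at 0x407e58. So pc_begin (the function start) is a signed 32-bit offset from the address of the pc_begin field itself, while pc_range (the function size) is a plain value:

```latex
\begin{aligned}
\text{length} &= \texttt{0x14} = 20 \text{ bytes}\\
\text{CIE address} &= \texttt{0x407e64} - \texttt{0x1c} = \texttt{0x407e48}\\
\text{pc\_begin} &= \texttt{0x407e68} + \mathrm{int32}(\texttt{0xffff8b18}) = \texttt{0x407e68} - \texttt{0x74e8} = \texttt{0x400980}\\
\text{pc\_range} &= \texttt{0x2a} = 42 \text{ bytes}
\end{aligned}
```

In other words, this FDE describes a 42-byte function starting at 0x400980, and the next record begins right after it at 0x407e60 + 4 + 0x14 = 0x407e78.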
As far as I know, this method was first described here by Ryan O’Neill (a.k.a. ElfMaster). He also provides code to parse the .eh_frame section into a set of function addresses and sizes.
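For experimentation, the following is a minimal, independent Python sketch of the same idea (it is not ElfMaster’s code). It walks the CIE and FDE records of a raw .eh_frame blob and returns one (start, size) pair per FDE, ignoring the call frame instructions entirely. Assumptions: a little-endian 64-bit target, the section’s bytes and load address already extracted (for example from objdump -s output or an ELF library), 32-bit DWARF lengths only, and only absolute or pc-relative pointer encodings, which covers the 0x1b (pcrel | sdata4) encoding gcc uses in the 470.lbm dump. All function names are hypothetical.

```python
import struct

# struct format for the low nibble of a DW_EH_PE_* pointer encoding.
# Assumption: DW_EH_PE_absptr (0x00) is 8 bytes, i.e. a 64-bit target like x64.
_FORMATS = {0x00: "<Q", 0x02: "<H", 0x03: "<I", 0x04: "<Q",
            0x0A: "<h", 0x0B: "<i", 0x0C: "<q"}

def read_leb128(data, pos, signed=False):
    """Decode an (S)LEB128 value at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            if signed and byte & 0x40:      # sign-extend from the last byte
                result -= 1 << shift
            return result, pos

def read_encoded(data, pos, enc, vaddr, absolute=False):
    """Read a pointer in encoding enc; pc-relative values resolve against vaddr + pos."""
    if (enc & 0x0F) not in _FORMATS or (enc & 0x80):
        raise NotImplementedError(f"pointer encoding {enc:#x}")
    fmt = _FORMATS[enc & 0x0F]
    (value,) = struct.unpack_from(fmt, data, pos)
    field_addr = vaddr + pos
    pos += struct.calcsize(fmt)
    if absolute or (enc & 0x70) == 0x00:    # plain value (DW_EH_PE_absptr application)
        return value, pos
    if (enc & 0x70) == 0x10:                # DW_EH_PE_pcrel
        return (field_addr + value) & 0xFFFFFFFFFFFFFFFF, pos
    raise NotImplementedError(f"pointer application {enc & 0x70:#x}")

def cie_fde_encoding(data, pos):
    """Parse a CIE body (pos = just past its CIE id); return its FDE pointer encoding."""
    version = data[pos]
    nul = data.index(0, pos + 1)                  # augmentation string is NUL-terminated
    aug = data[pos + 1:nul].decode("ascii")
    pos = nul + 1
    _, pos = read_leb128(data, pos)               # code alignment factor
    _, pos = read_leb128(data, pos, signed=True)  # data alignment factor
    if version == 1:                              # return address register:
        pos += 1                                  # one byte in version 1,
    else:
        _, pos = read_leb128(data, pos)           # ULEB128 in version 3
    enc = 0x00                                    # absptr unless an 'R' entry says otherwise
    if aug.startswith("z"):
        _, pos = read_leb128(data, pos)           # augmentation data length
        for ch in aug[1:]:
            if ch == "R":                         # FDE pointer encoding
                enc = data[pos]
                pos += 1
            elif ch == "L":                       # LSDA encoding: skip
                pos += 1
            elif ch == "P":                       # personality encoding + pointer: skip
                penc = data[pos]
                pos += 1 + struct.calcsize(_FORMATS[penc & 0x0F])
            elif ch != "S":                       # 'S' carries no data
                break                             # unknown augmentation: stop scanning
    return enc

def parse_eh_frame(data, vaddr):
    """Return sorted (start, size) pairs, one per FDE, for an .eh_frame blob loaded at vaddr."""
    encodings = {}                                # CIE offset -> FDE pointer encoding
    funcs = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:                           # zero terminator
            break
        if length == 0xFFFFFFFF:
            raise NotImplementedError("64-bit DWARF length")
        id_pos = pos + 4
        (cie_id,) = struct.unpack_from("<I", data, id_pos)
        if cie_id == 0:                           # CIE
            encodings[pos] = cie_fde_encoding(data, id_pos + 4)
        else:                                     # FDE: cie_id is a backwards offset to its CIE
            enc = encodings[id_pos - cie_id]
            start, p = read_encoded(data, id_pos + 4, enc, vaddr)
            size, _ = read_encoded(data, p, enc, vaddr, absolute=True)
            funcs.append((start, size))
        pos = id_pos + length
    return sorted(funcs)

# First 144 bytes of the 470.lbm.O0 .eh_frame dump (0x407e48-0x407ed7).
SAMPLE = bytes.fromhex(" ".join([
    "14000000 00000000 017a5200 01781001 1b0c0708 90010710 14000000 1c000000",
    "188bffff 2a000000 00000000 00000000 14000000 00000000 017a5200 01781001",
    "1b0c0708 90010000 24000000 1c000000 b889ffff 30010000 000e1046 0e184a0f",
    "0b770880 003f1a3b 2a332422 00000000 1c000000 44000000 b68bffff b6000000",
    "00410e10 8602430d 0602b10c 07080000",
]))

if __name__ == "__main__":
    for start, size in parse_eh_frame(SAMPLE, 0x407E48):
        print(f"{start:#x}: {size} bytes")
```

Fed these first 144 bytes of the dump, it reports three FDEs: 0x400850 (304 bytes), 0x400980 (42 bytes) and 0x400a76 (182 bytes). A real tool would also need to handle the remaining pointer encodings and 64-bit lengths, and possibly to filter entries, since the linker can also emit FDEs for stubs such as the PLT.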
Note that the strip command will not strip the .eh_frame section: unlike the symbol table and debug sections, it is an allocated section that is loaded at run time. If you want to get rid of it (for anti-reversing or binary size reasons), you need to prevent it from being generated in the first place by passing -fno-asynchronous-unwind-tables to gcc.
(Thanks to Mariano Graziano from Cisco for telling me about this.)