Reverse Engineering Binary Kernel Drivers
Written Oct 8 - 26, 2008
Research Done Apr 27 - May 25, 2008
Linux Kernel drivers are very important this year and will continue to be in the coming years. Multiple kernel driver projects are underway and multiple methods are being used to develop them. As a software developer and hacker, I find that reverse engineering is one of the most important methods in writing kernel drivers for devices that currently lack open source drivers. Whether the method is snooping in on communication, brute forcing data, or analysis of driver state, reverse engineering tactics are employed. In this essay I will be reverse engineering a binary kernel driver, which is protected under copyright law as a fair use of copyrighted material. If you feel that I am violating your copyright during the production of this, please feel free to contact me and I will be glad to discuss this. Note however on the other hand that currently several Linux copyright holders consider binary blobs to be violations of their GPL copyright. These issues are connected and yet immaterial at this point. Let's just write the code.
I will be showing the user here how to write C code based on assembly (which is the binary). This is the basis of the reverse engineering technique I will be discussing.
First, remember that any binary has to actually execute on a system. No matter how obfuscated a piece of binary is, it should have a lot in common with output from the compiler. We can look at functions, calls, jumps, memory reads and writes in the same way we look at C code.
Let's think about what the code does. The HAL binary blob must be called and possibly call functions. These calls will pass data from one function to another. Thus, we're looking at this binary blob in the overall sense from the perspective of a black box that can be called with data and that in return calls us. It is also possible that it could make calls to hardware, but this isn't guaranteed.
Secondly, the person who is obsfucating this binary isn't really going to put a lot of work into hiding their code unless they're really well motivated. If not very motivated, they are likely going to do the least amount of work to cover their rears. They are going to make mistakes and we can definitely pull them out of their work.
In the case of Madwifi, it is quite obvious that the person has a script that modifies each function name using some sort hash or something which gives us a consistent function name in each revision of the binary. Even the 64-bit and 32-bit versions have the same binary names. Lucky us. It does not seem to me that the binaries are obsfucated in any other way, so we can expect no runaround or spaghetti code that isn't also in the C code, which is good news. Any reversing I do should give us pretty much what they have in their code.
The first function I reversed was ath_hal_probe. Before I reversed it, I knew that it had something to do with the vendor_id and product_id and that it returned something that was used for vital items in the driver. After reversing it I feel a bit silly doing all that work to find out that it simply checks whether the vendor_id and product_id are used by this driver and returns a number that tells the driver which chipset is used, which is used in various parts of the driver (usually just if statements). Thus my reversing it only helps us find code that we really want to reverse more than others and doesn't actually give us any useful information otherwise.
[SEE code listing 1 in reveng1-0.1.tgz, madwifi_hal1.c.]
Kernel drivers are unlinked binaries which are not very useful to reverse engineers. Linked binaries are useful because we can see where the calls and such go to. To link a kernel driver, you can simply run ld or gcc to make a shared library out of it. It will act and list all the items it needs quite handily for the next part of analysis.
gcc -o hal-i386.so -shared madwifi/hal-i386.o
Since my first foray wasn't very useful I decided to reverse the overview of the HAL binary using my fun tool, Small Wide World. It's designed to graph node networks and has an interface to objdump disassemblies, objmap. It graphs functions and calls so that reverse engineers, security engineers and programmers can easily look at the structure of a program. With this program (SEE output below) I was able to find what functions were critical to the functioning of the binary. I wrote a quick addition to this program to find the function with the most calls, zz0016d832 with 78 calls. In order to prove that this method works, I will reverse engineer as much of this function as I can in 1 hour at 1:40 am on Saturday night. I'll be verbose as I can in the comments, but feel free to ask if you're interested in a piece of this code.
python ../smallwideworld/objmap1.py hal-i386.dis
The second function that I decided to reverse zz0016d832 is 2323 lines of assembly long (starting on line 33,999 by coincidence) and calls itself 69 times. The first 53 calls to itself follow the very recognizable e8 fc ff ff ff which is simply calling -3 which is used for PIC (position independent code, used for libraries, shellcode, and such). The other calls are all to zz0016d732, which is called 48 times. It is never called explicitly from the start which tells us that it is called dynamically. Dynamic calls are quite common with advanced code like kernel drivers and object-oriented C code, but aren't a requirement of all code. After reversing the first 44 lines, I noticed that all code paths of the function return, which means that this one function has many functions inside it. By counting rets, I am able to find out how many functions are in this function: 18. That makes sense because when stripping symbols, many functions that are not called explicitly have their function names removed. I just finished 1 easy hour of reversing this function and I was able to reverse [92 lines of assembly]. This 280KB binary is 40387 lines long. Assuming that it would take the same amount of time as the first (it won't), it would take me 439 hours to reverse this entire binary. At the end, I would have C code that would compile exactly into the HAL binary. If someone were willing to pay me my normal hourly wage ($25) to reverse this binary, it would cost them $10975. At 40 hours per week it would take me 11 weeks (2.5 months) to reverse this binary.
DataReverse of zz0016d832():
00025846 <zz0016d832>: 25846: 53 push %ebx ;allocate 2 longs on the stack 25847: 83 ec 08 sub $0x8,%esp 2584a: 8b 5c 24 10 mov 0x10(%esp),%ebx ;magic number 0xe0 = 224 on the stack 2584e: c7 04 24 e0 00 00 00 movl $0xe0,(%esp) ;get the current position 25855: e8 fc ff ff ff call 25856 <zz0016d832+0x10> ;put eax onto this struct. What is eax? 2585a: 89 83 ec 5b 00 00 mov %eax,0x5bec(%ebx) ;if (eax != 0) 25860: 85 c0 test %eax,%eax ;jump to l1 25862: 75 14 jne 25878 <zz0016d832+0x32> ;else ;eax = piece on the stack 25864: 8b 44 24 14 mov 0x14(%esp),%eax ;*eax = 2 25868: c7 00 02 00 00 00 movl $0x2,(%eax) ;eax = 0 2586e: b8 00 00 00 00 mov $0x0,%eax ;jump to l2 25873: e9 90 00 00 00 jmp 25908 <zz0016d832+0xc2> ;end if l1: ;put 0x80 = 128 onto the struct pointed to 25878: c7 83 7c 73 00 00 80 movl $0x80,0x737c(%ebx) 2587f: 00 00 00 ;put 0x80 = 128 onto the stack 25882: c7 04 24 80 00 00 00 movl $0x80,(%esp) ;get the current position 25889: e8 fc ff ff ff call 2588a <zz0016d832+0x44> ;put eax onto the struct pointed to 2588e: 89 83 78 73 00 00 mov %eax,0x7378(%ebx) ;if (eax != 0) 25894: 85 c0 test %eax,%eax ;jump to l3 25896: 75 11 jne 258a9 <zz0016d832+0x63> ;else ;eax = piece on the stack 25898: 8b 44 24 14 mov 0x14(%esp),%eax ;*eax = 2 2589c: c7 00 02 00 00 00 movl $0x2,(%eax) ;eax = 0 258a2: b8 00 00 00 00 mov $0x0,%eax ;jump to l2 258a7: eb 5f jmp 25908 <zz0016d832+0xc2> ;end if l3: ;put 0x40 = 64 onto the struct pointed to (see similar call @l1) 258a9: c7 83 7c 73 00 00 40 movl $0x40,0x737c(%ebx) 258b0: 00 00 00 ;put 0x26d49 onto the struct pointed to 258b3: c7 83 a8 58 00 00 49 movl $0x26d49,0x58a8(%ebx) 258ba: 6d 02 00 ;put 0x26950 onto the struct pointed to 258bd: c7 83 ac 58 00 00 50 movl $0x26950,0x58ac(%ebx) 258c4: 69 02 00 ;put 0x25350 onto the struct pointed to 258c7: c7 83 b0 58 00 00 50 movl $0x25350,0x58b0(%ebx) 258ce: 53 02 00 ;put 0x26a28 onto the struct pointed to 258d1: c7 83 b4 58 00 00 28 movl $0x26a28,0x58b4(%ebx) 258d8: 6a 02 00 ;put 0x2613d onto the struct pointed to 258db: c7 83 b8 58 00 00 3d movl $0x2613d,0x58b8(%ebx) 258e2: 61 02 00 ;put 0x2590d onto the struct pointed to 258e5: c7 83 bc 58 00 00 0d movl $0x2590d,0x58bc(%ebx) 258ec: 59 02 00 ;put 0x2562b onto the struct pointed to 258ef: c7 83 c0 58 00 00 2b movl $0x2562b,0x58c0(%ebx) 258f6: 56 02 00 ;put 0 onto the struct pointed to 258f9: c7 83 c4 58 00 00 00 movl $0x0,0x58c4(%ebx) 25900: 00 00 00 ;eax = 1 25903: b8 01 00 00 00 mov $0x1,%eax l2: ;dellocate 2 longs on the stack 25908: 83 c4 08 add $0x8,%esp ;take the stored ebx off the stack 2590b: 5b pop %ebx ;return; 2590c: c3 ret ;Note: all code paths end here, so that's the end of the function. ;The purpose of the above function is pretty interesting ;A new function starts here it's pretty obvious by the pushes. ;function initialization 2590d: 55 push %ebp 2590e: 57 push %edi 2590f: 56 push %esi 25910: 53 push %ebx ;allocate 1036 bytes on the stack, owch 25911: 81 ec 0c 04 00 00 sub $0x40c,%esp ;eax = long in a struct on the stack 25917: 8b 84 24 20 04 00 00 mov 0x420(%esp),%eax ;put eax onto local variable list 2591e: 89 44 24 28 mov %eax,0x28(%esp) ;edx = extend word to long on struct at eax ;http://en.wikibooks.org/wiki/X86_Assembly/Data_Transfer#Move_and_Extend 25922: 0f b7 90 d4 01 00 00 movzwl 0x1d4(%eax),%edx ;eax = edx 25929: 89 d0 mov %edx,%eax ;eax &= 0xf0 2592b: 25 f0 00 00 00 and $0xf0,%eax ;eax -= 0x30 25930: 83 e8 30 sub $0x30,%eax ;if ((unsigned int)eax > 0x3f) ;http://en.wikibooks.org/wiki/X86_Assembly/Control_Flow#Jump_if_Greater 25933: 83 f8 3f cmp $0x3f,%eax ;jump to l4 25936: 77 24 ja 2595c <zz0016d832+0x116> ;if (dx == 0x35) 25938: 66 83 fa 35 cmp $0x35,%dx ;jump to l5 2593c: 74 14 je 25952 <zz0016d832+0x10c> ;if (dx == 0x45) 2593e: 66 83 fa 45 cmp $0x45,%dx ;jump to l5 25942: 74 0e je 25952 <zz0016d832+0x10c> ;if (dx == 0x46) 25944: 66 83 fa 46 cmp $0x46,%dx ;jump to l5 25948: 74 08 je 25952 <zz0016d832+0x10c> ;if (dx != 0x36) 2594a: 66 83 fa 36 cmp $0x36,%dx ;nop? 2594e: 66 90 xchg %ax,%ax ;jump to l4 25950: 75 0a jne 2595c <zz0016d832+0x116> l5: ;put 2 onto the stack (local var) 25952: c7 44 24 2c 02 00 00 movl $0x2,0x2c(%esp) 25959: 00 ;jump to l6 2595a: eb 08 jmp 25964 <zz0016d832+0x11e> l4: ;put 1 on the stack (local var) see same var value @l5 2595c: c7 44 24 2c 01 00 00 movl $0x1,0x2c(%esp) 25963: 00 l6: ;edx = local var on the stack 25964: 8b 94 24 2c 04 00 00 mov 0x42c(%esp),%edx ;eax = *edx 2596b: 8b 02 mov (%edx),%eax ;eax &= 0x701f0 2596d: 25 f0 01 07 00 and $0x701f0,%eax ;if (eax == 0xd0) 25972: 3d d0 00 00 00 cmp $0xd0,%eax ;jump to l7 25977: 74 73 je 259ec <zz0016d832+0x1a6> ;if ((unsigned int)eax > 0xd0) 25979: 3d d0 00 00 00 cmp $0xd0,%eax ;jump to l8 2597e: 77 18 ja 25998 <zz0016d832+0x152> ;if (eax == 0xa0) 25980: 3d a0 00 00 00 cmp $0xa0,%eax ;jump to l9 25985: 74 4a je 259d1 <zz0016d832+0x18b> ;if (eax != 0xc0) 25987: 3d c0 00 00 00 cmp $0xc0,%eax ;nop? 2598c: 8d 74 26 00 lea 0x0(%esi),%esi ;jump to l10 25990: 0f 85 47 07 00 00 jne 260dd <zz0016d832+0x897> ;jump to l7 25996: eb 54 jmp 259ec <zz0016d832+0x1a6> l8: ;if (eax == 0x150) 25998: 3d 50 01 00 00 cmp $0x150,%eax ;nop? 2599d: 8d 76 00 lea 0x0(%esi),%esi ;jump to l11 259a0: 74 14 je 259b6 <zz0016d832+0x170> ;if (eax == 0x800) 259a2: 3d 00 08 00 00 cmp $0x800,%eax ;jump to l11 259a7: 74 0d je 259b6 <zz0016d832+0x170> ;if (eax != 0x140) 259a9: 3d 40 01 00 00 cmp $0x140,%eax ;nop? 259ae: 66 90 xchg %ax,%ax ;jump to l10 259b0: 0f 85 27 07 00 00 jne 260dd <zz0016d832+0x897> l11: 259b6: 8b 4c 24 28 mov 0x28(%esp),%ecx 259ba: 81 c1 d4 4b 00 00 add $0x4bd4,%ecx 259c0: 89 4c 24 30 mov %ecx,0x30(%esp) 259c4: 8b 44 24 28 mov 0x28(%esp),%eax 259c8: 0f b7 90 d8 3e 00 00 movzwl 0x3ed8(%eax),%edx 259cf: eb 33 jmp 25a04 <zz0016d832+0x1be> l9: 259d1: 8b 54 24 28 mov 0x28(%esp),%edx 259d5: 81 c2 e0 4b 00 00 add $0x4be0,%edx 259db: 89 54 24 30 mov %edx,0x30(%esp) 259df: 8b 4c 24 28 mov 0x28(%esp),%ecx 259e3: 0f b7 91 da 3e 00 00 movzwl 0x3eda(%ecx),%edx 259ea: eb 18 jmp 25a04 <zz0016d832+0x1be> l7:
Using well-trained programmers from developing countries, hackers from various first-world countries, and proper management techniques, I could reverse engineer hundreds of very large binaries per month at the same rate. In my opinion $11,000 is a good price for a open source driver. Binary drivers such as nvidia and fglrx are much larger (2.3MB and 2.1MB respectively), but incredible amounts of work have been done already using different methods (driver state in the case of nouveau and register documentation in the case of RadeonHD), which means that most of the functions can be discarded and only specific items need to be reversed. In fact, most of the registers have been completely discovered by these projects using the methods described, so the actual useful output from further reverse engineering would be the manner in which data is sent to the device to produce the desired result.
However, considering the cost of this in time and money, it would actually be more productive to design tools such as Small Wide World and advanced assembly reversing techniques instead of hiring coders. That way any binary can be reversed for less time investment. From my experience with Assembly and C, I would expect that a partial reversing tool that would speed up this project to a 40 hour project (a factor of 10 time savings) would only require 200 hours to write (less than it would take to reverse the HAL binary).
This article has a few theses in disguise. The first thesis is that binary reverse engineering is not only possible but practical. The Nouveau driver project has been offered $10,000 (unasked) to make their efforts work well and have been given over a year to finish their driver. Their efforts have not been fruitless, but I expect that a pure binary reversal would actually have provided better results (fairly stable 2D and 3D from the outset). Also, reversing the Madwifi HAL would probably have been cheaper and faster than the current ath5k OpenHAL. I do not wish to diminish the efforts of these open source projects, I just hope that my efforts will be able to encourage researchers and programmers to work towards a standard reversing timeline that ensures most benefit of the open source community. The second thesis is that completely manual reversing of binaries is actually slower than writing a tool to reverse binaries when looking at large projects such as the Madwifi HAL. Another thesis I would like to add now is that binaries are thinly veiled forms of C code. A person well-versed in assembly can reverse a binary at incredible speed.
Lastly, I would like to take a moment to discuss copyright issues. I value my own copyrights as well as those of others. I consider copyright essential to business models that involve selling copiable materials to users. Since copyright owners cannot stop (by DRM or any other method) the copying of users, they can only make it more difficult. For example, the PS3 encrypts their discs pretty well and makes their platform quite secure. I suspect that hackers will find a way to pirate and decrypt PS3 discs and content in the future, but the difficulty is currently fairly high. As an owner of a PS3 the thought has not crossed my mind to pirate software because I respect the copyrights. On the other hand CDs are completely open and available to copying. I use fair use to copy every CD I own to my hard drive and I make multiple copies for my MP3 player, my PS3, laptop, and even mix CDs I give to friends. These are rights I reserve as an owner of copyrighted items. An owner of the copyright can sue anyone they like, but suing someone for fair use of copyrighted materials is not legitimate.
Chilling Effects has an excellent section on reverse engineering which explains everything that anyone would ever want to know about reverse engineering and copyright law. Simply put, reverse engineering is creating a non-derivitive product from a product by way of understanding the way it works. The method I have described is completely protected under copyright law. I must admit that I consider it rather ironic that the Madwifi project copyrights the HAL binary with the GPL which expressly allows use, modification, reverse engineering and so forth, which I am reverse engineering to create a GPL licensed C source version.
If you are interested in reversing a binary (kernel or otherwise), feel free to