MindshaRE: Walking the Windows Kernel with IDA Python
May 22, 2018 | Jasiel Spelman
MindShaRE is our periodic look at various reverse engineering tips and tricks. The goal is to keep things small and discuss some everyday aspects of reversing. You can view previous entries in this series here.
When I attend security conferences, I enjoy talking to people about how they augment their own reverse engineering efforts. It is always beneficial to find out how others automate tedious tasks. One thing that often surprises me is that many people using IDA don't use the included APIs to augment their efforts. To try and change that, I'm going to start sharing some of my code and demonstrate some of the things you can accomplish with IDA and Python.
As an introduction to IDA Python, I'm going to show how you can enumerate the Windows System Call tables.
For those who don't know, every system call on Windows is given an ID. This ID is a unique value that specifies which function you want to invoke when performing a system call. These IDs can vary heavily across different versions of Windows and especially across service packs. As of Windows 10, they can also vary across release branches. For normal applications this isn't a big deal, since the userland libraries that ship with the system always use the IDs that match the system you're on.
If you're analyzing an exploit or attempting to make system calls directly yourself, this may not be the case. As a consequence, it is handy to know which IDs map to which functions for a given OS version. For a long time, referencing one of the tables that Mateusz Jurczyk hosts on his site was the easiest way, but if you want a version not present there, you'll need to know how to do it yourself.
I'll quickly explain how to enumerate the tables manually, then we'll go over automatically handling it with Python.
Manually Enumerating Windows System Call Tables
There are three important symbols for parsing the system call tables: the base of the table, the size of the table, and the number of bytes the arguments take on the stack. For ntoskrnl.exe, the names of these symbols are KiServiceTable, KiServiceLimit, and KiArgumentTable, respectively. For win32k.sys, the names of these symbols are W32pServiceTable, W32pServiceLimit, and W32pArgumentTable. On 32-bit builds, these symbol names are prefixed with an underscore.
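Since the only difference between the 32-bit and 64-bit names is the leading underscore, the naming scheme fits in a small lookup table. Here's a minimal sketch in plain Python; SYMBOLS and symbol_names are hypothetical helper names, not part of IDA's API, and the function assumes we already know whether the build is 32-bit.

```python
# Hypothetical helper (not an IDA API): maps each module to the names of its
# table base, table size, and stack-argument-bytes symbols, in that order.
SYMBOLS = {
    "ntoskrnl.exe": ("KiServiceTable", "KiServiceLimit", "KiArgumentTable"),
    "win32k.sys": ("W32pServiceTable", "W32pServiceLimit", "W32pArgumentTable"),
}


def symbol_names(module, is_32bit):
    """Return the three symbol names for `module`, underscored on 32-bit builds."""
    names = SYMBOLS[module.lower()]
    if is_32bit:
        return tuple("_" + name for name in names)
    return names


print(symbol_names("ntoskrnl.exe", is_32bit=False))
# ('KiServiceTable', 'KiServiceLimit', 'KiArgumentTable')
print(symbol_names("win32k.sys", is_32bit=True))
# ('_W32pServiceTable', '_W32pServiceLimit', '_W32pArgumentTable')
```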
As an example, let's look at Windows 7 64-bit. This is from ntoskrnl.exe version 6.1.7601.24117.
Based on this, we can see that there are 401 (0x191) system calls.
If we look at the table in Figure 2, we can manually map the functions to their IDs. Based on what we see above, NtMapUserPhysicalPagesScatter has an ID of 0x0000, NtWaitForSingleObject is 0x0001, NtCallbackReturn is 0x0002, and so forth.
There are two special cases we need to handle. If we are looking at win32k.sys, the ID will be the index of the function within the table plus 0x1000. Also, 64-bit builds of Windows 10, starting with version 1607, need to be handled differently: in these builds, the system call table stores offsets to the functions as four-byte values rather than as eight-byte addresses.
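The ID rule is simple enough to express as a function of the table index. A minimal sketch, with syscall_id as a hypothetical helper name; it reproduces the three Windows 7 x64 IDs listed above and applies the 0x1000 offset for win32k.sys.

```python
# Hypothetical helper (not an IDA API).
WIN32K_ID_BASE = 0x1000  # win32k.sys IDs are the table index plus 0x1000


def syscall_id(index, is_win32k):
    """Return the system call ID for the entry at `index` in the service table."""
    return index + (WIN32K_ID_BASE if is_win32k else 0)


# First three entries of the Windows 7 x64 ntoskrnl.exe table.
win7_x64_table = [
    "NtMapUserPhysicalPagesScatter",
    "NtWaitForSingleObject",
    "NtCallbackReturn",
]
for index, name in enumerate(win7_x64_table):
    print("0x%04X %s" % (syscall_id(index, is_win32k=False), name))
# 0x0000 NtMapUserPhysicalPagesScatter
# 0x0001 NtWaitForSingleObject
# 0x0002 NtCallbackReturn

# The same index in win32k.sys's table maps to a different ID.
print("0x%04X" % syscall_id(2, is_win32k=True))  # 0x1002
```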
This is from ntoskrnl.exe version 10.0.17134.48.
Handling this just means that we need to read four bytes at a time and then add each value to the base address of the image.
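Outside IDA, reading those entries comes down to unpacking little-endian 32-bit integers and adding the image base. This sketch assumes each entry is an unsigned offset relative to the image base, and it uses made-up bytes rather than data from a real ntoskrnl.exe; decode_offset_table is a hypothetical name.

```python
import struct


def decode_offset_table(raw, count, image_base):
    """Hypothetical helper: convert `count` 4-byte entries into absolute addresses.

    Assumes each entry is an unsigned little-endian offset from the image base.
    """
    offsets = struct.unpack_from("<%dI" % count, raw, 0)
    return [image_base + offset for offset in offsets]


# Made-up bytes for three entries; not taken from a real ntoskrnl.exe.
raw = struct.pack("<3I", 0x1000, 0x2040, 0x3A10)
for address in decode_offset_table(raw, 3, image_base=0x140000000):
    print(hex(address))
# 0x140001000
# 0x140002040
# 0x140003a10
```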
Automating Mapping Within IDA
Let's first go over the IDA functions we will need to call:
- idaapi.get_imagebase
  - This function returns the base address of the module we're looking at.
- idc.GetInputFile
  - This function returns the name of the input file the IDB was created from.
- idc.BADADDR
  - This is a constant equal to -1 as an unsigned integer at the database's address width: 0xFFFFFFFF in 32-bit mode and 0xFFFFFFFFFFFFFFFF in 64-bit mode. Because of this, it can also be used to test whether we're in 32-bit or 64-bit mode.
- idc.Name
  - This function returns the name at a given address.
- idc.LocByName
  - The inverse of idc.Name, this function returns the address of a given name.
- idc.Dword
  - This function returns the four-byte value at a given address.
- idc.Qword
  - This function returns the eight-byte value at a given address.
- idautils.DataRefsFrom
  - This function enumerates any data references from a given address.
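Since BADADDR is all ones at the database's address width, comparing it against the 32-bit and 64-bit all-ones values tells us which mode we're in. A minimal sketch: pointer_size is a hypothetical helper, and inside IDA you would pass it idc.BADADDR. Note that the functions above use the IDA 6.x-era names; newer IDA releases renamed several of them (for example, idc.LocByName became idc.get_name_ea_simple).

```python
# All-ones values at each address width (what BADADDR equals in each mode).
BADADDR_32 = 0xFFFFFFFF
BADADDR_64 = 0xFFFFFFFFFFFFFFFF


def pointer_size(badaddr):
    """Hypothetical helper: infer the database's pointer size from BADADDR."""
    if badaddr == BADADDR_64:
        return 8
    if badaddr == BADADDR_32:
        return 4
    raise ValueError("unexpected BADADDR value: %#x" % badaddr)


# Inside IDA this would be called as pointer_size(idc.BADADDR).
print(pointer_size(0xFFFFFFFFFFFFFFFF))  # 8
print(pointer_size(0xFFFFFFFF))  # 4
```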
We'll start off by ensuring we are looking at either ntoskrnl.exe or win32k.sys.
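Here's a minimal sketch of that check. It assumes the input file is named exactly ntoskrnl.exe or win32k.sys (compared case-insensitively). module_kind is a hypothetical helper, and inside IDA you would pass it the result of idc.GetInputFile(). Other kernel image names, such as the 32-bit ntkrnlpa.exe, are not handled here.

```python
def module_kind(input_file):
    """Hypothetical helper: classify the IDB's input file.

    Returns "ntoskrnl" or "win32k" for the two supported files, else None.
    Inside IDA: module_kind(idc.GetInputFile())
    """
    name = input_file.lower()
    if name == "ntoskrnl.exe":
        return "ntoskrnl"
    if name == "win32k.sys":
        return "win32k"
    return None


for candidate in ("ntoskrnl.exe", "WIN32K.SYS", "hal.dll"):
    print(candidate, module_kind(candidate))
# ntoskrnl.exe ntoskrnl
# WIN32K.SYS win32k
# hal.dll None
```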
We can then determine which symbol names we need to use. Next, we need to test whether we should use the underscore variants.
LocByName will return BADADDR if the name does not exist, so we can use it to test whether the symbol name exists with or without the underscore.
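The underscore test can be written so that the lookup function is passed in, which lets the logic run outside IDA. In this minimal sketch, resolve_symbol is a hypothetical helper, and the dictionary-backed lookup stands in for idc.LocByName on a made-up 32-bit database. Inside IDA you would call resolve_symbol(name, idc.LocByName, idc.BADADDR).

```python
def resolve_symbol(name, loc_by_name, badaddr):
    """Hypothetical helper: return (symbol, address), trying `name`, then `_name`.

    `loc_by_name` plays the role of idc.LocByName, which returns BADADDR
    when a name does not exist in the database.
    """
    for candidate in (name, "_" + name):
        address = loc_by_name(candidate)
        if address != badaddr:
            return candidate, address
    raise LookupError("neither %s nor _%s exists" % (name, name))


# Stand-in for idc.LocByName on a made-up 32-bit database.
FAKE_BADADDR = 0xFFFFFFFF
FAKE_NAMES = {"_KiServiceTable": 0x81000000}


def fake_loc_by_name(name):
    return FAKE_NAMES.get(name, FAKE_BADADDR)


symbol, address = resolve_symbol("KiServiceTable", fake_loc_by_name, FAKE_BADADDR)
print(symbol, hex(address))  # _KiServiceTable 0x81000000
```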
Now that we have the correct symbol names to use, let's grab the actual size of the table.
First we get the address with LocByName, then we grab the value at that address with Dword.
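Put together, reading the table size is a name lookup followed by a four-byte read. In this sketch, table_size is a hypothetical helper, and the dictionaries stand in for idc.LocByName and idc.Dword. The address is invented, while the value 0x191 matches the Windows 7 x64 example.

```python
def table_size(limit_name, loc_by_name, read_dword, badaddr):
    """Hypothetical helper: read the entry count stored at the limit symbol.

    Inside IDA: table_size(limit_name, idc.LocByName, idc.Dword, idc.BADADDR)
    """
    address = loc_by_name(limit_name)
    if address == badaddr:
        raise LookupError("symbol %s not found" % limit_name)
    return read_dword(address)


# Made-up 64-bit database: the address is invented, the value matches the
# Windows 7 x64 example (0x191 system calls).
BADADDR = 0xFFFFFFFFFFFFFFFF
NAMES = {"KiServiceLimit": 0x140300000}
MEMORY = {0x140300000: 0x191}


def fake_loc_by_name(name):
    return NAMES.get(name, BADADDR)


size = table_size("KiServiceLimit", fake_loc_by_name, MEMORY.__getitem__, BADADDR)
print(size, hex(size))  # 401 0x191
```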
The last corner case to handle is 64-bit Windows 10.
DataRefsFrom will iterate through the data references from the base of the table. There should be one, unless we're looking at one of the newer versions of Windows 10. For those newer Windows 10 builds, we'll just need to make sure we add the base address of the image, which we'll get with get_imagebase.
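One way to express this check, assuming that in older layouts the first table entry is a full pointer (so IDA records a data reference from the table base to the first function) and that the newer 64-bit Windows 10 layout produces no such reference. uses_offset_entries is a hypothetical helper; inside IDA you would pass idautils.DataRefsFrom as data_refs_from.

```python
def uses_offset_entries(table_base, data_refs_from, is_64bit):
    """Hypothetical helper: guess whether the table holds 4-byte offsets.

    A data reference from the table base suggests a pointer entry; none on a
    64-bit build suggests the newer Windows 10 offset layout.
    """
    has_pointer_ref = any(True for _ in data_refs_from(table_base))
    return is_64bit and not has_pointer_ref


def refs_to_function(ea):
    return iter([0x140001000])  # pointer entry: IDA sees a data reference


def no_refs(ea):
    return iter([])  # 4-byte offset entry: no data reference


print(uses_offset_entries(0x140300000, refs_to_function, is_64bit=True))  # False
print(uses_offset_entries(0x140300000, no_refs, is_64bit=True))  # True
```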
At this point, all we need to do is read consecutive values starting from the table base. We can use Qword for 64-bit versions (outside of the newer builds of Windows 10) and Dword for 32-bit versions, as well as for those newer Windows 10 builds.
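The loop below is a minimal sketch of that final step. enumerate_syscalls is a hypothetical helper that takes the reader function and entry size as parameters. Inside IDA that would be idc.Qword with 8-byte entries, idc.Dword with 4-byte entries, and, for the newer Windows 10 builds, idc.Dword plus the value of idaapi.get_imagebase(). The in-memory table and its addresses are invented for illustration; only the function names come from the Windows 7 x64 example.

```python
def enumerate_syscalls(table_base, count, entry_size, read_value, get_name,
                       image_base=0, id_base=0):
    """Hypothetical helper: yield (syscall_id, address, name) per table entry.

    read_value reads one entry (idc.Qword or idc.Dword inside IDA),
    image_base is added to every value (idaapi.get_imagebase() for newer
    64-bit Windows 10 builds, 0 when entries are already pointers), and
    id_base is 0x1000 for win32k.sys.
    """
    for index in range(count):
        address = image_base + read_value(table_base + index * entry_size)
        yield id_base + index, address, get_name(address)


# Made-up 64-bit pointer table; only the function names come from the
# Windows 7 x64 example.
MEMORY = {0x1000: 0x140010000, 0x1008: 0x140020000, 0x1010: 0x140030000}
NAMES = {
    0x140010000: "NtMapUserPhysicalPagesScatter",
    0x140020000: "NtWaitForSingleObject",
    0x140030000: "NtCallbackReturn",
}

for sid, address, name in enumerate_syscalls(0x1000, 3, 8, MEMORY.__getitem__, NAMES.get):
    print("0x%04X %#x %s" % (sid, address, name))
# 0x0000 0x140010000 NtMapUserPhysicalPagesScatter
# 0x0001 0x140020000 NtWaitForSingleObject
# 0x0002 0x140030000 NtCallbackReturn
```

Passing the readers in as parameters lets the same loop cover all three table layouts and makes it possible to exercise the logic without IDA.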
Here's an example of what this can print out:
You can see a full copy of this code on our GitHub page here.
Conclusion
Reverse engineering software can be tedious at times, but automating tasks can take away some of that tedium. I hope you've enjoyed this blog post; look out for future posts on IDA and Python. Until then, you can find me on Twitter at @WanderingGlitch, and follow the team for the latest in exploit techniques and security patches.