Exploring Python Using GDB
People tend to have a narrow view of the problems they can solve using GDB. Many think that GDB is just for debugging segfaults or that it's only useful with C or C++ programs. In reality, GDB is an impressively general and powerful tool. When you know how to use it, you can debug just about anything, including Python, Ruby, and other dynamic languages. It's not just for inspection either—GDB can also be used to modify a program's behavior while it's running.
When we ran our Capture The Flag contest, a lot of people asked us about introductions to that kind of low-level work. GDB can be a great way to get started. In order to demonstrate some of GDB's flexibility, and show some of the steps involved in practical GDB work, we've put together a brief example of debugging Python with GDB.
Imagine you're building a web app in Django. The standard cycle for building one of these apps is to edit some code, hit an error, fix it, restart the server, and refresh in the browser. It's a little tedious. Wouldn't it be cool if you could hit the error, fix the code while the request is still pending, and then have the request complete successfully?
As it happens, the Seaside framework supports exactly this. Using one of Stripe's example projects, let's take a look at how we could pull it off in Python using GDB:
Pretty cool, right? Though a little contrived, this example demonstrates many helpful techniques for making effective real-world use of GDB. I'll walk through what we did in a little more detail, and explain some of the GDB tricks as we go.
For the sake of brevity, I'll show the commands I type, but elide some of the output they generate. I'm working on Ubuntu 12.04 with GDB 7.4. The manipulation should still work on other platforms, but you probably won't get automatic pretty-printing of Python types. You can generate them by hand by running p PyString_AsString(PyObject_Repr(obj)) in GDB.
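To get a feel for what that expression does without attaching a debugger, here is a minimal sketch that makes the equivalent C-API calls from inside a running interpreter through ctypes. It assumes CPython 3, where PyString_AsString no longer exists and PyUnicode_AsUTF8 plays its role; the sample object is made up for illustration.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
api = ctypes.pythonapi

# PyObject *PyObject_Repr(PyObject *o) returns a new reference to repr(o).
api.PyObject_Repr.argtypes = [ctypes.py_object]
api.PyObject_Repr.restype = ctypes.py_object

# const char *PyUnicode_AsUTF8(PyObject *unicode) is the Python 3 counterpart
# of Python 2's PyString_AsString (an assumption of this sketch: CPython 3).
api.PyUnicode_AsUTF8.argtypes = [ctypes.py_object]
api.PyUnicode_AsUTF8.restype = ctypes.c_char_p

obj = {"status": 500}  # hypothetical object we want to inspect

repr_obj = api.PyObject_Repr(obj)          # a Python str object
c_string = api.PyUnicode_AsUTF8(repr_obj)  # its bytes, copied from a C string
text = c_string.decode("utf-8")

print(text)
```

The two nested calls mirror the GDB expression: the inner one builds a Python string holding the repr, and the outer one turns it into a plain C string that a debugger can print.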
Getting Set Up
First, let's start the monospace-django server with --noreload so that Django's autoreloading doesn't get in the way of our GDB-based reloading. We'll also use the python2.7-dbg interpreter, which will ensure that less of the program's state is optimized away.
As of version 7.0, GDB can be scripted in Python, and you can even register your own code to pretty-print C types. Python comes with its own hooks which can pretty-print Python types (such as PyObject *) and understand the Python stack. These hooks are loaded automatically if you have the python2.7-dbg package installed on Ubuntu.
Whatever you're debugging, you should look to see if there are relevant GDB scripts available—useful helpers have been created for many dynamic languages.
Catching the Error
The Python interpreter creates a PyFrameObject every time it starts executing a Python stack frame. From that frame object, we can get the name of the function being executed. It's stored as a Python object, so we can convert it to a C string using PyString_AsString, and then stop the interpreter only if it begins executing a function called handle_uncaught_exception.
The obvious way to catch this would be by creating a GDB breakpoint. A lot of frames are allocated in the process of executing Python code, though. Rather than tediously continue through hundreds of false positives, we can set a conditional breakpoint that'll break on only the frame we care about:
Breakpoint conditions can be pretty complex, but it's worth noting that a conditional breakpoint on a location that is hit often (like PyEval_EvalFrameEx) can slow the program down significantly, because GDB has to stop the process and evaluate the condition on every hit.
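The same filtering idea can be tried out in pure Python. The sketch below is an analogy, not the GDB breakpoint itself: sys.settrace stands in for the breakpoint on PyEval_EvalFrameEx, and the check on frame.f_code.co_name stands in for the condition. The three functions are hypothetical stand-ins for a request handler; only the name handle_uncaught_exception comes from the text above.

```python
import sys

TARGET = "handle_uncaught_exception"
stats = {"calls": 0}   # how many frames the hook looked at
matches = []           # names of the frames that satisfied the condition


def tracer(frame, event, arg):
    # Invoked for every new Python frame, like a breakpoint on the
    # interpreter's frame-evaluation function.
    if event == "call":
        stats["calls"] += 1
        # The "condition": compare the frame's function name to the target.
        if frame.f_code.co_name == TARGET:
            matches.append(frame.f_code.co_name)
    return None  # no line-by-line tracing inside the frame


def render(request):
    raise ValueError("simulated bug in the view")


def handle_uncaught_exception(request):
    return "500 Internal Server Error"


def get_response(request):
    try:
        return render(request)
    except Exception:
        return handle_uncaught_exception(request)


sys.settrace(tracer)
try:
    result = get_response("/checkout")
finally:
    sys.settrace(None)

print(stats["calls"], matches, result)
```

The hook runs for every frame but matches only one of them. That imbalance is the reason the conditional breakpoint is both convenient and slow: the check is cheap to write, yet it is paid on every call.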
Generating the Initial Return Value
Okay, let's see if we can actually fix things during the next request. We resubmit the form. Once again, GDB halts when the app starts generating the internal server error response. While we investigate more, let's disable the breakpoint in order to keep things fast.
What we really want to do here is to let the app finish generating its original return value (the error response) and then to replace that with our own (the correct response). We find the stack frame where get_response is being evaluated. Once we've jumped to that frame with the up or frame command, we can use the finish command to wait until the currently selected stack frame finishes executing and returns.
Patching the Code
Now that we've gotten the interpreter into the state we want, we can use Python's internals to modify the running state of the application. GDB allows you to make fairly complicated dynamic function invocations, and we'll use lots of that here.
We use the C equivalent of the Python reload function to reimport the code. We also have to reload the monospace.urls module so that it picks up the new code in monospace.views.
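Why the second reload is needed can be shown without GDB. The following is a minimal sketch under a few assumptions: it uses Python 3's importlib.reload (the counterpart of Python 2's built-in reload), and the modules demo_views and demo_urls are hypothetical stand-ins written to a temporary directory, not the real monospace modules.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Skip bytecode caching so a quick rewrite of the source is always re-read.
sys.dont_write_bytecode = True

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)


def write_module(name, source):
    """Write a tiny module into the temporary directory."""
    with open(os.path.join(workdir, name + ".py"), "w") as handle:
        handle.write(source)
    importlib.invalidate_caches()


# demo_urls copies a reference to demo_views.index at import time.
write_module("demo_views", "def index():\n    return 'broken'\n")
write_module("demo_urls", "from demo_views import index\n")

import demo_views
import demo_urls

before = demo_urls.index()

# "Fix" the view on disk, then reload only the module that changed.
write_module("demo_views", "def index():\n    return 'fixed response'\n")
importlib.reload(demo_views)
after_views_only = demo_urls.index()  # demo_urls still holds the old function

# Reloading the importer re-runs its import statement and rebinds the name.
importlib.reload(demo_urls)
after_both = demo_urls.index()

print(before, after_views_only, after_both)

# Clean up the temporary module directory.
sys.path.remove(workdir)
shutil.rmtree(workdir)
```

Reloading a module replaces the objects inside that module, but any other module that already copied a reference keeps pointing at the old object until it is reloaded as well.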
One handy trick, which we use to invoke git in the video and curl here, is that you can run shell commands from within GDB.
We've now patched and reloaded the code. Next, let's generate a new response: we find self and request among the local variables in this stack frame, then fetch self's get_response method and call it with request.
In the above snippet, we use GDB's set command to assign values to variables.
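For readers more at home in Python than in its C API, the lookup-and-call sequence can be sketched at the Python level. This is an illustration under assumptions, not the GDB session: Handler and stopped_function are hypothetical stand-ins, and the C-API functions named in the comments (PyDict_GetItemString, PyObject_GetAttrString, PyObject_CallFunctionObjArgs) are the likely counterparts one would call from GDB.

```python
import sys


class Handler(object):
    """Hypothetical stand-in for the object whose get_response we re-run."""

    def get_response(self, request):
        return "200 OK for " + request


def stopped_function(self, request):
    # Stand-in for the frame selected in the debugger: expose its locals.
    return dict(sys._getframe().f_locals)


frame_locals = stopped_function(Handler(), "/checkout")

# Look up locals by name (C-API counterpart: PyDict_GetItemString).
handler = frame_locals["self"]
request = frame_locals["request"]

# Fetch the bound method by name (counterpart: PyObject_GetAttrString).
get_response = getattr(handler, "get_response")

# Call it with one argument (counterpart: PyObject_CallFunctionObjArgs).
new_response = get_response(request)

print(new_response)
```

Each step yields an ordinary object reference, which is why the intermediate results are worth keeping in variables: the next step takes the previous result as its input.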
Alright, we now have a new response. Remember that we stopped the program right where the original get_response method returned. The C function that evaluates a Python frame returns a pointer to that frame's Python return value, so the C return value and the Python return value are one and the same. And so, to replace that return value on x86, we just have to store the new return value in a register ($rax on 64-bit x86) and then allow the execution to continue.
GDB keeps a numbered history of the values it prints, so you can refer to any earlier result by number. In this case, we want $5.
And, like magic, our web request finishes successfully.
GDB is a powerful precision tool. Even if you spend most of your time writing code in a much higher-level language, it can be extremely useful to have it available when you need to investigate subtle bugs or complex issues in running applications.