[Edit: A significantly expanded version of this series appears as a chapter in The Architecture of Open Source Applications, volume 4, as A Python Interpreter Written in Python.]
When we left our heroes, they had come across some odd-looking output:
This is python bytecode.
You recall from Part 2 that “Python bytecode” and “a Python code object” are not the same thing: the bytecode is an attribute of the code object, among many other attributes. Bytecode is found in the co_code attribute of the code object, and contains instructions for the interpreter.
So what is bytecode? Well, it’s just a series of bytes. They look wacky when we print them because some bytes are printable and others aren’t, so let’s take the ord of each byte to see that they’re just numbers.
Here are the bytes that make up python bytecode. The interpreter will loop through each byte, look up what it should do for each one, and then do that thing. Notice that the bytecode itself doesn’t include any python objects, or references to objects, or anything like that.
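Here is a minimal sketch of that step, assuming Python 3, where the code object lives at foo.__code__ and iterating over a bytes object already yields integers (the explicit ord call is only needed on Python 2 strings). The function foo is a hypothetical stand-in, and the exact numbers printed depend on the interpreter version.

```python
def foo(a):  # hypothetical example function
    x = 3
    return x + a

# The raw bytecode is an attribute of the code object.
raw = foo.__code__.co_code
print(raw)  # some bytes are printable, others show up as \x escapes

# In Python 3, iterating over bytes gives integers directly.
# On Python 2 the equivalent was [ord(c) for c in foo.func_code.co_code].
byte_values = list(raw)
print(byte_values)
```

Either way, the result is a flat list of small integers with no names or constants in sight.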
One way to understand Python bytecode would be to find the CPython interpreter file (it’s ceval.c) and flip through it, looking up what 100 means, then 0, and so on. We’ll do this later in the series! For now, there’s a simpler way: the dis module.
Disassembling bytecode means taking this series of bytes and printing out something we humans can understand. It’s not a step in Python execution; the dis module just helps us understand an intermediate state of Python internals. I can’t think of a reason why you’d ever want to use dis in production code; it’s for humans, not for machines.
Today, however, taking some bytecode and making it human-readable is exactly what we’re trying to do, so dis is a great tool. We’ll use the function dis.dis to analyze the code object of our function.
(You usually see this called as dis.dis(foo), directly on the function object. That’s just a convenience: dis is really analyzing the code object. If it’s passed a function, it just gets its code object.)
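A small sketch of both call styles, assuming Python 3 (where the code object is foo.__code__ and dis.dis accepts a file argument). The function foo and the helper disassemble are hypothetical names used only for illustration. The opcode names and offsets you see will vary with the interpreter version, so no output is reproduced here.

```python
import dis
import io

def foo(a):  # hypothetical example function
    x = 3
    return x + a

def disassemble(target):
    """Capture what dis.dis would print for target and return it as a string."""
    buffer = io.StringIO()
    dis.dis(target, file=buffer)
    return buffer.getvalue()

from_code = disassemble(foo.__code__)  # passing the code object explicitly
from_function = disassemble(foo)       # passing the function; dis fetches its code object

print(from_code)
print(from_function == from_code)
```

Capturing the text in a string is just a convenience for comparing the two listings; calling dis.dis(foo) directly prints the same thing to standard output.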
The numbers in the left-hand column are line numbers in the original source code. The second column is the offset into the bytecode: LOAD_CONST appears at position 0, STORE_FAST at position 3, and so on. The middle column shows the names of the instructions. These names are just for our (human) benefit; the interpreter doesn’t need them.
The last two columns give details about the instruction’s argument, if there is one. The fourth column shows the argument itself, which represents an index into other attributes of the code object. In the example, LOAD_CONST’s argument is an index into the list co_consts, and STORE_FAST’s argument is an index into co_varnames. Finally, in the fifth column, dis has looked up the constants or names in the place the fourth column specified and told us what it found there. We can easily verify this by indexing into those attributes ourselves.
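One way to run that check, sketched under the assumption of Python 3.4 or later (for dis.get_instructions). The function foo is a hypothetical stand-in. Which instructions carry a constant index differs between versions, so the sketch asks dis.hasconst rather than hard-coding opcode names, and it only treats plain STORE_FAST as a co_varnames lookup.

```python
import dis

def foo(a):  # hypothetical example function
    x = 3
    return x + a

code = foo.__code__
print(code.co_consts)    # constants used by the function
print(code.co_varnames)  # names of arguments and local variables

checks = []
for ins in dis.get_instructions(foo):
    if ins.opcode in dis.hasconst:
        # The argument is an index into co_consts.
        checks.append((ins.opname, ins.arg, code.co_consts[ins.arg], ins.argval))
    elif ins.opname == "STORE_FAST":
        # The argument is an index into co_varnames.
        checks.append((ins.opname, ins.arg, code.co_varnames[ins.arg], ins.argval))

for opname, arg, looked_up, reported in checks:
    # looked_up is our manual lookup; reported is what dis resolved for us.
    print(opname, arg, looked_up, reported)
```

In each row, the value we looked up by hand should match the value dis reports in its last column.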
This also explains why the second instruction, STORE_FAST, is found at bytecode position 3. If an instruction takes an argument, the next two bytes are that argument, so LOAD_CONST occupies positions 0 through 2. It’s the interpreter’s job to handle this correctly.
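The offsets above come from the older bytecode layout, where an argument took two bytes. As a hedged sketch, assuming CPython 3.6 or later (where every instruction is exactly two bytes: one opcode byte and one argument byte), we can walk co_code by hand and compare the result with what dis reports. The function foo is hypothetical, and newer interpreters may also show CACHE placeholder entries that dis hides by default.

```python
import dis

def foo(a):  # hypothetical example function
    x = 3
    return x + a

raw = foo.__code__.co_code

# Step through the bytes two at a time: opcode first, then its argument byte.
decoded = []
for offset in range(0, len(raw), 2):
    opcode, arg = raw[offset], raw[offset + 1]
    decoded.append((offset, dis.opname[opcode], arg))

for row in decoded:
    print(row)
```

The loop is the same bookkeeping the interpreter does: knowing how many bytes each instruction occupies is what lets it find the start of the next one.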
(You may be surprised that BINARY_ADD doesn’t have arguments. We’ll come back to this in a future installment, when we get to the interpreter itself.)
People often say that dis is a disassembler of Python bytecode. This is true enough (the dis module’s docs say it), but dis knows about more than just the bytecode: it uses the whole code object to give us an understandable printout. The middle three columns show information actually encoded in the bytecode, while the first and the last columns show other information. Again, the bytecode itself is really limited: it’s just a series of numbers, and things like names and constants are not a part of it.
How does the dis module get from bytes like 100 to names like LOAD_CONST and back? Try to think of a way you’d do it. If you thought “Well, you could have a list that has the byte names in the right order,” or you thought, “I guess you could have a dictionary where the names are the keys and the byte values are the values,” then congratulations! That’s exactly what’s going on. The file opcode.py defines the list and the dictionary. It’s full of calls to def_op, which inserts the mapping in both the list and the dictionary.
There’s even a friendly comment telling us what each byte’s argument means.
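To make the list-plus-dictionary idea concrete, here is a simplified imitation of it, not a copy of CPython’s opcode.py. The names toy_opname and toy_opmap are made up for this sketch; 100 is the value for LOAD_CONST used in this article, and 125 is only an illustrative number. The last lines use the real tables, which the dis module exposes as dis.opname and dis.opmap; their numeric values differ between Python versions.

```python
import dis

# Toy versions of the two tables: a list indexed by byte value,
# and a dictionary keyed by instruction name.
toy_opname = ["<%d>" % op for op in range(256)]
toy_opmap = {}

def def_op(name, op):
    """Record the mapping in both directions."""
    toy_opname[op] = name
    toy_opmap[name] = op

def_op("LOAD_CONST", 100)  # index into the constants
def_op("STORE_FAST", 125)  # illustrative number for this sketch

print(toy_opname[100])          # byte value -> name
print(toy_opmap["STORE_FAST"])  # name -> byte value

# The real tables work the same way.
number = dis.opmap["LOAD_CONST"]
print(number, dis.opname[number])
```

Keeping both structures means either direction is a single lookup, which is all dis needs to turn numbers into names.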
OK, now we understand what Python bytecode is (and isn’t), and how to use dis to make sense of it. In Part 4, we’ll look at another example to see how Python can compile down to bytecode but still be a dynamic language.