Last week at Hacker School I did a quick presentation on Python bytecode and the dis module. The disassembler is a very powerful tool with a gentle learning curve: you can get a fair amount out of it without really knowing much about what’s going on. This post is a quick introduction to how and why you should use it.
What’s bytecode?
Bytecode is the internal representation of a Python program in the interpreter. Here, we’ll be looking at bytecode from CPython, the default interpreter. If you don’t know what interpreter you’re using, it’s probably CPython.
How do I get bytecode?
You already have it! Bytecode is what’s stored in the .pyc files that show up after you import a module. It’s also generated on the fly whenever you run any Python code.
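To make both routes concrete, here is a minimal sketch. The source string, the file name example.py and the variable names are placeholders of mine; compile(), exec() and py_compile.compile() are standard library calls.

```python
import os
import py_compile
import tempfile

# Route 1: compile a snippet on the fly and get a code object back.
source = "x = 1 + 2\n"
code = compile(source, "<example>", "exec")
print(type(code))        # a code object
print(code.co_code)      # its raw bytecode (exact bytes vary by Python version)

# Running the code object executes that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["x"])

# Route 2: write the same source to a file and byte-compile it to a .pyc.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "example.py")
    with open(src_path, "w") as f:
        f.write(source)
    pyc_path = py_compile.compile(src_path, cfile=os.path.join(tmp, "example.pyc"))
    pyc_size = os.path.getsize(pyc_path)
    print(pyc_path, pyc_size, "bytes")
```

The .pyc holds a small header followed by the serialized code object, so importing the module again can skip the compile step.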
Disassembling
Ok, so you have some bytecode, and you want to understand it. Let’s look at it without using the dis module first.
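A minimal sketch of that, assuming a small function that assigns 2 and 3 to a and b and returns their sum. The name foo is my placeholder, and I’m using the Python 3 spelling foo.__code__ (Python 2 also exposed it as foo.func_code). The exact bytes you get depend on your interpreter version.

```python
def foo():
    a = 2
    b = 3
    return a + b

code = foo.__code__     # the code object attached to the function
raw = code.co_code      # the compiled bytecode, as a bytes object

print(code)             # something like <code object foo at 0x..., file ..., line ...>
print(raw)              # a jumble of printable and unprintable bytes
print(list(raw))        # the same bytes as plain integers
```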
Hmm, that was … not very enlightening. We can see that we have a bunch of bytes (some printable, others not), but we have no idea what they mean.
Let’s run it through dis.dis instead.
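A sketch of the same idea, again assuming a placeholder function foo that adds 2 and 3. The listing you get varies by CPython version: newer releases print BINARY_OP where older ones print BINARY_ADD, for example, so treat the opcode names discussed here as one version’s output.

```python
import dis

def foo():
    a = 2
    b = 3
    return a + b

# Print a human-readable listing: line number, offset, opcode name, argument.
dis.dis(foo)

# The same information is available programmatically (Python 3.4+).
opnames = [ins.opname for ins in dis.get_instructions(foo)]
print(opnames)
```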
Now this starts to make some sense. dis takes each byte, finds the opcode that corresponds to it in opcode.py, and prints it as a nice, readable constant. If we look at opcode.py we see that LOAD_CONST is 100, STORE_FAST is 125, etc. (these numbers belong to the CPython version used here and can change between releases). dis also shows the line numbers on the left and the values or names on the right. So without ever having seen bytecode before, we have an idea what’s going on: we first load a constant, 2, then somehow store it as a. Then we repeat this with 3 and b. We load a and b back up, do BINARY_ADD, which presumably adds the numbers, and then do RETURN_VALUE.
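You can check the name-to-number mapping on your own interpreter without opening the source file. This is a small sketch using dis.opmap and dis.opname, which the dis module exposes; the numbers printed depend on your CPython version, so they may not match the ones quoted above.

```python
import dis

# opmap goes from opcode name to number, opname goes from number back to name.
for name in ("LOAD_CONST", "STORE_FAST", "RETURN_VALUE"):
    number = dis.opmap[name]
    print(name, "->", number, "->", dis.opname[number])
```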
Examining the bytecode can sometimes increase your understanding of python code. Here is one example.
elif
elif is identical in bytecode to else ... if. Take a look:
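A sketch of two such functions, one written with elif and one with a nested else ... if. The names flat and nested and the FizzBuzz-style bodies are my own illustration.

```python
def flat(num):
    if num % 3 == 0:
        return "Fizz"
    elif num % 5 == 0:
        return "Buzz"
    else:
        return num

def nested(num):
    if num % 3 == 0:
        return "Fizz"
    else:
        if num % 5 == 0:
            return "Buzz"
        else:
            return num

# Both versions behave the same for every input.
for n in (3, 5, 7):
    print(n, flat(n), nested(n))
```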
We’ve read PEP 20 (the Zen of Python), so we know that flat is better than nested for style and readability. But is there a performance difference? Not at all. In fact, these two functions have identical bytecode.
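One way to check this yourself, sketched with two hypothetical functions (one using elif, one using a nested else ... if). The disassembly listings differ only in the line-number column, since the functions sit on different source lines; comparing co_code directly compares the instruction bytes.

```python
import dis

def flat(num):
    if num % 3 == 0:
        return "Fizz"
    elif num % 5 == 0:
        return "Buzz"
    else:
        return num

def nested(num):
    if num % 3 == 0:
        return "Fizz"
    else:
        if num % 5 == 0:
            return "Buzz"
        else:
            return num

# Human-readable listings, for eyeballing the two side by side.
dis.dis(flat)
dis.dis(nested)

# Byte-for-byte comparison of the compiled instructions.
same_bytes = flat.__code__.co_code == nested.__code__.co_code
print("identical bytecode:", same_bytes)
```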
That makes sense: else just means “start executing here if the if was false”, so there’s no more computation to do. elif is just syntactic sugar.
Further reading:
This just scratches the surface of what’s interesting about python bytecode.
If you enjoyed this, you might enjoy diving into Yaniv Aknin’s series on python internals. If you’re excited about bytecode, you should contribute to Ned Batchelder’s byterun.