How local variables work in Python bytecode
Local variables are stored on the stack.
This line tormented me for years. I could spout it in an interview, but I didn’t really know what it meant. I could pull up the diagram showing the stack growing downward and the heap growing upward. And I definitely didn’t want to think about what happened if they ever crossed.
But to truly understand? I had to build it myself.
This post explains how local variables work by walking through their mechanics in Memphis, my Python interpreter built in Rust. While it doesn’t match CPython byte-for-byte, the model is simple and maps closely to how most Python runtimes behave under the hood.
If you’ve read my previous posts, you’ll recall I have two implementations: a treewalk interpreter and a bytecode VM. This post focuses on the bytecode VM because it had a steeper learning curve, blew my mind more often, and is closer to CPython.
Let’s begin with some definitions which will be used by our bytecode VM at runtime.
Bytecode VM concepts
`CodeObject`
: An immutable structure produced by the bytecode compiler and consumed by the VM.

`FunctionObject`
: A runtime representation of a `CodeObject` plus its captured environment. We’ll need this layer for when we introduce free variables, which underpin closures.

`Frame`
: A runtime representation of a `FunctionObject`. It holds the program counter and the evaluation stack, including the space reserved for local variables.

`CallStack`
: A stack of `Frame` objects, which tracks each execution context (i.e. each new function call).
What the heap?!
Wait a minute. Are local variables, which so famously live on the stack, stored in the `Frame` or the `CallStack`? The answer is yes!
Each `Frame` contains its own evaluation stack, the bottom portion of which is reserved for local variables. This is what the term “stack-based virtual machine” refers to, in contrast to a register-based VM. Both are mechanisms for managing working memory during your program’s execution.
Meanwhile, the `CallStack` is one layer of abstraction higher. It holds a new `Frame` for each function that is entered.
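To make these layers concrete, here is a minimal sketch of how a `Frame` and `CallStack` might be modeled. These are hypothetical Python stand-ins for illustration, not Memphis’s actual Rust types:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: names mirror the concepts above, not the real
# Memphis definitions.

@dataclass
class Frame:
    pc: int = 0                                 # program counter into the bytecode
    stack: list = field(default_factory=list)   # evaluation stack; the bottom
                                                # slots would be reserved for locals

@dataclass
class CallStack:
    frames: list = field(default_factory=list)

    def push(self, frame):   # entering a function call
        self.frames.append(frame)

    def pop(self):           # returning from a function call
        return self.frames.pop()

calls = CallStack()
calls.push(Frame())          # one new Frame per function entered
```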
Before I go any further, I want to address the elephant in the room. In addition to the incantation “local variables are stored on the stack,” I was pretty sure that objects in Python are stored on the heap. What gives?
Both things are true. Local variables are stored on the stack, but most variables in Python are references. The stack holds a pointer to the object (which could be an int, string, list, custom object, etc.), while the heap holds the actual data.
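You can observe this reference behavior directly in CPython: assigning one local to another copies the pointer, not the heap data.

```python
# Two local variables can point at the same heap object.
a = [1, 2, 3]    # `a` holds a reference to a list on the heap
b = a            # `b` copies the reference, not the list
b.append(4)      # mutating through one name is visible through the other

assert a is b            # same object identity (same pointer)
assert a == [1, 2, 3, 4]
```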
Let’s walk through a short function to see this in action.
The stack in action
Consider the following function. It takes one parameter and defines one additional local variable. For the scope of this discussion, let’s assume our VM has already called our function, `add_useless`.
```python
# A function which adds an unknown number (via local var) to the input
# parameter x, returning the result.
def add_useless(x):
    y = 11
    return x + y
```
Remember how we said the bottom portion of the `Frame`’s evaluation stack is reserved for local variables? We can visualize it like this.
┌────────────────┐
│    stack[1]    │
├────────────────┤
│    stack[0]    │
├────────────────┤
│   locals[1]    │
├────────────────┤
│   locals[0]    │
└────────────────┘
Before we set `y = 11`, the stack would hold two reserved spaces for our two local variables, `x` and `y`.
┌──────────────────┐
│ empty slot for y │
├──────────────────┤
│ value of x │
└──────────────────┘
During the evaluation of `y = 11`, the stack briefly looks like this:
┌──────────────────┐
│ 11 │
├──────────────────┤
│ empty slot for y │
├──────────────────┤
│ value of x │
└──────────────────┘
After we set `y = 11`, the stack would be:
┌────────────┐
│ 11 (y) │
├────────────┤
│ value of x │
└────────────┘
During the evaluation of `x + y`, the stack briefly looks like this:
┌────────────────┐
│       11       │
├────────────────┤
│   value of x   │
├────────────────┤
│     11 (y)     │
├────────────────┤
│   value of x   │
└────────────────┘
After we add `x + y`, but before the return, the stack would be:
┌────────────────┐
│ value of x + y │
├────────────────┤
│ 11 (y) │
├────────────────┤
│ value of x │
└────────────────┘
Then we return the sum, discard the frame, and hope nothing blows up. (Can you tell I don’t want to explain that part right now?)
If you are screaming at your screen, “why is only half of our stack the stack?!”, I get it! It confused me too: it is one piece of contiguous memory, but we reserve the bottom n slots for n local variables.
So how do we know how many local variables to reserve? Because we already compiled the function’s bytecode before running it. Let’s take a look at how that works.
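CPython exposes this precomputed count directly on the compiled code object, so you can verify it yourself:

```python
def add_useless(x):
    y = 11
    return x + y

# The compiler already counted the locals: the parameter x and the
# variable y, so two stack slots get reserved at frame setup.
assert add_useless.__code__.co_nlocals == 2
```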
The bytecode in action
Before we look at the bytecode itself, let’s define a few more concepts. These are both fields on the `CodeObject` and are populated during bytecode compilation.
`constants`
: The list of constants taken directly from the user code. Think: any integer, float, bool, or string. Because `CodeObject`s themselves are immutable, they can also appear as constants. In Rust, I represent this with an `enum` to allow multiple types in the same list.

`varnames`
: The list of identifiers for each local variable in a given `CodeObject`. In a previous post, I described how `varnames` (for locals) differs from `names` (for globals).
During compilation, each identifier assigned to (either as a function parameter or a local variable) is pushed into the `varnames` list of the current `CodeObject`. (We can ignore the `global` keyword for now.)
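In CPython, both lists are visible on the compiled function’s code object:

```python
def add_useless(x):
    y = 11
    return x + y

code = add_useless.__code__
# Parameters come first in varnames, then the other locals.
assert code.co_varnames == ('x', 'y')
# The literal 11 lands in the constants tuple.
assert 11 in code.co_consts
```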
Whenever the compiler encounters a new function definition, it creates a fresh `CodeObject` and pushes it onto a code stack. This stack is used only during compilation to track lexical scope, and is not related to the runtime `CallStack`. Once a function body is fully compiled, the corresponding `CodeObject` is popped off and made available as a constant for its parent `CodeObject`.
Here is the bytecode for our `add_useless` function. The data in the parentheses annotates what each index refers to. I added this manually from my Memphis output, but you can get similar output in CPython by running `import dis; dis.dis(add_useless)`, which uses the `dis` module.
```
LoadConst 0 (11)
StoreFast 1 (y)
LoadFast 0 (x)
LoadFast 1 (y)
Add
ReturnValue
```
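For comparison, here is how to pull the equivalent listing out of CPython programmatically. The exact opcode names vary by version (the add, for instance, shows up as `BINARY_ADD` before 3.11 and `BINARY_OP` afterwards), so this only checks the opcodes that are stable across releases:

```python
import dis

def add_useless(x):
    y = 11
    return x + y

# Collect the opcode names CPython emitted for this function.
ops = {ins.opname for ins in dis.get_instructions(add_useless)}

# Loading the constant 11 and storing it into y appear in all
# supported CPython versions.
assert 'LOAD_CONST' in ops
assert 'STORE_FAST' in ops
```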
Click here to view this example in the interactive Bytecode Compiler.
We’re indexing into two separate lists here, both of which live inside a `CodeObject`. I write Rust, so naturally I visualized it like this:
```rust
CodeObject {
    constants: [11],
    varnames: [x, y],
}
```
`LoadConst` indexes into `constants`, while `StoreFast` and `LoadFast` index into `varnames`.
The magic here is that there’s no need for the runtime VM to know the names of the local variables. `varnames` is used by the compiler to emit the right indices into the bytecode, but during execution, the VM just deals with stack slots.
Once a runtime `Frame` is initialized, the VM will execute each of these instructions and index into its own evaluation stack.
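The whole pipeline can be sketched as a toy interpreter loop. This is a deliberate simplification of what Memphis and CPython do, assuming only the five instruction kinds from our example:

```python
# Toy frame execution: locals live in the bottom slots, and the
# evaluation stack sits on top. Mirrors the add_useless walkthrough.
def run(code, constants, nlocals, args):
    slots = list(args) + [None] * (nlocals - len(args))  # reserved locals
    stack = []                                           # evaluation stack
    for op, *arg in code:
        if op == "LoadConst":
            stack.append(constants[arg[0]])
        elif op == "StoreFast":
            slots[arg[0]] = stack.pop()
        elif op == "LoadFast":
            stack.append(slots[arg[0]])
        elif op == "Add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "ReturnValue":
            return stack.pop()

bytecode = [
    ("LoadConst", 0),   # push 11
    ("StoreFast", 1),   # y = 11
    ("LoadFast", 0),    # push x
    ("LoadFast", 1),    # push y
    ("Add",),           # x + y
    ("ReturnValue",),
]
# add_useless(5) == 5 + 11 == 16
assert run(bytecode, constants=[11], nlocals=2, args=[5]) == 16
```

Notice that the loop never consults a variable name: `StoreFast 1` and `LoadFast 1` work purely by slot index, exactly as described above.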
The End
If this felt a bit dense, I totally get it! It took me several tries to get this working and clean in my Memphis implementation.
I’m hoping to write similar posts covering global variables and free variables, so I hope you’ll stick around for those.
Lastly, I’m experimenting with open office hours this week. If you’re stuck in Python or Rust and want to talk through it live, here’s the booking link. The slots are pay-what-you-want, with zero pressure to tip. I’d love to meet you!