Overview
Single-pass parser: consumes the lexer token stream and emits bytecode directly into an SSAChunk, no intermediate AST. Each construct is parsed and lowered in one traversal. Also handles SSA versioning, phi-node insertion at control-flow joins, and structural diagnostics. Lex-time errors merge into the parser’s diagnostic stream for a single ordered report.
Expression parsing uses Pratt: each operator declares left/right binding power, and expr_bp(min_bp) pulls in everything bound at least as tightly as min_bp.
Bytecode model
Each instruction is a tagged 4-byte record:
pub struct Instruction {
pub opcode: OpCode, // 1 byte (with #[repr(u8)] planned)
pub operand: u16, // 2 bytes
}The operand is a 16-bit slot, its meaning depends on the opcode. Common shapes:
| OpCode | Operand interpretation |
|---|---|
LoadConst | constant pool index |
LoadName / StoreName | name slot index |
Add, Sub, … | unused (IC keyed by ip) |
Call | (num_kw << 8) | num_pos |
BuildList / BuildTuple / BuildSet | element count |
BuildDict | key-value pair count |
BuildSlice | parts count (2 or 3) |
Jump / JumpIfFalse | target instruction index |
ForIter | jump target on iterator exhaustion |
Phi | target slot; sources stored in chunk.phi_sources |
UnpackEx | (before << 8) | after |
MakeFunction | function index in chunk.functions |
Operands and the constant pool, name table, and instruction stream per chunk are all capped at u16::MAX (65,535).
Expression parsing
expr_bp(min_bp) runs the Pratt loop. parse_atom advances one token and routes by kind:
Name -> name() (handles assignment, walrus, calls)
String -> emit Str constant; concatenate adjacent String tokens
Int / Float -> emit numeric constant; literal beyond 2⁴⁷ is a parse error
True/False/None/Ellipsisv-> emit dedicated load opcode
FstringStart -> fstring()
Lbrace -> brace_literal() (dict, set, comprehension)
Lsqb -> list_literal() (list, comprehension)
Lpar -> grouped expr, tuple, generator, or empty tuple
Lambda -> parse_lambda()After an atom, postfix_tail() handles trailers, subscript, attribute, call, iterating until none apply. Lets fns[0](-3), obj.method(), (lambda x: x)(3), compose(f, g)(x) parse uniformly.
*args / **kwargs are accepted in call position only, starred unpacking inside literals ([*a, *b], {**d1, **d2}, (1, *xs, 2)) is not supported.
Operator precedence
Each binary operator declares (l_bp, r_bp, OpCode) in binding_power. Higher binding pulls tighter. Only ** is right-associative (r_bp < l_bp); everything else is left-associative.
| Level | Operators | Notes |
|---|---|---|
| 1/2 | or | short-circuit |
| 3/4 | and | short-circuit |
| 5 | unary not | prefix only |
| 7/8 | == != < > <= >= in not in is is not | chainable |
| 9/10 | | | bitwise |
| 11/12 | ^ | bitwise |
| 13/14 | & | bitwise |
| 15/16 | << >> | shifts |
| 17/18 | + - | additive |
| 19/20 | * / % // | multiplicative |
| 21 | unary - ~ await | prefix |
| 22/21 | ** | right-associative |
Comparison chaining (a < b < c) is handled by infix_bp: when a comparison opcode is followed by another comparison token, the middle value is stored in a synthetic __cmp__N slot, the first comparison emitted, short-circuited on false, and the stored value reused for the next.
Short-circuit lowering
and / or lower to JumpIfFalseOrPop / JumpIfTrueOrPop, superinstructions that peek the stack top, pop only if execution continues, otherwise jump while leaving the value on the stack:
a and b
LoadName a
JumpIfFalseOrPop -> end
LoadName b
end:Preserves operand identity (returns the actual value, not a coerced bool) without an extra opcode.
SSA versioning
Each binding emits a fresh slot with an incremented version. The parser keeps a HashMap<String, u32> of name -> current version. Names in chunk.names are stored as name_version:
x = 1 # x_1
x = 2 # x_2
y = x # y_1, references x_2chunk.names = ["x_1", "x_2", "y_1"]
chunk.instructions:
LoadConst 0 (1)
StoreName 0 (x_1)
LoadConst 1 (2)
StoreName 1 (x_2)
LoadName 1 (x_2)
StoreName 2 (y_1)Undefined names target version 0 (x_0), filled by the host before execution (VM seeds globals like print_0). Still unbound at load time -> NameError.
Phi nodes at joins
At each control-flow boundary the parser pushes a JoinNode { backup, then } onto a stack:
enter_block() -> snapshot current versions into JoinNode.backup (and reset to the same baseline for the if branch).
mid_block() -> snapshot post-then versions into JoinNode.then; restore baseline (max of backup, then-state) for else.
commit_block() -> diff (then ∪ post) against (backup), emit Phi for each name that diverged.Each Phi carries the target slot (new version after join) in its operand; source slots live in chunk.phi_sources, indexed by chunk.phi_map[ip] at runtime. Keeps Instruction at 4 bytes.
if cond:
x = 1
else:
x = 2
print(x)LoadName cond_0
JumpIfFalse else_label
LoadConst 0 *(1)
StoreName x_1
Jump end_label
else_label:
LoadConst 1 *(2)
StoreName x_2
end_label:
Phi x_3 *(sources: x_1, x_2)
LoadName x_3
CallPrint 1Runtime resolves Phi by reading whichever source slot is Some, exactly one branch executed.
Statement dispatch
stmt() peeks the leading token and routes:
if -> if_stmt (with elif chain, optional else)
for -> for_stmt_inner (sync iter, optional else)
while -> while_stmt (with break/continue patches)
match -> match_stmt
def -> func_def_inner
class -> class_def (__init__, attributes, methods)
with -> with_stmt_inner (multi-target, async variant)
try -> try_stmt (except, else, finally, raise)
import -> import_stmt (compile-time resolver lookup)
from -> parse_from_stmt (named / star imports, same path)
type -> type-alias declaration
yield -> yield expr / yield from
async -> async def / for / with
@ -> decorator stack + def or class (peeks `class` after the @-list)
return -> expr + ReturnValue
raise -> expr + Raise / RaiseFrom
break -> emits Jump, back-patched to the loop exit
continue -> jump to current loop_start
del / global / nonlocal / pass -> direct emit
assert -> Assert opcode; the `, msg` form lowers to a conditional raise of AssertionError(msg)
Name -> name_stmt (assignment, augmented, indexed, attribute, call)Each statement returns a bool: did it leave a value on the stack. Driver emits PopTop after expression-shaped statements (x.method(), 1 + 2 at module level), not after statement-shaped (assignment, control flow).
Decorators apply to def and class; the @ arm peeks for class after the decorator stack. Each decorator wraps via Call,1 between MakeFunction/MakeClass and the final StoreName.
raise X from Y lowers to RaiseFrom, pops cause then exception so X surfaces. Bare raise re-raises. __cause__ / __context__ not exposed. except matching walks EXC_PARENTS so except Exception catches subclasses like isinstance(e, Exception) does.
Lambda and function bodies
Lambdas and def both compile their body into a fresh SSAChunk:
self.with_fresh_chunk(|s| {
s.ssa_versions = outer_versions.clone();
for p in ¶ms { s.ssa_versions.insert(p.clone(), 0); }
s.expr(); // or compile_block_body for def
s.chunk.emit(OpCode::ReturnValue, 0);
});Free variables (non-parameters with no local binding) are looked up in the outer chunk. MakeFunction captures matching slots from the enclosing scope into captures (snapshotted, no cell objects). Nested def/lambda push their free names back into the parent’s name table, capture propagates through any depth (A -> B -> C where C references a var in A).
Parameter slots: Normal, Star (*args), DoubleStar (**kwargs). Lone * separator marks following params as keyword-only. Defaults live in HeapObj::Func.defaults and apply to the last-N positional slots. Annotations (x: T, -> T) parse and drain to chunk.annotations (tooling-only).
compile_body checks impurity opcodes (StoreItem, StoreAttr, CallPrint, CallInput, Global, Nonlocal, Import, Raise, Yield, LoadAttr) to set body.is_pure — the flag that gates template memoisation (Design).
Type annotations
Parsed for source compatibility, discarded at runtime:
counter: int = 0 # annotation 'int' parsed and stored, slot still gets 0
def f(x: int) -> int: # annotations on params and return parsed and skipped
return xRecorded in chunk.annotations: HashMap<String, String> for tooling; no code emitted. f.__annotations__ is not exposed at runtime, but f.__name__ is (resolved in resolve_attr, also on type objects and classes).
Comprehensions and generators
List/set/dict comprehensions support multi-for and multi-if. Lower to BuildList / BuildSet / BuildDict plus a loop scaffold using ListAppend / SetAdd / MapAdd.
Generator expressions (i*2 for i in xs) lower eagerly to BuildList, operationally equivalent to [i*2 for i in xs]. Deliberate: template memoisation needs hashable, finite args; lazy generators wouldn’t memoise. For unbounded streams, write a def with yield (real HeapObj::Coroutine).
Async comprehensions ([x async for x in y]) and starred-unpack inside comprehensions unsupported.
F-string lowering
f"hello {name}, age {age}"Lowers to:
LoadConst "hello "
LoadName name_v
FormatValue 0
LoadConst ", age "
LoadName age_v
FormatValue 0
BuildString 5FormatValue’s 16-bit operand is a small flags field:
- bit 0, set when a format spec string is on the stack just below the value (collected as the raw text between
:and}and emitted as a constant). - bits 1,2, conversion:
0none,1!r,2!s,3!a.
VM applies conversion first, then the spec mini-language [[fill]align][sign][#][0][width][,][.precision][type] with type chars s d b o x X f F e E g G n % c. n aliases d (no locale). = self-documenting form ({expr=}) emits a literal expr= prefix. Adjacent string literals concatenate at parse time. Spec parse failures -> ValueError at runtime.
Limits
| Constant | Value | Purpose |
|---|---|---|
MAX_EXPR_DEPTH | 200 | Cap on recursive expression parsing |
MAX_INSTRUCTIONS | 65,535 | Cap on instructions per chunk |
MAX_EXPR_DEPTH -> diagnostic (“expression too deeply nested”). MAX_INSTRUCTIONS -> sets chunk.overflow = true, reported at end of parsing; instruction stream cleared rather than dispatched.
References
- Pratt. Top Down Operator Precedence (POPL 1973). Precedence climbing.
- Cytron et al. Efficiently Computing Static Single Assignment Form (TOPLAS 1991). SSA, phi-nodes.
- Crafting Interpreters, by Robert Nystrom: craftinginterpreters.com, single-pass codegen patterns.
- Casey et al. Towards Superinstructions for Java Interpreters (SCOPES 2003). LoadAttr+Call fusion.