ImplementationSyntax

Overview

Single-pass parser: consumes the lexer token stream and emits bytecode directly into an SSAChunk, no intermediate AST. Each construct is parsed and lowered in one traversal. Also handles SSA versioning, phi-node insertion at control-flow joins, and structural diagnostics. Lex-time errors merge into the parser’s diagnostic stream for a single ordered report.

Expression parsing uses Pratt: each operator declares left/right binding power, and expr_bp(min_bp) pulls in everything bound at least as tightly as min_bp.

Bytecode model

Each instruction is a tagged 4-byte record:

pub struct Instruction {
    pub opcode: OpCode, // 1 byte (with #[repr(u8)] planned)
    pub operand: u16, // 2 bytes
}

The operand is a 16-bit slot, its meaning depends on the opcode. Common shapes:

OpCodeOperand interpretation
LoadConstconstant pool index
LoadName / StoreNamename slot index
Add, Sub, …unused (IC keyed by ip)
Call(num_kw << 8) | num_pos
BuildList / BuildTuple / BuildSetelement count
BuildDictkey-value pair count
BuildSliceparts count (2 or 3)
Jump / JumpIfFalsetarget instruction index
ForIterjump target on iterator exhaustion
Phitarget slot; sources stored in chunk.phi_sources
UnpackEx(before << 8) | after
MakeFunctionfunction index in chunk.functions

Operands and the constant pool, name table, and instruction stream per chunk are all capped at u16::MAX (65,535).

Expression parsing

expr_bp(min_bp) runs the Pratt loop. parse_atom advances one token and routes by kind:

Name   -> name() (handles assignment, walrus, calls)
String   -> emit Str constant; concatenate adjacent String tokens
Int / Float   -> emit numeric constant; literal beyond 2⁴⁷ is a parse error
True/False/None/Ellipsisv-> emit dedicated load opcode
FstringStart   -> fstring()
Lbrace   -> brace_literal()  (dict, set, comprehension)
Lsqb   -> list_literal()   (list, comprehension)
Lpar   -> grouped expr, tuple, generator, or empty tuple
Lambda   -> parse_lambda()

After an atom, postfix_tail() handles trailers, subscript, attribute, call, iterating until none apply. Lets fns[0](-3), obj.method(), (lambda x: x)(3), compose(f, g)(x) parse uniformly.

*args / **kwargs are accepted in call position only, starred unpacking inside literals ([*a, *b], {**d1, **d2}, (1, *xs, 2)) is not supported.

Operator precedence

Each binary operator declares (l_bp, r_bp, OpCode) in binding_power. Higher binding pulls tighter. Only ** is right-associative (r_bp < l_bp); everything else is left-associative.

LevelOperatorsNotes
1/2orshort-circuit
3/4andshort-circuit
5unary notprefix only
7/8== != < > <= >= in not in is is notchainable
9/10|bitwise
11/12^bitwise
13/14&bitwise
15/16<< >>shifts
17/18+ -additive
19/20* / % //multiplicative
21unary - ~ awaitprefix
22/21**right-associative

Comparison chaining (a < b < c) is handled by infix_bp: when a comparison opcode is followed by another comparison token, the middle value is stored in a synthetic __cmp__N slot, the first comparison emitted, short-circuited on false, and the stored value reused for the next.

Short-circuit lowering

and / or lower to JumpIfFalseOrPop / JumpIfTrueOrPop, superinstructions that peek the stack top, pop only if execution continues, otherwise jump while leaving the value on the stack:

a and b

LoadName a
JumpIfFalseOrPop  ->  end
LoadName b
end:

Preserves operand identity (returns the actual value, not a coerced bool) without an extra opcode.

SSA versioning

Each binding emits a fresh slot with an incremented version. The parser keeps a HashMap<String, u32> of name -> current version. Names in chunk.names are stored as name_version:

x = 1 # x_1
x = 2 # x_2
y = x # y_1, references x_2
chunk.names = ["x_1", "x_2", "y_1"]
chunk.instructions:
   LoadConst 0   (1)
   StoreName 0   (x_1)
   LoadConst 1   (2)
   StoreName 1   (x_2)
   LoadName 1   (x_2)
   StoreName 2   (y_1)

Undefined names target version 0 (x_0), filled by the host before execution (VM seeds globals like print_0). Still unbound at load time -> NameError.

Phi nodes at joins

At each control-flow boundary the parser pushes a JoinNode { backup, then } onto a stack:

enter_block() -> snapshot current versions into JoinNode.backup (and reset to the same baseline for the if branch).
mid_block() -> snapshot post-then versions into JoinNode.then; restore baseline (max of backup, then-state) for else.
commit_block() -> diff (then ∪ post) against (backup), emit Phi for each name that diverged.

Each Phi carries the target slot (new version after join) in its operand; source slots live in chunk.phi_sources, indexed by chunk.phi_map[ip] at runtime. Keeps Instruction at 4 bytes.

if cond:
    x = 1
else:
    x = 2
print(x)
LoadName cond_0
JumpIfFalse else_label
LoadConst 0 *(1)
StoreName x_1
Jump end_label
else_label:
LoadConst 1 *(2)
StoreName x_2
end_label:
Phi x_3 *(sources: x_1, x_2)
LoadName x_3
CallPrint 1

Runtime resolves Phi by reading whichever source slot is Some, exactly one branch executed.

Statement dispatch

stmt() peeks the leading token and routes:

if   -> if_stmt   (with elif chain, optional else)
for   -> for_stmt_inner   (sync iter, optional else)
while   -> while_stmt   (with break/continue patches)
match   -> match_stmt
def   -> func_def_inner
class   -> class_def   (__init__, attributes, methods)
with   -> with_stmt_inner   (multi-target, async variant)
try   -> try_stmt   (except, else, finally, raise)
import   -> import_stmt   (compile-time resolver lookup)
from   -> parse_from_stmt   (named / star imports, same path)
type -> type-alias declaration
yield   -> yield expr / yield from
async   -> async def / for / with
@   -> decorator stack + def or class   (peeks `class` after the @-list)
return   -> expr + ReturnValue
raise   -> expr + Raise / RaiseFrom
break   -> emits Jump, back-patched to the loop exit
continue   -> jump to current loop_start
del / global / nonlocal / pass -> direct emit
assert   -> Assert opcode; the `, msg` form lowers to a conditional raise of AssertionError(msg)
Name   -> name_stmt   (assignment, augmented, indexed, attribute, call)

Each statement returns a bool: did it leave a value on the stack. Driver emits PopTop after expression-shaped statements (x.method(), 1 + 2 at module level), not after statement-shaped (assignment, control flow).

Decorators apply to def and class; the @ arm peeks for class after the decorator stack. Each decorator wraps via Call,1 between MakeFunction/MakeClass and the final StoreName.

raise X from Y lowers to RaiseFrom, pops cause then exception so X surfaces. Bare raise re-raises. __cause__ / __context__ not exposed. except matching walks EXC_PARENTS so except Exception catches subclasses like isinstance(e, Exception) does.

Lambda and function bodies

Lambdas and def both compile their body into a fresh SSAChunk:

self.with_fresh_chunk(|s| {
    s.ssa_versions = outer_versions.clone();
    for p in &params { s.ssa_versions.insert(p.clone(), 0); }
    s.expr(); // or compile_block_body for def
    s.chunk.emit(OpCode::ReturnValue, 0);
});

Free variables (non-parameters with no local binding) are looked up in the outer chunk. MakeFunction captures matching slots from the enclosing scope into captures (snapshotted, no cell objects). Nested def/lambda push their free names back into the parent’s name table, capture propagates through any depth (A -> B -> C where C references a var in A).

Parameter slots: Normal, Star (*args), DoubleStar (**kwargs). Lone * separator marks following params as keyword-only. Defaults live in HeapObj::Func.defaults and apply to the last-N positional slots. Annotations (x: T, -> T) parse and drain to chunk.annotations (tooling-only).

compile_body checks impurity opcodes (StoreItem, StoreAttr, CallPrint, CallInput, Global, Nonlocal, Import, Raise, Yield, LoadAttr) to set body.is_pure — the flag that gates template memoisation (Design).

Type annotations

Parsed for source compatibility, discarded at runtime:

counter: int = 0   # annotation 'int' parsed and stored, slot still gets 0
def f(x: int) -> int:   # annotations on params and return parsed and skipped
    return x

Recorded in chunk.annotations: HashMap<String, String> for tooling; no code emitted. f.__annotations__ is not exposed at runtime, but f.__name__ is (resolved in resolve_attr, also on type objects and classes).

Comprehensions and generators

List/set/dict comprehensions support multi-for and multi-if. Lower to BuildList / BuildSet / BuildDict plus a loop scaffold using ListAppend / SetAdd / MapAdd.

Generator expressions (i*2 for i in xs) lower eagerly to BuildList, operationally equivalent to [i*2 for i in xs]. Deliberate: template memoisation needs hashable, finite args; lazy generators wouldn’t memoise. For unbounded streams, write a def with yield (real HeapObj::Coroutine).

Async comprehensions ([x async for x in y]) and starred-unpack inside comprehensions unsupported.

F-string lowering

f"hello {name}, age {age}"

Lowers to:

LoadConst "hello "
LoadName  name_v
FormatValue 0
LoadConst ", age "
LoadName  age_v
FormatValue 0
BuildString 5

FormatValue’s 16-bit operand is a small flags field:

  • bit 0, set when a format spec string is on the stack just below the value (collected as the raw text between : and } and emitted as a constant).
  • bits 1,2, conversion: 0 none, 1 !r, 2 !s, 3 !a.

VM applies conversion first, then the spec mini-language [[fill]align][sign][#][0][width][,][.precision][type] with type chars s d b o x X f F e E g G n % c. n aliases d (no locale). = self-documenting form ({expr=}) emits a literal expr= prefix. Adjacent string literals concatenate at parse time. Spec parse failures -> ValueError at runtime.

Limits

ConstantValuePurpose
MAX_EXPR_DEPTH200Cap on recursive expression parsing
MAX_INSTRUCTIONS65,535Cap on instructions per chunk

MAX_EXPR_DEPTH -> diagnostic (“expression too deeply nested”). MAX_INSTRUCTIONS -> sets chunk.overflow = true, reported at end of parsing; instruction stream cleared rather than dispatched.

References

  • Pratt. Top Down Operator Precedence (POPL 1973). Precedence climbing.
  • Cytron et al. Efficiently Computing Static Single Assignment Form (TOPLAS 1991). SSA, phi-nodes.
  • Crafting Interpreters, by Robert Nystrom: craftinginterpreters.com, single-pass codegen patterns.
  • Casey et al. Towards Superinstructions for Java Interpreters (SCOPES 2003). LoadAttr+Call fusion.