Relax Virtual Machine
This document explains the Relax VM architecture in detail, covering the compilation pipeline from Relax IR to bytecode, the instruction set, the execution model, and the Python-level user interface.
Overview
The end-to-end flow from model to execution is:
1. Relax IR: a high-level computational graph (relax.Function inside an IRModule).
2. Compilation: tvm.compile() applies the Relax transformation pipeline, then invokes VMCodeGen to translate each Relax function into bytecode instructions.
3. Linking: TIR functions are compiled to native kernels (via LLVM, CUDA, etc.); the bytecode, constant pool, and compiled kernels are packaged together into a VMExecutable.
4. Execution: at runtime, a VirtualMachine loads the executable, initializes devices and memory allocators, and runs the bytecode.
IRModule (Relax + TIR)
    │
    ▼  relax_pipeline (FuseOps, LegalizeOps, ...)
IRModule (optimized)
    │
    ▼  VMCodeGen
ExecBuilder (bytecode)  +  IRModule (TIR only)
    │                          │
    │                          ▼  tirx.build()
    │                      runtime.Module (native kernels)
    │                          │
    ▼  VMLink                  ▼
VMExecutable ◄───────── linked together
    │
    ▼  VirtualMachine(exec, device)
Runtime execution
Compilation: From Relax IR to Bytecode
Build entry point
The main entry point is tvm.compile() (which delegates to relax.build() in
python/tvm/relax/vm_build.py):
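import tvm
from tvm import relax
from tvm.script import relax as R  # provides the R.* names used below

@tvm.script.ir_module
class MyModule:
    @R.function
    def main(x: R.Tensor((3, 4), "float32")):
        return R.add(x, x)

# Compile for the host CPU through LLVM
target = tvm.target.Target("llvm")
ex = tvm.compile(MyModule, target)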
Internally, relax.build() performs these steps:
1. Apply the Relax pipeline (relax.get_pipeline("default")), which includes operator legalization, fusion, buffer planning, and other graph-level passes.
2. Create an ExecBuilder and run VMCodeGen (src/relax/backend/vm/codegen_vm.cc), which walks each relax.Function and emits bytecode instructions. The Relax functions are removed from the IRModule; only TIR functions remain.
3. Compile the remaining TIR functions to native code via tirx.build().
4. Link the bytecode executable with the compiled native module using VMLink, producing a VMExecutable.
Two execution modes are supported:
- exec_mode="bytecode" (default): Relax functions are interpreted by the VM's bytecode dispatch loop.
- exec_mode="compiled": Relax functions are compiled into TIR functions (VMTIRCodeGen) that directly manipulate the register file, bypassing the interpreter loop. This avoids dispatch overhead but produces more code.
Bytecode generation
The CodeGenVM class (src/relax/backend/vm/codegen_vm.cc) is an ExprFunctor that visits
each Relax expression and emits instructions through the ExecBuilder:
- Each relax.Var is mapped to a register.
- Function parameters occupy registers 0 through N-1.
- Each binding in a SeqExpr generates one or more instructions; the result is stored in a new register.
- Function calls (R.call_tir, R.call_packed, operator calls) become Call instructions.
- Conditional expressions (relax.If, written as Python if in TVMScript) become an If instruction followed by Goto to skip branches.
- The function body ends with a Ret instruction.
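The register assignment rules above can be illustrated with a toy lowering pass. This is a minimal sketch in plain Python, not the actual CodeGenVM: ToyBuilder, codegen, and the tuple-based instruction format are hypothetical names invented for illustration, and only straight-line bodies (no If/Goto) are handled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBuilder:
    """Hypothetical stand-in for ExecBuilder: collects instructions as tuples."""

    instrs: list = field(default_factory=list)

    def emit_call(self, dst, func, args):
        self.instrs.append(("Call", dst, func, tuple(args)))

    def emit_ret(self, reg):
        self.instrs.append(("Ret", reg))


def codegen(params, bindings, result_var):
    """Lower a straight-line function body to toy bytecode.

    params:     parameter names, assigned registers 0..N-1
    bindings:   list of (var, func_name, arg_vars), one entry per binding
    result_var: name of the variable returned by the function
    """
    builder = ToyBuilder()
    # Parameters occupy the first N registers.
    reg_of = {name: i for i, name in enumerate(params)}
    for var, func, args in bindings:
        dst = len(reg_of)  # each binding stores its result in a new register
        builder.emit_call(dst, func, [reg_of[a] for a in args])
        reg_of[var] = dst
    builder.emit_ret(reg_of[result_var])
    return builder.instrs, len(reg_of)


# Toy function: lv0 = add(x, x); lv1 = relu(lv0); return lv1
instrs, num_regs = codegen(
    params=["x"],
    bindings=[("lv0", "add", ["x", "x"]), ("lv1", "relu", ["lv0"])],
    result_var="lv1",
)
```

The second return value plays the role of the register file size that the real compiler records per function.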
Instruction Set
The VM uses a register-based architecture with an intentionally minimal instruction set. There are only four opcodes:
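| Opcode | Fields | Semantics |
|---|---|---|
| Call | dst, function index, arguments | Call the function with the given arguments and store the result in register dst |
| Ret | result register | Return the value in the result register |
| Goto | offset | Jump forward or backward by the given offset |
| If | condition register, false_offset | If the condition register is nonzero, fall through; otherwise jump by false_offset |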
The VM itself performs no mathematical computation. All actual work — matrix multiplications,
convolutions, elementwise operations — is carried out by compiled TIR kernels or external
libraries (cuBLAS, cuDNN, etc.), dispatched through Call instructions.
Instruction encoding
Each instruction argument (Instruction::Arg) is a 64-bit word encoded as:
- Bits [63:56]: ArgKind (8 bits), one of kRegister (0), kImmediate (1), kConstIdx (2), or kFuncIdx (3).
- Bits [55:0]: value (56 bits, sign-extended).
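The bit layout above can be exercised with a few lines of integer arithmetic. This is a sketch in Python rather than the C++ used by the runtime; encode_arg and decode_arg are hypothetical helper names, while the kind values and field widths are the ones listed above.

```python
# ArgKind values as listed above
K_REGISTER, K_IMMEDIATE, K_CONST_IDX, K_FUNC_IDX = 0, 1, 2, 3

VALUE_BITS = 56
VALUE_MASK = (1 << VALUE_BITS) - 1


def encode_arg(kind, value):
    """Pack an 8-bit kind and a signed 56-bit value into one 64-bit word."""
    if not 0 <= kind < (1 << 8):
        raise ValueError("kind must fit in 8 bits")
    if not -(1 << (VALUE_BITS - 1)) <= value < (1 << (VALUE_BITS - 1)):
        raise ValueError("value must fit in 56 signed bits")
    # Masking keeps the two's-complement bit pattern of negative values.
    return (kind << VALUE_BITS) | (value & VALUE_MASK)


def decode_arg(word):
    """Unpack a 64-bit word into (kind, value), sign-extending the value."""
    kind = (word >> VALUE_BITS) & 0xFF
    value = word & VALUE_MASK
    if value & (1 << (VALUE_BITS - 1)):  # sign bit of the 56-bit field is set
        value -= 1 << VALUE_BITS
    return kind, value
```

Sign extension matters for immediates, which may be negative; register and index values are non-negative and round-trip unchanged.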
Two special register values exist:
- kVoidRegister: indicates "no destination" (the return value is discarded).
- kVMRegister: refers to the VM context pointer itself, passed as the first argument to closures.
The instruction stream is stored as a flat vector<ExecWord> (instr_data) with an offset
table (instr_offset) for random access.
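A flat word vector plus an offset table gives constant-time access to variable-length instructions. The sketch below shows the idea in Python; ToyInstrStream is a hypothetical class, and the per-instruction word layout (one opcode word followed by operand words) is an assumption made for illustration, not the exact layout used by the runtime.

```python
class ToyInstrStream:
    """Variable-length instructions stored in one flat list with an offset table."""

    def __init__(self):
        self.instr_data = []    # flat list of words
        self.instr_offset = []  # instr_offset[i] = index in instr_data where instruction i starts

    def append(self, opcode, *operands):
        """Append one instruction and record where it starts."""
        self.instr_offset.append(len(self.instr_data))
        self.instr_data.append(opcode)
        self.instr_data.extend(operands)

    def get(self, index):
        """Return (opcode, operands) of instruction `index` without scanning."""
        start = self.instr_offset[index]
        if index + 1 < len(self.instr_offset):
            end = self.instr_offset[index + 1]
        else:
            end = len(self.instr_data)
        return self.instr_data[start], self.instr_data[start + 1:end]


# Hypothetical opcode numbers, chosen only for this example
CALL, RET = 0, 1

stream = ToyInstrStream()
stream.append(CALL, 1, 7, 2, 0, 0)  # six words
stream.append(RET, 1)               # two words
```

Because jumps are expressed in instructions rather than words, the offset table is what lets the program counter move by an instruction count.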
Executable
A VMExecutable (include/tvm/runtime/vm/executable.h) bundles everything needed for
execution:
- Function table (func_table): a vector<VMFuncInfo> describing every function. Each entry records the function's kind, name, instruction range (start_instr to end_instr), number of arguments, register file size, and parameter names.
- Constant pool (constants): model weights, shape tuples, and other compile-time constants.
- Bytecode (instr_data + instr_offset): the instruction stream.
- Imported modules: the compiled TIR kernels and external libraries.
Function kinds
The VM recognizes three function kinds (VMFuncInfo::FuncKind):
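| Kind | Description |
|---|---|
| kPackedFunc | An external C/C++ function looked up from imported modules or the global PackedFunc registry. Examples include the vm.builtin.* functions listed under Built-in operations. |
| kVMFunc | A bytecode-interpreted Relax function. The VM interprets its instructions in RunLoop(). |
| kVMTIRFunc | A Relax function compiled to a TIR function (VMTIRCodeGen, used with exec_mode="compiled"). |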
Serialization
The executable supports binary serialization for deployment:
# Save
ex.export_library("model.so")
# Load
loaded = tvm.runtime.load_module("model.so")
vm = relax.VirtualMachine(loaded, tvm.cuda())
The binary format includes a magic number (0xD225DE2F4214151E), a version string
(currently "0.14"), followed by four sections: globals (the function table), memory scopes,
constant pool, and bytecode. AsText() and AsPython() provide human-readable representations
for debugging.
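A reader of such a format typically validates the magic number and version before touching the remaining sections. The following is a sketch of that check only. The magic number and version string are the ones quoted above, but the byte layout (little-endian 64-bit magic, then a 64-bit length followed by the UTF-8 version string) and the helper names are assumptions for illustration, not the documented on-disk format.

```python
import io
import struct

MAGIC = 0xD225DE2F4214151E  # magic number quoted above
VERSION = "0.14"            # version string quoted above


def write_header(stream):
    """Write magic and version using the assumed layout."""
    stream.write(struct.pack("<Q", MAGIC))
    encoded = VERSION.encode("utf-8")
    stream.write(struct.pack("<Q", len(encoded)))
    stream.write(encoded)


def read_header(stream):
    """Validate magic and version, returning the version string on success."""
    (magic,) = struct.unpack("<Q", stream.read(8))
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#x}")
    (length,) = struct.unpack("<Q", stream.read(8))
    version = stream.read(length).decode("utf-8")
    if version != VERSION:
        raise ValueError(f"unsupported version: {version!r}")
    return version


buf = io.BytesIO()
write_header(buf)
buf.seek(0)
version = read_header(buf)
```

Failing early on a mismatched header avoids interpreting an incompatible function table or bytecode section.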
Runtime Execution
VM initialization
At runtime, a VirtualMachine is created and initialized:
from tvm.relax import VirtualMachine
vm = VirtualMachine(exec_module, tvm.cuda())
Under the hood:
1. LoadExecutable: the bytecode and metadata are loaded from the VMExecutable.
2. Init: devices and memory allocators are set up. Each device gets an Allocator (either NAIVE_ALLOCATOR or POOLED_ALLOCATOR, defaulting to pooled). A CPU device is always added for shape computations.
3. InitFuncPool: the function pool is populated. kPackedFunc entries are resolved from imports or the global registry; kVMFunc and kVMTIRFunc entries are wrapped in VMClosure objects.
4. Constant pool: model constants are loaded and optionally transferred to the target device.
The bytecode dispatch loop
When a kVMFunc is invoked, the VM enters InvokeBytecode():
1. A new VMFrame is pushed onto the call stack. Each frame contains:
   - A register file (vector<ffi::Any>): type-erased slots that can hold tensors, shapes, closures, or any TVM object. The size is determined at compile time (VMFuncInfo::register_file_size).
   - The return program counter: where to resume after the function returns.
   - The caller's return register: which register in the parent frame receives the result.
2. Function arguments are written to registers 0..N-1.
3. The program counter (pc_) is set to the function's start_instr.
4. RunLoop() executes instructions until a Ret is encountered:
   - Call: resolve arguments (from registers, immediates, constant pool, or function pool), invoke the target function via InvokeClosurePacked(), store the result in dst.
   - Ret: read the return value from the specified register, write the result to the caller's return register, and return from RunLoop() (the frame is popped by an RAII guard when InvokeBytecode() exits).
   - Goto: adjust pc_ by the offset.
   - If: check the condition register; if nonzero, fall through; otherwise jump by false_offset.
The dispatch loop is implemented in src/runtime/vm/vm.cc (VirtualMachineImpl::RunLoop).
Frame Stack              Register File (per frame)
┌─────────────┐          ┌────┬────┬────┬─────┬────┐
│ Frame 2     │ ───────► │ R0 │ R1 │ R2 │ ... │ Rn │
├─────────────┤          └────┴────┴────┴─────┴────┘
│ Frame 1     │ ───────► [register file]
├─────────────┤
│ Frame 0     │ ───────► [register file]
└─────────────┘
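The four-opcode dispatch loop described in this section can be modelled in a few lines. This is a toy interpreter in Python, not the code in vm.cc: the tuple-based instruction format, the run function, and the example program are hypothetical, and it handles a single frame only (no nested bytecode calls, constants, or immediates).

```python
def run(instrs, funcs, args, num_regs):
    """Interpret a toy program with the opcodes Call, Ret, Goto, and If."""
    regs = [None] * num_regs
    regs[:len(args)] = args  # arguments occupy registers 0..N-1
    pc = 0
    while True:
        instr = instrs[pc]
        op = instr[0]
        if op == "Call":
            _, dst, func, arg_regs = instr
            result = funcs[func](*(regs[r] for r in arg_regs))
            if dst is not None:  # None models "no destination"
                regs[dst] = result
            pc += 1
        elif op == "Ret":
            return regs[instr[1]]
        elif op == "Goto":
            pc += instr[1]  # relative jump, forward or backward
        elif op == "If":
            _, cond, false_offset = instr
            # Nonzero condition falls through; otherwise jump by false_offset.
            pc += 1 if regs[cond] else false_offset
        else:
            raise ValueError(f"unknown opcode: {op}")


# The interpreter does no arithmetic itself; all work happens in called functions.
FUNCS = {
    "is_negative": lambda x: x < 0,
    "negate": lambda x: -x,
    "copy": lambda x: x,
}

# abs(x): r0 = x, r1 = condition, r2 = result merged from both branches
ABS_PROGRAM = [
    ("Call", 1, "is_negative", (0,)),  # 0: r1 = is_negative(r0)
    ("If", 1, 3),                      # 1: if r1 fall through, else jump to 4
    ("Call", 2, "negate", (0,)),       # 2: r2 = negate(r0)
    ("Goto", 2),                       # 3: skip the else branch, jump to 5
    ("Call", 2, "copy", (0,)),         # 4: r2 = copy(r0)
    ("Ret", 2),                        # 5: return r2
]

result_for_negative = run(ABS_PROGRAM, FUNCS, [-3], num_regs=3)
result_for_positive = run(ABS_PROGRAM, FUNCS, [4], num_regs=3)
```

Both branches write the same register before the shared Ret, which mirrors how a copy is used to merge results from if/else branches.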
VMClosure and function dispatch
Functions in the VM are stored in a func_pool_ indexed by function table position.
kVMFunc and kVMTIRFunc entries are wrapped as VMClosure objects, while kPackedFunc
entries are stored as plain ffi::Function. A VMClosure stores:
- func_name: the function's string name.
- impl: an ffi::Function that takes the VM context pointer as its first argument, followed by the actual parameters.
When the VM encounters a Call instruction, it looks up the function in func_pool_ by
index and dispatches via InvokeClosurePacked(). If the target is a VMClosure, the VM
pointer is prepended to the arguments and impl is invoked. If it is a plain
ffi::Function, it is called directly.
VMClosure::BindLastArgs enables partial application — it creates a new function with
some arguments pre-bound at the end, useful for implementing captured closures in Relax.
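The two dispatch paths and the effect of binding trailing arguments can be sketched as follows. ToyClosure, bind_last_args, and invoke_closure are hypothetical Python stand-ins for VMClosure, VMClosure::BindLastArgs, and InvokeClosurePacked(); the real versions operate on ffi::Function objects in C++.

```python
class ToyClosure:
    """Stand-in for VMClosure: a name plus an impl expecting the VM as first argument."""

    def __init__(self, func_name, impl):
        self.func_name = func_name
        self.impl = impl  # callable(vm, *args)

    @staticmethod
    def bind_last_args(impl, last_args):
        """Return a new callable with `last_args` appended to every call."""

        def bound(vm, *args):
            return impl(vm, *args, *last_args)

        return bound


def invoke_closure(vm, target, *args):
    """Dispatch a call: closures get the VM prepended, plain functions do not."""
    if isinstance(target, ToyClosure):
        return target.impl(vm, *args)
    return target(*args)


def scale_impl(vm, x, factor):
    # `factor` plays the role of a captured variable.
    return x * factor


vm = object()  # placeholder for the VM context pointer
scale_by_10 = ToyClosure("scale", ToyClosure.bind_last_args(scale_impl, [10]))

closure_result = invoke_closure(vm, scale_by_10, 4)
plain_result = invoke_closure(vm, max, 2, 5)
```

Binding at the end rather than the front leaves the leading slot free for the VM pointer, so a bound closure is called exactly like an unbound one.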
Built-in operations
The VM relies on several built-in PackedFuncs (registered in src/runtime/vm/builtin.cc)
for runtime support:
- vm.builtin.alloc_shape_heap: allocate workspace for symbolic shape computations.
- vm.builtin.match_shape: validate tensor shapes against expected patterns at runtime, supporting assertions (kAssertEqualToImm, kAssertEqualToLoad), storing symbolic dimensions to the shape heap (kStoreToHeap), or no-ops (kNoOp).
- vm.builtin.make_shape: construct shape tuples from immediates or heap-loaded values.
- vm.builtin.match_prim_value: validate primitive values (e.g., integers) against expected patterns.
- vm.builtin.copy: copy a value into a register. Used in several codegen scenarios: materializing non-register arguments (immediates, constants) into registers, ensuring each variable binding gets its own register, and merging results from if/else branches.
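The four match codes are easiest to see on a shape with a repeated symbolic dimension. Below is a simplified model in Python: MatchCode and this match_shape signature are hypothetical and do not reproduce the argument convention of the real vm.builtin.match_shape, only the per-dimension behaviour named above.

```python
from enum import Enum, auto


class MatchCode(Enum):
    ASSERT_EQUAL_TO_IMM = auto()   # dimension must equal a given constant
    STORE_TO_HEAP = auto()         # record the dimension in a heap slot
    NO_OP = auto()                 # accept any value
    ASSERT_EQUAL_TO_LOAD = auto()  # dimension must equal a stored heap slot


def match_shape(shape, heap, actions):
    """Check `shape` against one (code, value) action per dimension."""
    if len(shape) != len(actions):
        raise ValueError(f"expected {len(actions)} dimensions, got {len(shape)}")
    for axis, (dim, (code, value)) in enumerate(zip(shape, actions)):
        if code is MatchCode.ASSERT_EQUAL_TO_IMM:
            if dim != value:
                raise ValueError(f"axis {axis}: expected {value}, got {dim}")
        elif code is MatchCode.STORE_TO_HEAP:
            heap[value] = dim  # `value` is the heap slot index
        elif code is MatchCode.ASSERT_EQUAL_TO_LOAD:
            if dim != heap[value]:
                raise ValueError(f"axis {axis}: expected {heap[value]}, got {dim}")
        elif code is MatchCode.NO_OP:
            pass


# Pattern (n, 4, n): bind n on first sight, then require the last axis to agree.
PATTERN = [
    (MatchCode.STORE_TO_HEAP, 0),
    (MatchCode.ASSERT_EQUAL_TO_IMM, 4),
    (MatchCode.ASSERT_EQUAL_TO_LOAD, 0),
]

shape_heap = [0]
match_shape((8, 4, 8), shape_heap, PATTERN)
```

After the call, slot 0 of the heap holds the runtime value of n, which later shape computations can load.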
Python Interface
Users interact with the VM through tvm.relax.VirtualMachine:
import tvm
from tvm import relax
import numpy as np
# Compile
ex = tvm.compile(MyModule, target="llvm")
# Create VM
vm = relax.VirtualMachine(ex, tvm.cpu())
# Direct invocation
inp = tvm.runtime.tensor(np.random.rand(3, 4).astype("float32"))
result = vm["main"](inp)
# Stateful interface (useful for RPC)
vm.set_input("main", inp)
vm.invoke_stateful("main")
output = vm.get_outputs("main")
Key methods:
- vm["func_name"](*args): direct invocation, returns the result.
- vm.set_input() / vm.invoke_stateful() / vm.get_outputs(): stateful interface that avoids sending output over the wire, useful for RPC-based remote execution.
- vm.save_function(func_name, saved_name, *args): pre-bind arguments for repeated calls, reducing dictionary lookup overhead during benchmarking.
- vm.time_evaluator(func_name, dev): returns a timing function following the same convention as tvm.runtime.Module.time_evaluator.
- vm.set_instrument(func): register an instrumentation callback that is invoked before/after every Call instruction. The callback can return VMInstrumentReturnKind.SKIP_RUN to skip the call.
Instrumentation
The VM supports observability through an instrumentation callback registered with set_instrument():
from tvm.relax import VMInstrumentReturnKind

def my_instrument(func, func_symbol, before_run, ret_value, *args):
    # before_run is True before the call and False after it
    if before_run:
        print(f"About to call: {func_symbol}")
    return VMInstrumentReturnKind.NO_OP

vm.set_instrument(my_instrument)
vm["main"](inp)
The instrument function is called before and after every Call instruction, receiving the
function object, its symbol name, a flag indicating before/after, the return value (only valid
after), and all arguments.
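The calling convention for the callback can be modelled without TVM. In this sketch, ReturnKind and call_with_instrument are hypothetical stand-ins for VMInstrumentReturnKind and the VM's handling of a Call instruction; the numeric enum values, and the choice to omit the after-call callback when a call is skipped, are assumptions.

```python
from enum import IntEnum


class ReturnKind(IntEnum):
    """Stand-in for VMInstrumentReturnKind (numeric values are assumed)."""

    NO_OP = 0
    SKIP_RUN = 1


def call_with_instrument(instrument, func, func_symbol, *args):
    """Run `func`, invoking `instrument` before and after as described above."""
    if instrument is not None:
        kind = instrument(func, func_symbol, True, None, *args)
        if kind == ReturnKind.SKIP_RUN:
            return None  # the call itself is not executed
    result = func(*args)
    if instrument is not None:
        # ret_value is only meaningful in the after-call invocation.
        instrument(func, func_symbol, False, result, *args)
    return result


trace = []


def tracer(func, func_symbol, before_run, ret_value, *args):
    trace.append((func_symbol, before_run, ret_value))
    return ReturnKind.NO_OP


def skip_everything(func, func_symbol, before_run, ret_value, *args):
    return ReturnKind.SKIP_RUN


traced_result = call_with_instrument(tracer, lambda a, b: a + b, "add", 1, 2)
skipped_result = call_with_instrument(skip_everything, lambda a, b: a + b, "add", 1, 2)
```

A skipping callback of this kind is one way to walk a program's call sequence without paying for the kernels themselves.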
Inspecting Bytecode
The executable provides text and Python representations of the compiled bytecode:
ex = tvm.compile(MyModule, target="llvm")
print(ex.as_text()) # Human-readable instruction listing
print(ex.as_python()) # Equivalent Python program
print(ex.stats()) # Summary statistics
These are invaluable for debugging compilation issues — they show exactly which functions are called, in what order, and how registers are used.
Source Code Map
| Path | Contents |
|---|---|
|  | Instruction, Opcode, and Arg definitions |
| include/tvm/runtime/vm/executable.h | VMExecutable, VMFuncInfo, serialization |
|  | VirtualMachine base class, VMClosure |
| src/runtime/vm/vm.cc | VirtualMachineImpl, RunLoop, InvokeBytecode |
|  | Serialization/deserialization, text output |
| src/runtime/vm/builtin.cc | Built-in operations (shape matching, allocation) |
| src/relax/backend/vm/codegen_vm.cc | CodeGenVM: Relax IR → bytecode |
|  | VMTIRCodeGen: Relax IR → compiled TIR |
|  | Python VirtualMachine wrapper |