Structural Equality and Hashing#
TVM FFI provides structural_equal and structural_hash for the
object graph. These compare objects by content — recursively walking
fields — rather than by pointer identity.
The behavior is controlled by two layers of annotation on
py_class():
Type-level
structural_eq=— what role does this type play in the IR graph?Field-level
structural_eq=onfield()— should this field be skipped, or does it introduce new variable bindings?
This document explains what each annotation means, when to use it, and how they compose.
Note
Structural equality and hashing never call Python-level __eq__
or __hash__. structural_equal / structural_hash dispatch
entirely through a C++ walker driven by the kind metadata registered
via structural_eq=; the Python a == b / hash(a) dunders
are independent (they default to pointer identity and handle address,
inherited from Object). To customize how a specific type
participates in structural comparison, register the
Custom Equality and Hashing: __s_equal__ / __s_hash__ hooks described below — do not override
__eq__ or __hash__.
Type-Level Annotation#
The structural_eq parameter on @py_class declares how instances of the
type participate in structural equality and hashing:
@py_class(structural_eq="tree")
class Expr(Object):
...
Quick reference#
|
Meaning |
Use when… |
|---|---|---|
|
A regular IR node |
Default for most IR nodes |
|
An immutable value node (with pointer shortcut) |
The type has no transitive |
|
A node in a dataflow graph |
Pointer sharing is semantically meaningful |
|
A bound variable |
The type represents a variable binding |
|
A singleton |
Exactly one instance per logical identity (e.g. registry entries) |
|
Not comparable |
The type should never be compared structurally |
"tree" — The Default#
@py_class(structural_eq="tree")
class Add(Object):
lhs: Expr
rhs: Expr
Meaning: “This node is defined by its fields. Two nodes are equal if and only if all their fields are recursively equal.”
This is the right choice for the vast majority of IR nodes: expressions, statements, types, attributes, buffers, etc.
Example.
1 + 2 vs 1 + 2 → Equal
1 + 2 vs 1 + 3 → Not equal (rhs differs)
"const-tree" — Tree with a Fast Path#
@py_class(structural_eq="const-tree")
class DeviceMesh(Object):
shape: list[int]
device_ids: list[int]
Meaning: “Same as "tree", but if two references point to the same
object, they are guaranteed equal — skip the field comparison.”
This is purely a performance optimization. The only behavioral difference
from "tree" is that pointer identity short-circuits to True.
When is this safe (and worth it)?#
Three conditions decide whether "const-tree" is the right choice:
Immutable — content doesn’t change after construction, so same-pointer always implies same-content.
No transitive
"var"children — skipping field traversal won’t cause variable mappings to be missed (see "var" — Bound Variables for why this matters).Sharing is common — instances are interned or canonicalized, so the same pointer actually appears on both sides of real comparisons. Without interning, the shortcut never fires and
"const-tree"behaves like"tree"with a dead branch.
Conditions 1 and 2 are correctness requirements: violating them is a bug,
not a performance regression. Condition 3 is the payoff — "const-tree"
is worth reaching for only when it will actually save work.
A useful rule of thumb: does the system go out of its way to make two
equal instances of this type share a pointer? Canonical types, interned
constants, cached shapes, and op metadata usually do. General expression
and statement nodes usually don’t — and also fail condition 2. Prefer
"const-tree" for the type / attribute / metadata layer of the IR, not
the expression / statement layer.
Note also that condition 2 is a whole-subgraph property: once a field
holds an Expr (which may one day contain a Var), the annotation
silently commits the type to that invariant — a later refactor embedding
a Var becomes a correctness break rather than a local change.
Why not use it everywhere?#
Most IR nodes are immutable, but many transitively contain variables
(e.g., x + 1 contains the "var" node x). The pointer shortcut
fires only when both sides of a comparison reference the same object —
but when that sharing exists, skipping traversal also skips the variable
occurrences inside, and mappings that should have been recorded are
silently missed.
Suppose the + node were incorrectly annotated as "const-tree", and
consider comparing two tuples that share the + subtree via pointer
identity:
shared = x + 1 # pointer P, contains var x
lhs = (shared, x) # .0 = P, .1 = var x
rhs = (shared, y) # .0 = P, .1 = var y (different Var)
structural_equal(lhs, rhs, map_free_vars=True)
With the + annotated as plain "tree" (correct):
.0: traverse intosharedon both sides, visitxat.lhs, record the mappingx ↔ x..1: look upx→ maps tox, but rhs isy. NOT EQUAL ✓
With the + annotated as "const-tree" (the bug):
.0: pointer shortcut fires onshared(both sides reference P). Fields are skipped,xinside is never visited, no mapping is recorded..1: comparexvsy. No existing mapping, andmap_free_vars=Truelets a new one be recorded asx ↔ y. EQUAL ✗ (wrong)
The following diagram illustrates the shared structure. The + node
(shared) has two incoming .0 edges — one from each side — which
is exactly the situation in which the pointer shortcut fires:
graph TD
LT["lhs: (_, _)"]
RT["rhs: (_, _)"]
ADD["shared = x + 1<br/>const-tree<br/><i>same pointer on both sides</i>"]
X["x : var"]
ONE["1"]
Y["y : var"]
LT -->|".0"| ADD
RT -->|".0"| ADD
LT -->|".1"| X
RT -->|".1"| Y
ADD -->|".lhs"| X
ADD -->|".rhs"| ONE
style ADD fill:#fff3cd
style X fill:#f8d7da
style Y fill:#f8d7da
The same failure mode arises whenever a shared subtree containing a
"var" is compared inside any definition region (e.g., the body of a
Lambda whose params field is structural_eq="def"), not only under
map_free_vars=True.
"dag" — Sharing-Aware Comparison#
@py_class(structural_eq="dag")
class Binding(Object):
var: Var
value: Expr
Meaning: “This node lives in a graph where pointer sharing is semantically meaningful. Two graphs are equal only if they have the same content and the same sharing structure.”
Why it exists#
In dataflow IR, sharing matters. Consider:
# Program A: shared — compute once, use twice
let s = x + 1 in (s, s)
# Program B: independent — compute twice
(x + 1, x + 1)
Program A computes x + 1 once and references it twice; Program B
computes it independently twice. Under "tree" these are equal;
under "dag" they are not:
graph TD
subgraph "Program A — DAG"
TA["(_, _)"]
SA["s = x + 1"]
TA -->|".0"| SA
TA -->|".1"| SA
end
subgraph "Program B — Tree"
TB["(_, _)"]
A1["x + 1"]
A2["x + 1"]
TB -->|".0"| A1
TB -->|".1"| A2
end
SA -. "NOT EQUAL under dag<br/>(sharing structure differs)" .-> A1
style SA fill:#d4edda
style A1 fill:#d4edda
style A2 fill:#f8d7da
How "dag" detects sharing#
"dag" maintains a bijective (one-to-one) mapping between objects that
have been successfully compared. When the same object appears again, it
checks whether the pairing is consistent:
Comparing Program A vs Program B:
.0: s ↔ (x+1)₁ → content equal, record pairing: s ↔ (x+1)₁
.1: s ↔ (x+1)₂ → s already paired with (x+1)₁, not (x+1)₂
→ NOT EQUAL
The mapping is bijective: if a is paired with b, no other object
can pair with either a or b. This prevents false positives in both
directions.
Example of the reverse direction.
lhs: (a, b) rhs: (a, a) where a ≅ b (same content)
.0: a₁ ↔ a₂ → equal, record a₁ ↔ a₂
.1: b₁ ↔ a₂ → b₁ is new, but a₂ already paired with a₁
→ NOT EQUAL
Without the reverse check, the second comparison would proceed to content
comparison, find b₁ ≅ a₂, and incorrectly succeed.
Full comparison: "tree" vs "dag"#
Scenario |
|
|
|---|---|---|
both trees with same content |
Equal |
Equal |
both DAGs, same sharing shape |
Equal |
Equal |
|
Equal |
Not equal |
|
Equal |
Not equal |
"var" — Bound Variables#
@py_class(structural_eq="var")
class Var(Object):
name: str = field(structural_eq="ignore") # alpha-equivalent vars differ in name
type: Type # participates in equality
Meaning: “This is a variable. Two variables are equal if they are
bound in corresponding positions, not if they have the same name.”
The name field is almost always marked structural_eq="ignore"
because alpha-equivalent variables have different names. Other fields
such as type are compared — but only at the binding site (see
Fields and the sticky mapping).
The problem#
fun x → x + 1 should equal fun y → y + 1
Variables are not defined by their content, such as their name. They
are defined by where they are introduced and how they are used.
x and y above are interchangeable because they occupy the same
binding position and are used in the same way.
How it works: definition regions#
"var" works together with field(structural_eq="def") (see
Field-Level Annotations). A field marked structural_eq="def" is a
definition region — it’s where new variable bindings are introduced.
Inside a definition region: encountering two different variables establishes a correspondence (“treat
xas equivalent toy”).Outside a definition region: variables are only equal if a prior correspondence already exists, or they are the same pointer.
The following diagram traces the comparison of two alpha-equivalent functions:
sequenceDiagram
participant C as Comparator
participant L as lhs: fun x → x + 1
participant R as rhs: fun y → y + 1
Note over C: Field "params" has structural_eq="def"
C->>L: get params → [x]
C->>R: get params → [y]
Note over C: Enter definition region
C->>C: Compare x ↔ y: both are Vars
Note over C: Record mapping: x ↔ y
Note over C: Exit definition region
Note over C: Field "body" — normal region
C->>L: get body → x + 1
C->>R: get body → y + 1
C->>C: Compare + fields...
C->>C: x ↔ y: lookup finds x→y ✓
C->>C: 1 ↔ 1: equal ✓
Note over C: Result: EQUAL ✓
Without a definition region, the same variables would not be equal:
# Bare expressions, no enclosing function:
x + 1 vs y + 1 → NOT EQUAL (no definition region, different pointers)
Fields and the sticky mapping#
A "var" type still has fields, and non-ignored fields are compared —
but only on the first encounter of a var pair. Once a mapping is
recorded, subsequent occurrences look up the mapping and skip field
comparison entirely.
Take the Var declaration from the top of this section: name is
ignored, but type is not. The first time a pair of vars is seen in
a definition region, their type fields are compared and the mapping
is only established if they match. After that, the mapping is sticky
— later occurrences trust the correspondence regardless of those fields:
Scenario |
Result |
|---|---|
|
Fields match → mapping |
|
Fields differ → Not equal |
|
Lookup succeeds → Equal (types are not rechecked) |
For IRs where type consistency is part of well-formedness, this is
usually sufficient: a well-formed program uses each var with a consistent
type at every occurrence, so the first-encounter check at the binding
site covers the rest. If you truly want types re-verified at every use,
they don’t belong on the "var" node — lift them into the surrounding
expression/statement node where they participate in normal "tree"
comparison.
Full comparison: with and without definition regions#
Scenario |
With |
Without |
|---|---|---|
|
Equal |
n/a |
|
Not equal (body uses |
n/a |
|
Equal (x↔a, y↔b) |
n/a |
|
Not equal (x↔a but body uses |
n/a |
|
n/a |
Not equal |
|
n/a |
Equal |
Inconsistent variable usage#
The bijective mapping catches inconsistencies. Consider:
fun (x, y) → x + x vs fun (a, b) → a + b
sequenceDiagram
participant C as Comparator
participant L as lhs: fun (x, y) → x + x
participant R as rhs: fun (a, b) → a + b
Note over C: Definition region (params)
C->>C: x ↔ a → record x↔a ✓
C->>C: y ↔ b → record y↔b ✓
Note over C: Body: x + x vs a + b
C->>C: x ↔ a → lookup x→a, matches ✓
C->>C: x ↔ b → lookup x→a, but rhs is b ≠ a → FAIL ✗
Note over C: Result: NOT EQUAL ✓
The map_free_vars flag#
structural_equal(lhs, rhs, map_free_vars=True) starts the comparison
in “definition region” mode. This is useful for comparing standalone
expressions where you want alpha-equivalence at the top level without an
enclosing function:
# With map_free_vars=True:
structural_equal(x + 1, y + 1, map_free_vars=True) # → True
# With map_free_vars=False (default):
structural_equal(x + 1, y + 1) # → False
"singleton" — Singletons#
@py_class(structural_eq="singleton")
class Op(Object):
name: str
Meaning: “There is exactly one instance of this object per logical identity. Pointer equality is the only valid comparison.”
No content comparison is ever performed. Different pointers are always unequal; same pointer is always equal.
op_conv = Op.get("nn.conv2d")
op_relu = Op.get("nn.relu")
structural_equal(op_conv, op_conv) # → True (same pointer)
structural_equal(op_conv, op_relu) # → False (different pointers)
Field-Level Annotations#
The structural_eq parameter on field() controls
how structural equality/hashing treats that specific field.
structural_eq="ignore" — Exclude a field#
@py_class(structural_eq="tree")
class MyNode(Object):
value: int
span: str = field(structural_eq="ignore")
Meaning: “This field is not part of the node’s structural identity. Skip it during comparison and hashing.”
Use for:
Source locations (
span) — where the node came from in source code doesn’t affect what it means.Cached/derived values — computed from other fields, would be redundant to compare.
Debug annotations — names, comments, metadata for human consumption.
structural_eq="def" — Definition region#
@py_class(structural_eq="tree")
class Lambda(Object):
params: list[Var] = field(structural_eq="def")
body: Expr
Meaning: “This field introduces new variable bindings. When comparing or hashing this field, allow new variable correspondences to be established.”
This is the counterpart to "var". A "var" type says “I am a
variable”; structural_eq="def" says “this field is where variables are
defined.” Together they enable alpha-equivalence: comparing functions up
to consistent variable renaming.
Use for:
Function parameter lists
Let-binding left-hand sides
Any field that introduces names into scope
Custom Equality and Hashing: __s_equal__ / __s_hash__#
For types where the default field-by-field traversal is insufficient (for
example, fields that need to be visited in a specific order, cross-field
invariants, or sub-values that need a different def_region setting
than the declarative field flags allow), you can register custom
callbacks as type attributes:
__s_equal__— custom structural equality logic.__s_hash__— custom structural hashing logic.
These are the only supported way to override structural comparison.
structural_equal / structural_hash never consult Python
__eq__ / __hash__ — those dunders serve a separate purpose
(== and hash(), which default to pointer identity).
When either hook is registered, it replaces the default field iteration
for that type. All kind-specific machinery ("dag" memoization,
"var" mapping, the pointer shortcut of "const-tree", etc.) is
still managed by the framework — the custom callback only controls
which sub-values are compared or hashed, in what order, and with
what def_region flag.
Signatures#
__s_equal__:
(self, other, eq_cb) -> bool
eq_cb(lhs, rhs, def_region: bool, field_name: str) -> bool
__s_hash__:
(self, init_hash: int, hash_cb) -> int
hash_cb(value, init_hash: int, def_region: bool) -> int
The def_region flag on each recursive call controls whether the
sub-value is compared/hashed in a definition region (enabling new
variable bindings, just like field(structural_eq="def")). The
field_name argument on eq_cb is used only for mismatch path
reporting from get_first_structural_mismatch().
Example (Python)#
@py_class(structural_eq="tree")
class Lambda(Object):
params: list
body: Any
comment: str # not part of identity, but also not iterated below
def __s_equal__(self, other, eq_cb):
# params is a definition region; body is not.
if not eq_cb(self.params, other.params, True, "params"):
return False
if not eq_cb(self.body, other.body, False, "body"):
return False
return True
def __s_hash__(self, init_hash, hash_cb):
h = hash_cb(self.params, init_hash, True)
h = hash_cb(self.body, h, False)
return h
The two methods must agree: if __s_equal__ considers two instances
equal, __s_hash__ must produce the same hash for them.
Example (C++)#
class MyNodeObj : public Object {
public:
Array<Var> params;
Array<ObjectRef> body;
bool SEqual(const MyNodeObj* other,
ffi::TypedFunction<bool(AnyView, AnyView, bool, AnyView)> cmp) const {
if (!cmp(params, other->params, /*def_region=*/true, "params")) return false;
if (!cmp(body, other->body, /*def_region=*/false, "body")) return false;
return true;
}
int64_t SHash(int64_t init_hash,
ffi::TypedFunction<int64_t(AnyView, int64_t, bool)> hash) const {
int64_t h = hash(params, init_hash, /*def_region=*/true);
h = hash(body, h, /*def_region=*/false);
return h;
}
static void RegisterReflection() {
namespace refl = tvm::ffi::reflection;
refl::ObjectDef<MyNodeObj>()
.def_ro("params", &MyNodeObj::params)
.def_ro("body", &MyNodeObj::body);
refl::TypeAttrDef<MyNodeObj>()
.def(refl::type_attr::kSEqual, &MyNodeObj::SEqual)
.def(refl::type_attr::kSHash, &MyNodeObj::SHash);
}
static constexpr TVMFFISEqHashKind _type_s_eq_hash_kind = kTVMFFISEqHashKindTreeNode;
TVM_FFI_DECLARE_OBJECT_INFO_FINAL("my.Node", MyNodeObj, Object);
};
See tvm::ffi::reflection::type_attr::kSEqual and
tvm::ffi::reflection::type_attr::kSHash in
include/tvm/ffi/reflection/accessor.h for the full reference.
All Kinds at a Glance#
The following diagram visualizes the five comparable kinds, arranged by how much structural information they track:
graph LR
UI["singleton<br/><i>pointer only</i>"]
TN["tree<br/><i>content only</i>"]
CTN["const-tree<br/><i>content + pointer shortcut</i>"]
DN["dag<br/><i>content + sharing</i>"]
FV["var<br/><i>content + binding position</i>"]
UI --- TN
TN --- CTN
TN --- DN
TN --- FV
style UI fill:#e2e3e5
style TN fill:#d4edda
style CTN fill:#d4edda
style DN fill:#cce5ff
style FV fill:#fff3cd
Content comparison |
Pointer shortcut |
Tracks sharing |
Tracks binding position |
|
|---|---|---|---|---|
|
No |
Yes (only) |
No |
No |
|
Yes |
No |
No |
No |
|
Yes |
Yes (fast path) |
No |
No |
|
Yes |
No |
Yes |
No |
|
Yes |
No |
No |
Yes |
Decision Guide#
When defining a new type:
graph TD
Start["New @py_class type"] --> Q1{"Singleton?<br/>(one instance per<br/>logical identity)"}
Q1 -->|Yes| UI["structural_eq="singleton""]
Q1 -->|No| Q2{"Represents a<br/>variable binding?"}
Q2 -->|Yes| FV["structural_eq="var""]
Q2 -->|No| Q3{"Pointer sharing<br/>semantically<br/>meaningful?"}
Q3 -->|Yes| DN["structural_eq="dag""]
Q3 -->|No| Q4{"Immutable AND<br/>no transitive<br/>var children?"}
Q4 -->|Yes| CTN["structural_eq="const-tree""]
Q4 -->|No| TN["structural_eq="tree""]
style UI fill:#e2e3e5
style FV fill:#fff3cd
style DN fill:#cce5ff
style CTN fill:#d4edda
style TN fill:#d4edda
For fields:
graph TD
Start["field() parameter"] --> Q1{"Irrelevant to<br/>structural identity?<br/>(span, cache, debug)"}
Q1 -->|Yes| IGN["structural_eq="ignore""]
Q1 -->|No| Q2{"Introduces new<br/>variable bindings?"}
Q2 -->|Yes| DEF["structural_eq="def""]
Q2 -->|No| NONE["No flag needed"]
style IGN fill:#f8d7da
style DEF fill:#fff3cd
style NONE fill:#d4edda
Worked Example#
Putting it all together for a function node with parameters, body, and source location:
@py_class(structural_eq="tree")
class Lambda(Object):
params: list[Var] = field(structural_eq="def")
body: Expr
span: str = field(structural_eq="ignore", default="")
@py_class(structural_eq="var")
class Var(Object):
name: str = field(structural_eq="ignore")
@py_class(structural_eq="singleton")
class Op(Object):
name: str
With these annotations, alpha-equivalent functions are structurally equal:
# These two are structurally equal:
fun [x] → x + 1 (span="a.py:1")
fun [y] → y + 1 (span="b.py:5")
# - params has structural_eq="def" → x maps to y
# - body uses that mapping → (x + 1) ≅ (y + 1)
# - span has structural_eq="ignore" → locations don't matter
And in Python:
from tvm_ffi import structural_equal, structural_hash
x, y = Var("x"), Var("y")
f1 = Lambda([x], x + 1, span="a.py:1")
f2 = Lambda([y], y + 1, span="b.py:5")
assert structural_equal(f1, f2) # alpha-equivalent
assert structural_hash(f1) == structural_hash(f2) # same hash