tvm.tir

Namespace for Tensor-level IR

Classes:

Buffer

Symbolic data buffer in TVM.

DataProducer

Layout

Layout is composed of upper cases, lower cases and numbers, where upper case indicates a primal axis and the corresponding lower case with factor size indicates the subordinate axis.

BijectiveLayout

Bijective mapping for two layouts (src-layout and dst-layout).

Var(name, dtype[, span])

Symbolic variable.

SizeVar(name, dtype[, span])

Symbolic variable to represent a tensor index size

Reduce(combiner, src, rdom, condition, …)

Reduce node.

FloatImm(dtype, value[, span])

Float constant.

IntImm(dtype, value[, span])

Int constant.

StringImm(value[, span])

String constant.

Cast(dtype, value[, span])

Cast expression.

Add(a, b[, span])

Add node.

Sub(a, b[, span])

Sub node.

Mul(a, b[, span])

Mul node.

Div(a, b[, span])

Div node.

Mod(a, b[, span])

Mod node.

FloorDiv(a, b[, span])

FloorDiv node.

FloorMod(a, b[, span])

FloorMod node.

Min(a, b[, span])

Min node.

Max(a, b[, span])

Max node.

EQ(a, b[, span])

EQ node.

NE(a, b[, span])

NE node.

LT(a, b[, span])

LT node.

LE(a, b[, span])

LE node.

GT(a, b[, span])

GT node.

GE(a, b[, span])

GE node.

And(a, b[, span])

And node.

Or(a, b[, span])

Or node.

Not(a[, span])

Not node.

Select(condition, true_value, false_value[, …])

Select node.

BufferLoad(buffer, indices[, span])

Buffer load node.

ProducerLoad(producer, indices[, span])

Producer load node.

Load(dtype, buffer_var, index[, predicate, span])

Load node.

Ramp(base, stride, lanes[, span])

Ramp node.

Broadcast(value, lanes[, span])

Broadcast node.

Shuffle(vectors, indices[, span])

Shuffle node.

Call(dtype, op, args[, span])

Call node.

CallEffectKind()

Possible kinds of Call effects.

Let(var, value, body[, span])

Let node.

IterVar(dom, var, iter_type[, thread_tag, span])

Represent iteration variable.

Any([span])

Any node.

Stmt

Base class of all the statements.

LetStmt(var, value, body[, span])

LetStmt node.

AssertStmt(condition, message, body[, span])

AssertStmt node.

ForKind(value)

The kind of the for loop.

For(loop_var, min_val, extent, kind, body[, …])

For node.

While(condition, body[, span])

While node.

BufferStore(buffer, value, indices[, span])

Buffer store node.

BufferRealize(buffer, bounds, condition, body)

Buffer realize node.

Store(buffer_var, value, index[, predicate, …])

Store node.

ProducerStore(producer, value, indices[, span])

ProducerStore node.

Allocate(buffer_var, dtype, extents, …[, …])

Allocate node.

AttrStmt(node, attr_key, value, body[, span])

AttrStmt node.

ProducerRealize(producer, bounds, condition, …)

ProducerRealize node.

SeqStmt(seq[, span])

Sequence of statements.

IfThenElse(condition, then_case, else_case)

IfThenElse node.

Evaluate(value[, span])

Evaluate node.

Prefetch(buffer, bounds[, span])

Prefetch node.

BufferRegion(buffer, region)

BufferRegion node.

MatchBufferRegion(buffer, source)

MatchBufferRegion node.

Block(iter_vars, reads, writes, name_hint, …)

Block node.

BlockRealize(iter_values, predicate, …)

BlockRealize node.

PrimFunc(params, body[, ret_type, …])

A function declaration expression.

StmtSRef

An object that refers to schedulable elements in the TensorIR, aka “sref”.

BlockScope

An object corresponds to each block sref in the sref tree, which tracks the producer-consumer dependency between blocks.

ScheduleState(mod, …)

The state of scheduling, which exposes a Replace method as the primary resort for all the scheduling primitives to manipulate the TensorIR.

Schedule(mod, *, …)

The user-facing schedule class

Functions:

decl_buffer(shape[, dtype, name, data, …])

Declare a new symbolic buffer.

bijective_layout(src_layout, dst_layout)

Create a bijective layout mapping.

layout(layout_str)

Create a layout node from a string.

stmt_seq(*args)

Make sequence of statements

stmt_list(stmt)

Make list of stmt from blocks.

call_packed(*args[, span])

Build expression by calling an external packed function.

call_intrin(dtype, func_name, *args[, span])

Build expression by calling an intrinsic function.

call_pure_extern(dtype, func_name, *args[, span])

Build expression by calling a pure extern function.

call_extern(dtype, func_name, *args[, span])

Build expression by calling an extern function.

call_llvm_intrin(dtype, name, *args[, span])

Build expression by calling a llvm intrinsic function

call_llvm_pure_intrin(dtype, name, *args[, span])

Build expression by calling a pure llvm intrinsic function

ret(val)

Create a tir return expression

all(*args[, span])

Create a new expression of the intersection of all conditions in the arguments

any(*args[, span])

Create a new expression of the union of all conditions in the arguments

min_value(dtype[, span])

minimum value of dtype

max_value(dtype[, span])

maximum value of dtype

trace(args[, trace_action])

Trace tensor data at the runtime.

exp(x)

Take exponential of input x.

exp2(x)

Calculate 2**x

exp10(x)

Calculate 10**x

log(x)

Take log of input x.

log2(x)

Take log2 of input x.

log10(x)

Take log10 of input x.

log1p(x)

Take log(x + 1) with respect to input x.

ldexp(x1, x2)

Returns x1 * (2 ** x2).

clz(x)

Count leading zero bits of an integer x.

sin(x)

Take sin of input x.

sinh(x)

Take sinh of input x.

asin(x)

Take asin of input x.

asinh(x)

Take asinh of input x.

cos(x)

Take cos of input x.

cosh(x)

Take cosh of input x.

acos(x)

Take acos of input x.

acosh(x)

Take acosh of input x.

tan(x)

Take tan of input x.

tanh(x)

Take hyperbolic tanh of input x.

atan(x)

Take atan of input x.

atan2(x1, x2)

Take arctan2(x1, x2).

atanh(x)

Take atanh of input x.

erf(x)

Take gauss error function of the input x.

sigmoid(x)

Quick function to get sigmoid

sqrt(x)

Take square root of input x.

rsqrt(x)

Take reciprocal of square root of input x.

floor(x[, span])

Take floor of float input x.

ceil(x[, span])

Take ceil of float input x.

hypot(x1, x2)

Equivalent to sqrt(x1**2 + x2**2), element-wise.

trunc(x[, span])

Get truncated value of the input.

abs(x[, span])

Get absolute value of the input element-wise.

round(x[, span])

Round elements of the array to the nearest integer.

nextafter(x1, x2)

Return the next floating-point value after x1 towards x2.

nearbyint(x[, span])

Round elements of the array to the nearest integer.

power(x, y[, span])

x power y

popcount(x)

Count the number of set bits in input x.

fmod(x, y)

Return the remainder of x divided by y with the same sign as x.

if_then_else(cond, t, f[, span])

Conditional selection expression.

isnan(x[, span])

Check if input value is Nan.

isfinite(x[, span])

Check if input value is finite.

isinf(x[, span])

Check if input value is infinite.

copysign(x1, x2)

Change the sign of x1 to that of x2, element-wise.

div(a, b[, span])

Compute a / b as in C/C++ semantics.

indexdiv(a, b[, span])

Compute floor(a / b) where a and b are non-negative.

indexmod(a, b[, span])

Compute the remainder of indexdiv.

truncdiv(a, b[, span])

Compute the truncdiv of two expressions.

truncmod(a, b[, span])

Compute the truncmod of two expressions.

floordiv(a, b[, span])

Compute the floordiv of two expressions.

floormod(a, b[, span])

Compute the floormod of two expressions.

comm_reducer(fcombine, fidentity[, name])

Create a commutative reducer for reduction.

min(expr, axis[, where, init])

Create a min expression over axis.

max(expr, axis[, where, init])

Create a max expression over axis.

sum(expr, axis[, where, init])

Create a sum expression over axis.

q_multiply_shift(x, y, q, s)

Execute a multiplication between two Q-numbers x and y followed by a right shift s.

Exceptions:

ScheduleError

Error that happens during TensorIR scheduling.

class tvm.tir.Buffer

Symbolic data buffer in TVM.

Buffers provide a way to represent data layout specialization of data structures in TVM.

Do not construct directly, use decl_buffer() instead. See the documentation of decl_buffer() for more details.

See also

decl_buffer

Declare a buffer

Methods:

access_ptr(access_mask[, ptr_type, …])

Get an access pointer to the head of buffer.

vload(begin[, dtype])

Generate an Expr that loads dtype from begin index.

vstore(begin, value)

Generate a Stmt that stores value into begin index.

scope()

Return the storage scope associated with this buffer.

access_ptr(access_mask, ptr_type='handle', content_lanes=1, offset=0)

Get an access pointer to the head of buffer.

This is the recommended method to get the buffer data address when interacting with external functions.

Parameters
  • access_mask (int) – The access pattern MASK. Indicate whether the access will read or write to the data content.

  • ptr_type (str, optional) – The data type of the result pointer. Do not specify unless we want to cast pointer to specific type.

  • content_lanes (int, optional) – The number of lanes for the data type. This value is greater than one for vector types.

  • offset (Expr, optional) – The offset of pointer. We can use it to offset by the number of elements from the address of ptr.
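The access_mask argument is described as a read/write pattern, and the examples for this method pass it both as a string flag and as a bitmask. The following pure-Python sketch shows one way a flag string could be folded into such a bitmask. It does not call TVM: the constant values READ = 1 and WRITE = 2 and the helper name to_access_mask are assumptions made for illustration only.

```python
# Hypothetical constants standing in for Buffer.READ / Buffer.WRITE (values assumed).
READ = 1
WRITE = 2


def to_access_mask(access_mask):
    """Normalize an access mask given as an int or as a string of 'r'/'w' flags."""
    if isinstance(access_mask, int):
        return access_mask
    mask = 0
    for flag in access_mask:
        if flag == "r":
            mask |= READ
        elif flag == "w":
            mask |= WRITE
        else:
            raise ValueError(f"Unknown access flag: {flag!r}")
    return mask


print(to_access_mask("r"))           # read only
print(to_access_mask("rw"))          # read and write
print(to_access_mask(READ | WRITE))  # same mask, given as an integer
```

Under these assumptions the string "rw" and the expression READ | WRITE denote the same access pattern.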

Examples

# Get access ptr for read
buffer.access_ptr("r")
# Get access ptr for read/write with bitmask
buffer.access_ptr(Buffer.READ | Buffer.WRITE)
# Get access ptr for read/write with str flag
buffer.access_ptr("rw")
# Get access ptr for read with offset
buffer.access_ptr("r", offset=100)

vload(begin, dtype=None)

Generate an Expr that loads dtype from begin index.

Parameters
  • begin (Array of Expr) – The beginning index in unit of Buffer.dtype

  • dtype (str) – The data type to be loaded. It can be a vector type whose number of lanes is a multiple of that of Buffer.dtype.

Returns

load – The corresponding load expression.

Return type

Expr

vstore(begin, value)

Generate a Stmt that stores value into begin index.

Parameters
  • begin (Array of Expr) – The beginning index in unit of Buffer.dtype

  • value (Expr) – The value to be stored.

Returns

store – The corresponding store stmt.

Return type

Stmt

scope()

Return the storage scope associated with this buffer.

Returns

scope – The storage scope associated with this buffer.

Return type

str

tvm.tir.decl_buffer(shape, dtype=None, name='buffer', data=None, strides=None, elem_offset=None, scope='', data_alignment=-1, offset_factor=0, buffer_type='', span=None)

Declare a new symbolic buffer.

Normally a buffer is created automatically during lower and build. This function is only needed if users want to specify their own buffer layout.

See the note below for detailed discussion on usage of buffer.

Parameters
  • shape (tuple of Expr) – The shape of the buffer.

  • dtype (str, optional) – The data type of the buffer.

  • name (str, optional) – The name of the buffer.

  • data (Var, optional) – The data pointer in the buffer.

  • strides (array of Expr) – The stride of the buffer.

  • elem_offset (Expr, optional) – The beginning offset of the array to data. In terms of number of elements of dtype.

  • scope (str, optional) – The storage scope of the buffer, if not global. If scope equals empty string, it means it is global memory.

  • data_alignment (int, optional) – The alignment of data pointer in bytes. If -1 is passed, the alignment will be set to TVM’s internal default.

  • offset_factor (int, optional) – The factor of the elem_offset field. When set, elem_offset is required to be a multiple of offset_factor. If 0 is passed, the alignment will be set to 1. If non-zero is passed, we will create a Var for elem_offset if elem_offset is not None.

  • buffer_type (str, optional, {"", "auto_broadcast"}) – auto_broadcast buffer allows one to implement broadcast computation without considering whether dimension size equals to one. TVM maps buffer[i][j][k] -> buffer[i][0][k] if dimension j’s shape equals 1.

  • span (Optional[Span]) – The location of the decl_buffer creation in the source.

Returns

buffer – The created buffer

Return type

tvm.tir.Buffer

Example

Here’s an example of how broadcast buffer can be used to define a symbolic broadcast operation,

import numpy as np
import tvm
import tvm.testing
from tvm import te

m0, m1, m2 = te.var("m0"), te.var("m1"), te.var("m2")
n0, n1, n2 = te.var("n0"), te.var("n1"), te.var("n2")
o0, o1, o2 = te.var("o0"), te.var("o1"), te.var("o2")
A = te.placeholder((m0, m1, m2), name='A')
B = te.placeholder((n0, n1, n2), name='B')
C = te.compute((o0, o1, o2), lambda i, j, k: A[i, j, k] + B[i, j, k], name='C')
Ab = tvm.tir.decl_buffer(A.shape, A.dtype, name="Ab", buffer_type="auto_broadcast")
Bb = tvm.tir.decl_buffer(B.shape, B.dtype, name="Bb", buffer_type="auto_broadcast")
s = te.create_schedule(C.op)
fadd = tvm.build(s, [A, B, C], target='llvm', name='bcast_add', binds={A:Ab, B:Bb})
dev = tvm.cpu(0)
a = tvm.nd.array(np.random.uniform(size=(2, 4, 3)).astype(A.dtype), dev)
b = tvm.nd.array(np.random.uniform(size=(2, 1, 3)).astype(B.dtype), dev)
c = tvm.nd.array(np.zeros((2, 4, 3), dtype=C.dtype), dev)
fadd(a, b, c)
tvm.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())
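The index mapping that the buffer_type parameter describes, buffer[i][j][k] -> buffer[i][0][k] when dimension j has extent 1, can be pictured with a small pure-Python sketch. It does not use TVM; the helper name auto_broadcast_index is hypothetical and only models the mapping rule as stated in the parameter description.

```python
def auto_broadcast_index(shape, index):
    """Map an index so that every axis of extent 1 is always read at position 0."""
    return tuple(0 if extent == 1 else i for extent, i in zip(shape, index))


# A buffer of shape (2, 1, 3) accessed as if it had shape (2, 4, 3).
b_shape = (2, 1, 3)
print(auto_broadcast_index(b_shape, (1, 3, 2)))  # (1, 0, 2)

# A buffer without unit-extent axes is indexed unchanged.
a_shape = (2, 4, 3)
print(auto_broadcast_index(a_shape, (1, 3, 2)))  # (1, 3, 2)
```

This is why, in the example above, b can have shape (2, 1, 3) while the compute definition indexes B[i, j, k] for every j.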

Note

The Buffer data structure reflects the DLTensor structure in dlpack. While the DLTensor data structure is very general, it is usually helpful to create functions that handle only a specific case of the data structure, so that the compiled function can benefit from it.

If strides and elem_offset are passed as None when constructing the function, the function will be specialized for DLTensors that are compact and aligned. If the user passes a fully generic symbolic array as strides, the resulting function becomes fully generic.
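To make the compact case in this note concrete, here is a minimal pure-Python sketch of how element offsets are usually derived for a densely packed, row-major tensor. It is not TVM code: compact_strides and flat_offset are hypothetical helpers, and row-major packing is the assumed meaning of "compact".

```python
def compact_strides(shape):
    """Row-major strides, in elements, for a densely packed buffer."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def flat_offset(index, strides, elem_offset=0):
    """Element offset of a multi-dimensional index from the data pointer."""
    return elem_offset + sum(i * s for i, s in zip(index, strides))


shape = (2, 4, 3)
strides = compact_strides(shape)
print(strides)                          # [12, 3, 1]
print(flat_offset((1, 2, 1), strides))  # 1*12 + 2*3 + 1*1 = 19
```

A function specialized for this case can treat the strides as known functions of the shape, whereas a fully generic function has to read each stride as a separate symbolic value.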

class tvm.tir.DataProducer

class tvm.tir.Layout

Layout is composed of upper cases, lower cases and numbers, where upper case indicates a primal axis and the corresponding lower case with factor size indicates the subordinate axis. For example, NCHW16c can describe a 5-D tensor of [batch_size, channel, height, width, channel_block]. Here subordinate axis channel_block=16 is the factor size of the primal axis C (channel).

See also

layout

Declare a layout

Methods:

index_of(axis)

Get the index of an axis

factor_of(axis)

Get the factor size of the subordinate axis.

index_of(axis)

Get the index of an axis

Parameters

axis (str) – The axis name, which must be in [a-z, A-Z].

Returns

index – The index of the axis, -1 if not found.

Return type

int

factor_of(axis)

Get the factor size of the subordinate axis.

Parameters

axis (str) – The axis name, which must be in [a-z, A-Z].

Returns

factor – the size of the subordinate-axis of axis (if axis is a primal-axis), or the size of axis itself (if axis is a subordinate-axis). Return -1 if axis is not in the layout.

Return type

int
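The layout string convention and the documented return values of index_of and factor_of can be mimicked with a short pure-Python sketch. It does not use TVM; parse_layout and the module-level index_of and factor_of functions below are hypothetical stand-ins. Returning -1 for a primal axis that has no subordinate axis is an assumption, since the text above only specifies -1 for an axis that is not in the layout.

```python
import re


def parse_layout(layout_str):
    """Split a layout string such as 'NCHW16c' into (axis, factor) pairs.

    Primal (upper-case) axes carry no factor, recorded here as None.
    """
    axes = []
    for factor, axis in re.findall(r"(\d*)([A-Za-z])", layout_str):
        axes.append((axis, int(factor) if factor else None))
    return axes


def index_of(layout_str, axis):
    """Position of `axis` in the layout, or -1 if it is absent."""
    names = [name for name, _ in parse_layout(layout_str)]
    return names.index(axis) if axis in names else -1


def factor_of(layout_str, axis):
    """Factor of the subordinate axis matching `axis`, or -1 if there is none."""
    parsed = parse_layout(layout_str)
    if axis not in [name for name, _ in parsed]:
        return -1
    sub = axis.lower()
    for name, factor in parsed:
        if name == sub and factor is not None:
            return factor
    return -1  # assumption: primal axis without a subordinate axis


print(parse_layout("NCHW16c"))
print(index_of("NCHW16c", "C"), index_of("NCHW16c", "c"))    # 1 4
print(factor_of("NCHW16c", "C"), factor_of("NCHW16c", "c"))  # 16 16
```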

class tvm.tir.BijectiveLayout

Bijective mapping for two layouts (src-layout and dst-layout). It provides shape and index conversion between each other.

Do not construct directly, use bijective_layout instead. See the documentation of bijective_layout for more details.

Parameters
  • src_layout (str or Layout) – source layout.

  • dst_layout (str or Layout) – destination layout.

See also

bijective_layout

Declare a bijective layout

Methods:

forward_index(index)

Given the indices of the src-layout, infer the dst index.

backward_index(index)

Given the indices of the dst-layout, infer the src index.

forward_shape(shape)

Given the shape of the src-layout, infer the dst shape.

backward_shape(shape)

Given the shape of the dst-layout, infer the src shape.

forward_index(index)

Given the indices of the src-layout, infer the dst index.

Parameters

index (Array of Expr) – The indices in src-layout.

Returns

dst_index – The inferred indices in dst-layout.

Return type

Array of Expr

backward_index(index)

Given the indices of the dst-layout, infer the src index.

Parameters

index (Array of Expr) – The indices in dst-layout.

Returns

src_index – The inferred indices in src-layout.

Return type

Array of Expr

forward_shape(shape)

Given the shape of the src-layout, infer the dst shape.

Parameters

shape (Array of Expr) – The shape in src-layout.

Returns

dst_shape – The inferred shape in dst-layout.

Return type

Array of Expr

backward_shape(shape)

Given the shape of the dst-layout, infer the src shape.

Parameters

shape (Array of Expr) – The shape in dst-layout.

Returns

src_shape – The inferred shape in src-layout.

Return type

Array of Expr
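As an illustration of the four conversions, the sketch below hand-codes the mapping between one specific pair of layouts, NCHW as src-layout and NCHW16c as dst-layout. It is plain Python rather than TVM code, it works on concrete integers instead of Expr arrays, and the function bodies are an assumed reading of what a bijective mapping between these two layouts has to compute.

```python
BLOCK = 16  # factor of the subordinate axis 'c' in NCHW16c


def forward_shape(shape):
    """NCHW shape -> NCHW16c shape."""
    n, c, h, w = shape
    if c % BLOCK != 0:
        raise ValueError("channel count must be divisible by the block size")
    return [n, c // BLOCK, h, w, BLOCK]


def backward_shape(shape):
    """NCHW16c shape -> NCHW shape."""
    n, c_outer, h, w, c_inner = shape
    return [n, c_outer * c_inner, h, w]


def forward_index(index):
    """NCHW index -> NCHW16c index."""
    n, c, h, w = index
    return [n, c // BLOCK, h, w, c % BLOCK]


def backward_index(index):
    """NCHW16c index -> NCHW index."""
    n, c_outer, h, w, c_inner = index
    return [n, c_outer * BLOCK + c_inner, h, w]


print(forward_shape([1, 64, 56, 56]))  # [1, 4, 56, 56, 16]
print(forward_index([0, 35, 2, 3]))    # [0, 2, 2, 3, 3]
print(backward_index([0, 2, 2, 3, 3])) # [0, 35, 2, 3]
```

The mapping is bijective in the sense that each backward function undoes the corresponding forward function.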

tvm.tir.bijective_layout(src_layout: Union[str, tvm.tir.data_layout.Layout], dst_layout: Union[str, tvm.tir.data_layout.Layout]) → tvm.tir.data_layout.BijectiveLayout

Create a bijective layout mapping.

Parameters
  • src_layout (str or Layout) – source layout.

  • dst_layout (str or Layout) – destination layout.

Returns

bijective_layout – The created bijective layout

Return type

BijectiveLayout

tvm.tir.layout(layout_str: str) → tvm.tir.data_layout.Layout

Create a layout node from a string.

Parameters

layout_str (str) – A layout representation is composed of upper cases, lower cases and numbers, where upper case indicates a primal axis and the corresponding lower case with factor size indicates the subordinate axis. For example, NCHW16c can describe a 5-D tensor of [batch_size, channel, height, width, channel_block]. Here subordinate axis channel_block=16 is the factor size of the primal axis C (channel).

Returns

layout – The created layout

Return type

Layout

class tvm.tir.Var(name: str, dtype: Union[str, tvm.ir.type.Type], span: Optional[tvm.ir.base.Span] = None)

Symbolic variable.

Parameters
  • name (str) – The name

  • dtype (Union[str, tvm.ir.Type]) – The data type

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.SizeVar(name, dtype, span=None)

Symbolic variable to represent a tensor index size, which is greater than or equal to zero.

Parameters
  • name (str) – The name

  • dtype (str) – The data type

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Reduce(combiner, src, rdom, condition, value_index, init=None, span=None)

Reduce node.

Parameters
  • combiner (CommReducer) – The combiner.

  • src (list of Expr) – The source expression.

  • rdom (list of IterVar) – The iteration domain

  • condition (PrimExpr) – The reduce condition.

  • value_index (int) – The value index.

  • init (list of Expr) – The initial value for output. This can be an int, float or ProducerLoad

  • span (Optional[Span]) – The location of this itervar in the source code.
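The roles of combiner, src, rdom and condition can be illustrated with a small pure-Python fold. This is only a conceptual sketch, not how TVM evaluates a Reduce node: reduce_over is a hypothetical helper, the combiner is modelled as a plain binary function plus an identity value, and value_index and init are not modelled.

```python
from itertools import product


def reduce_over(fcombine, identity, src, rdom, condition=lambda *idx: True):
    """Fold src(*idx) over every point of rdom where condition(*idx) holds."""
    result = identity
    for idx in product(*rdom):
        if condition(*idx):
            result = fcombine(result, src(*idx))
    return result


data = [[1, 2, 3], [4, 5, 6]]
rdom = [range(2), range(3)]  # one range per reduction axis

# Sum over the whole domain.
total = reduce_over(lambda x, y: x + y, 0, lambda i, j: data[i][j], rdom)

# Maximum over the points where the value is even.
even_max = reduce_over(
    max,
    float("-inf"),
    lambda i, j: data[i][j],
    rdom,
    condition=lambda i, j: data[i][j] % 2 == 0,
)
print(total, even_max)  # 21 6
```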

class tvm.tir.FloatImm(dtype, value, span=None)

Float constant.

Parameters
  • dtype (str) – The data type

  • value (float) – The constant value.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.IntImm(dtype, value, span=None)

Int constant.

Parameters
  • dtype (str) – The data type

  • value (int) – The constant value.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.StringImm(value, span=None)

String constant.

Parameters
  • value (str) – The string value.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Cast(dtype, value, span=None)

Cast expression.

Parameters
  • dtype (str) – The data type

  • value (PrimExpr) – The value to be cast.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Add(a, b, span=None)

Add node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Sub(a, b, span=None)

Sub node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Mul(a, b, span=None)

Mul node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Div(a, b, span=None)

Div node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Mod(a, b, span=None)

Mod node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.FloorDiv(a, b, span=None)

FloorDiv node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.FloorMod(a, b, span=None)

FloorMod node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.
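The difference between the truncating and the flooring variants only shows up when the operands have different signs. The pure-Python sketch below contrasts the two conventions on integers. It does not call TVM, and it assumes that Div and Mod follow the C/C++ truncating convention that the summary of div() mentions, while FloorDiv and FloorMod round toward negative infinity.

```python
def truncdiv(a, b):
    """Integer division rounding toward zero (C/C++ semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def truncmod(a, b):
    """Remainder matching truncdiv; it takes the sign of the dividend."""
    return a - b * truncdiv(a, b)


def floordiv(a, b):
    """Integer division rounding toward negative infinity."""
    return a // b


def floormod(a, b):
    """Remainder matching floordiv; it takes the sign of the divisor."""
    return a % b


print(truncdiv(-7, 2), truncmod(-7, 2))  # -3 -1
print(floordiv(-7, 2), floormod(-7, 2))  # -4 1
```

For non-negative operands both conventions agree, which is the case that indexdiv and indexmod are documented for.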

class tvm.tir.Min(a, b, span=None)

Min node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Max(a, b, span=None)

Max node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.EQ(a, b, span=None)

EQ node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.NE(a, b, span=None)

NE node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.LT(a, b, span=None)

LT node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.LE(a, b, span=None)

LE node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.GT(a, b, span=None)

GT node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.GE(a, b, span=None)

GE node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.And(a, b, span=None)

And node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Or(a, b, span=None)

Or node.

Parameters
  • a (PrimExpr) – The left hand operand.

  • b (PrimExpr) – The right hand operand.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Not(a, span=None)

Not node.

Parameters
  • a (PrimExpr) – The input value

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Select(condition, true_value, false_value, span=None)

Select node.

Note

Select may compute both true_value and false_value. Use tvm.tir.if_then_else instead if you want to get a conditional expression that only evaluates the correct branch.

Parameters
  • condition (PrimExpr) – The condition expression.

  • true_value (PrimExpr) – The value to take when condition is true.

  • false_value (PrimExpr) – The value to take when condition is false.

  • span (Optional[Span]) – The location of this itervar in the source code.
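The note on Select can be pictured with ordinary Python evaluation order. In the sketch below, select receives operands that have already been computed, while if_then_else receives thunks and calls only the chosen one. Both functions are hypothetical pure-Python stand-ins, not the TVM functions of the same name, and this select always computes both operands, whereas the note only says that Select may do so.

```python
calls = []


def expensive(tag, value):
    """Record that this branch was computed, then return its value."""
    calls.append(tag)
    return value


def select(condition, true_value, false_value):
    """Both operands are already computed by the time this function runs."""
    return true_value if condition else false_value


def if_then_else(cond, t, f):
    """Operands are passed as thunks, so only the chosen one is computed."""
    return t() if cond else f()


r1 = select(True, expensive("select-true", 1), expensive("select-false", 2))
r2 = if_then_else(
    True,
    lambda: expensive("ite-true", 1),
    lambda: expensive("ite-false", 2),
)
print(r1, r2)  # 1 1
print(calls)   # ['select-true', 'select-false', 'ite-true']
```

The distinction matters when the branch that is not taken would be invalid to compute, for example an out-of-bounds load.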

class tvm.tir.BufferLoad(buffer, indices, span=None)

Buffer load node.

Parameters
  • buffer (Buffer) – The buffer to be loaded.

  • indices (List[PrimExpr]) – The buffer indices.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.ProducerLoad(producer, indices, span=None)

Producer load node.

Parameters
  • producer (DataProducer) – The buffer to be loaded.

  • indices (List[PrimExpr]) – The buffer indices.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Load(dtype, buffer_var, index, predicate=None, span=None)

Load node.

Parameters
  • dtype (str) – The data type.

  • buffer_var (Var) – The buffer variable in the load expression.

  • index (PrimExpr) – The index in the load.

  • predicate (PrimExpr) – The load predicate.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Ramp(base, stride, lanes, span=None)

Ramp node.

Parameters
  • base (PrimExpr) – The base expression.

  • stride (PrimExpr) – The stride of the ramp.

  • lanes (int) – The lanes of the expression.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Broadcast(value, lanes, span=None)

Broadcast node.

Parameters
  • value (PrimExpr) – The value of the expression.

  • lanes (int) – The lanes of the expression.

  • span (Optional[Span]) – The location of this itervar in the source code.
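Ramp and Broadcast both describe vector values with a given number of lanes. A minimal pure-Python sketch of the values they stand for is shown below, assuming the usual reading: a ramp starts at base and advances by stride per lane, and a broadcast repeats one scalar in every lane. The helpers ramp and broadcast are hypothetical and return plain lists rather than TIR expressions.

```python
def ramp(base, stride, lanes):
    """Vector whose lane i holds base + i * stride."""
    return [base + i * stride for i in range(lanes)]


def broadcast(value, lanes):
    """Vector that holds the same scalar in every lane."""
    return [value] * lanes


print(ramp(0, 2, 4))    # [0, 2, 4, 6]
print(broadcast(7, 4))  # [7, 7, 7, 7]
```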

class tvm.tir.Shuffle(vectors, indices, span=None)

Shuffle node.

Parameters
  • vectors (Array of Expr) – The vectors

  • indices (Array of indices) – The indices

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Call(dtype, op, args, span=None)

Call node.

Parameters
  • dtype (str) – The return data type

  • op (Union[RelayExpr, str]) – The function to be called, or the name of the global tvm.Op

  • args (list of Expr) – The input arguments to the call

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.CallEffectKind

Possible kinds of Call effects.

class tvm.tir.Let(var, value, body, span=None)

Let node.

Parameters
  • var (Var) – The variable in the binding.

  • value (PrimExpr) – The value to be bound.

  • body (PrimExpr) – The body expression.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.IterVar(dom, var, iter_type, thread_tag='', span=None)

Represent iteration variable.

IterVar represents axis iterations in the computation.

Parameters
  • dom (Range) – The domain of the iteration.

  • var (Union[Var, str]) – The internal variable that is used for iteration.

  • iter_type (int) – The iteration type.

  • thread_tag (str) – The thread type tag.

  • span (Optional[Span]) – The location of this itervar in the source code.

See also

te.thread_axis

Create thread axis IterVar.

te.reduce_axis

Create reduce axis IterVar.

class tvm.tir.Any(span=None)

Any node.

Parameters

span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Stmt

Base class of all the statements.

class tvm.tir.LetStmt(var, value, body, span=None)

LetStmt node.

Parameters
  • var (Var) – The variable in the binding.

  • value (PrimExpr) – The value to be bound.

  • body (Stmt) – The body statement.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.AssertStmt(condition, message, body, span=None)

AssertStmt node.

Parameters
  • condition (PrimExpr) – The assert condition.

  • message (PrimExpr) – The error message.

  • body (tvm.tir.Stmt) – The body statement.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.ForKind(value)

The kind of the for loop.

Note

ForKind can change the control flow semantics of the loop and needs to be considered in all TIR passes.

class tvm.tir.For(loop_var, min_val, extent, kind, body, thread_binding=None, annotations=None, span=None)

For node.

Parameters
  • loop_var (Var) – The loop variable.

  • min_val (PrimExpr) – The beginning value.

  • extent (PrimExpr) – The length of the loop.

  • kind (ForKind) – The type of the for.

  • body (Stmt) – The body statement.

  • thread_binding (Optional[tir.IterVar]) – The thread this loop binds to. Only valid if kind is ThreadBinding

  • annotations (tvm.ir.Map) – Additional annotation hints.

  • span (Optional[Span]) – The location of this itervar in the source code.
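The parameters min_val and extent describe a half-open iteration range. The sketch below models only a serial loop in plain Python. run_serial_for is a hypothetical helper, and other loop kinds are not modelled because, as the ForKind note says, they can change the control flow semantics.

```python
def run_serial_for(min_val, extent, body):
    """Call body(loop_var) for loop_var in [min_val, min_val + extent)."""
    for loop_var in range(min_val, min_val + extent):
        body(loop_var)


visited = []
run_serial_for(2, 3, visited.append)
print(visited)  # [2, 3, 4]
```

Note that extent is a length, not an end bound: the last value taken by the loop variable is min_val + extent - 1.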

class tvm.tir.While(condition, body, span=None)

While node.

Parameters
  • condition (PrimExpr) – The termination condition.

  • body (Stmt) – The body statement.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.BufferStore(buffer, value, indices, span=None)

Buffer store node.

Parameters
  • buffer (Buffer) – The buffer.

  • value (PrimExpr) – The value to be stored.

  • indices (List[PrimExpr]) – The indices location to be stored.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.BufferRealize(buffer, bounds, condition, body, span=None)

Buffer realize node.

Parameters
  • buffer (Buffer) – The buffer.

  • bounds (List[Range]) – The bounds to be realized.

  • condition (PrimExpr) – The realize condition.

  • body (Stmt) – The body of the statement.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Store(buffer_var, value, index, predicate=None, span=None)

Store node.

Parameters
  • buffer_var (Var) – The buffer Variable.

  • value (PrimExpr) – The value we want to store.

  • index (PrimExpr) – The index in the store expression.

  • predicate (PrimExpr) – The store predicate.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.ProducerStore(producer, value, indices, span=None)

ProducerStore node.

Parameters
  • producer (DataProducer) – The data producer.

  • value (PrimExpr) – The value to be stored.

  • indices (list of Expr) – The index arguments of the store.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Allocate(buffer_var, dtype, extents, condition, body, annotations=None, span=None)

Allocate node.

Parameters
  • buffer_var (Var) – The buffer variable.

  • dtype (str) – The data type of the buffer.

  • extents (list of Expr) – The extents of the allocate

  • condition (PrimExpr) – The condition.

  • body (Stmt) – The body statement.

  • annotations (Optional[Mapping[str, Object]]) – Additional annotation hints

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.AttrStmt(node, attr_key, value, body, span=None)

AttrStmt node.

Parameters
  • node (Node) – The node to annotate the attribute

  • attr_key (str) – Attribute type key.

  • value (PrimExpr) – The value of the attribute

  • body (Stmt) – The body statement.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.ProducerRealize(producer, bounds, condition, body, storage_scope='', span=None)

ProducerRealize node.

Parameters
  • producer (DataProducer) – The data producer.

  • bounds (list of range) – The bound of realize

  • condition (PrimExpr) – The realize condition.

  • body (Stmt) – The realize body

  • storage_scope (str) – The storage scope associated with this realization

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.SeqStmt(seq, span=None)

Sequence of statements.

Parameters
  • seq (List[Stmt]) – The statements

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.IfThenElse(condition, then_case, else_case, span=None)

IfThenElse node.

Parameters
  • condition (PrimExpr) – The expression

  • then_case (Stmt) – The statement to execute if condition is true.

  • else_case (Stmt) – The statement to execute if condition is false.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Evaluate(value, span=None)

Evaluate node.

Parameters
  • value (PrimExpr) – The expression to be evaluated.

  • span (Optional[Span]) – The location of this itervar in the source code.

class tvm.tir.Prefetch(buffer, bounds, span=None)

Prefetch node.

Parameters
  • buffer (Buffer) – The buffer to be prefetched.

  • bounds (list of Range) – The bounds to be prefetched.

  • span (Optional[Span]) – The location of this statement in the source code.

tvm.tir.stmt_seq(*args)

Make sequence of statements

Parameters

args (list of Expr or Var) – List of statements to be combined as sequence.

Returns

stmt – The combined statement.

Return type

Stmt

tvm.tir.stmt_list(stmt)

Make list of stmt from blocks.

Parameters

stmt (Stmt) – The block statement to unpack.

Returns

stmt_list – The unpacked list of statements

Return type

list of Stmt

class tvm.tir.BufferRegion(buffer: tvm.tir.buffer.Buffer, region: List[tvm.ir.expr.Range])

BufferRegion node.

Parameters
  • buffer (Buffer) – The buffer of the buffer region

  • region (List[Range]) – The region array of the buffer region

class tvm.tir.MatchBufferRegion(buffer: tvm.tir.buffer.Buffer, source: tvm.tir.stmt.BufferRegion)

MatchBufferRegion node.

Parameters
  • buffer (Buffer) – The target buffer

  • source (BufferRegion) – The region of source buffer

class tvm.tir.Block(iter_vars: List[tvm.tir.expr.IterVar], reads: List[tvm.tir.stmt.BufferRegion], writes: List[tvm.tir.stmt.BufferRegion], name_hint: str, body: tvm.tir.stmt.Stmt, init: Optional[tvm.tir.stmt.Stmt] = None, alloc_buffers: Optional[List[tvm.tir.buffer.Buffer]] = None, match_buffers: Optional[List[tvm.tir.stmt.MatchBufferRegion]] = None, annotations: Optional[Mapping[str, tvm.runtime.object.Object]] = None, span: Optional[tvm.ir.base.Span] = None)

Block node.

Parameters
  • iter_vars (List[IterVar]) – The block Variable.

  • reads (List[BufferRegion]) – The read buffer regions of the block.

  • writes (List[BufferRegion]) – The write buffer regions of the block.

  • name_hint (str) – the name_hint of the block.

  • body (Stmt) – The body of the block.

  • init (Optional[Stmt]) – The init block of the reduction block

  • alloc_buffers (Optional[list[Buffer]]) – The buffer allocations

  • match_buffers (Optional[List[MatchBufferRegion]]) – The subregion buffer match

  • annotations (Optional[Mapping[str, Object]]) – Additional annotation hints.

  • span (Optional[Span]) – The location of this block in the source code.

class tvm.tir.BlockRealize(iter_values: List[tvm.ir.expr.PrimExpr], predicate: Union[tvm.ir.expr.PrimExpr, bool], block: tvm.tir.stmt.Block, span: Optional[tvm.ir.base.Span] = None)

BlockRealize node.

Parameters
  • iter_values (List[PrimExpr]) – The binding values of the block var.

  • predicate (Union[PrimExpr, bool]) – The predicate of the block.

  • block (Block) – The block to realize

  • span (Optional[Span]) – The location of this block_realize in the source code.

class tvm.tir.PrimFunc(params, body, ret_type=None, buffer_map=None, attrs=None, span=None)

A function declaration expression.

Parameters
  • params (List[Union[tvm.tir.Var, tvm.tir.Buffer]]) – List of input parameters to the function.

  • body (tvm.tir.Stmt) – The body of the function.

  • ret_type (tvm.ir.Type) – The return type annotation of the function.

  • buffer_map (Map[tvm.tir.Var, tvm.tir.Buffer]) – The buffer binding map.

  • attrs (Optional[tvm.Attrs]) – Attributes of the function, can be None

  • span (Optional[Span]) – The location of this function in the source code.

Methods:

with_body(new_body[, span])

Create a new PrimFunc with the same signature but a new body.

specialize(param_map)

Specialize parameters of PrimFunc

script([tir_prefix, show_meta])

Print the PrimFunc as TVMScript

with_body(new_body, span=None)

Create a new PrimFunc with the same signature but a new body.

Parameters
  • new_body (Stmt) – The new body.

  • span (Optional[Span]) – The location of this function in the source code.

Returns

new_func – The created new function.

Return type

PrimFunc

specialize(param_map: Mapping[tvm.tir.expr.Var, Union[tvm.ir.expr.PrimExpr, tvm.tir.buffer.Buffer]])

Specialize parameters of PrimFunc

Parameters

param_map (Mapping[Var, Union[PrimExpr, Buffer]]) – The mapping from function params to the instance

Examples

We can define a Meta TIR function with symbolic shape:

from tvm import tir
from tvm.script import tir as T

@T.prim_func
def mem_copy(a: T.handle, b: T.handle, m: T.int32, n: T.int32) -> None:
    A = T.match_buffer(a, (m, n), "float32")
    B = T.match_buffer(b, (m, n), "float32")

    for i, j in T.grid(m, n):
        with T.block():
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

Then we can make it specialized with given shapes or buffers.

a, _, m, n = mem_copy.params
func = mem_copy.specialize({a: tir.decl_buffer((16, 16))})
# or
func = mem_copy.specialize({n: 16, m: 16})

The specialized function:

@T.prim_func
def mem_copy_16_16(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (16, 16), "float32")
    B = T.match_buffer(b, (16, 16), "float32")

    for i, j in T.grid(16, 16):
        with T.block():
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

Returns

func – The new function with parameter specialized

Return type

PrimFunc

script(tir_prefix: str = 'tir', show_meta: bool = False) → str

Print the PrimFunc as TVMScript

Parameters
  • tir_prefix (str) – The tir namespace prefix

  • show_meta (bool) – Whether to show meta information

Returns

script – The TVM Script of the PrimFunc

Return type

str

tvm.tir.call_packed(*args, span=None)

Build an expression by calling an external packed function.

Each argument to the packed function can be an Expr or a Buffer. An Expr argument is passed as the corresponding POD type.

When the argument is a Buffer, the corresponding PackedFunc will receive a TVMArrayHandle whose content is valid during the callback period. If the PackedFunc is a Python callback, then the corresponding argument is an NDArray.

Parameters
  • args (list of Expr or Buffer) – Positional arguments.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

call – The call expression.

Return type

PrimExpr

See also

te.extern

Create tensor with extern function call.

tvm.tir.call_intrin(dtype, func_name, *args, span=None)

Build expression by calling an intrinsic function.

Intrinsics can be overloaded with multiple data types via the intrinsic translation rule.

Parameters
  • dtype (str) – The data type of the result.

  • func_name (str) – The intrinsic function name.

  • args (list) – Positional arguments.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

call – The call expression.

Return type

PrimExpr

tvm.tir.call_pure_extern(dtype, func_name, *args, span=None)

Build expression by calling a pure extern function.

Parameters
  • dtype (str) – The data type of the result.

  • func_name (str) – The extern function name.

  • args (list) – Positional arguments.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

call – The call expression.

Return type

PrimExpr

tvm.tir.call_extern(dtype, func_name, *args, span=None)

Build expression by calling an extern function.

Parameters
  • dtype (str) – The data type of the result.

  • func_name (str) – The extern function name.

  • args (list) – Positional arguments.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

call – The call expression.

Return type

PrimExpr

tvm.tir.call_llvm_intrin(dtype, name, *args, span=None)

Build expression by calling an LLVM intrinsic function.

Parameters
  • dtype (str) – The data type of the result.

  • name (str) – The name of the llvm intrinsic function.

  • args (list) – Positional arguments.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

call – The call expression.

Return type

PrimExpr

tvm.tir.call_llvm_pure_intrin(dtype, name, *args, span=None)

Build expression by calling a pure LLVM intrinsic function.

Parameters
  • dtype (str) – The data type of the result.

  • name (str) – The name of the llvm intrinsic function.

  • args (list) – Positional arguments.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

call – The call expression.

Return type

PrimExpr

tvm.tir.ret(val)

Create a tir return expression

Parameters

val (Expr) – The returned tir expression, whose data type is int, float or void pointer.

Returns

ret – The return expression

Return type

PrimExpr

tvm.tir.all(*args, span=None)

Create a new expression of the intersection of all conditions in the arguments

Parameters
  • args (list) – List of symbolic boolean expressions

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

expr – Expression

Return type

Expr

tvm.tir.any(*args, span=None)

Create a new expression of the union of all conditions in the arguments

Parameters
  • args (list) – List of symbolic boolean expressions

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

expr – Expression

Return type

Expr

tvm.tir.min_value(dtype, span=None)

minimum value of dtype

Parameters
  • dtype (str) – The data type.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

value – The minimum value of dtype.

Return type

tvm.Expr

tvm.tir.max_value(dtype: str, span: Optional[tvm.ir.base.Span] = None) → Any

maximum value of dtype

Parameters
  • dtype (str) – The data type.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

value – The maximum value of dtype.

Return type

tvm.Expr
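
For integer dtypes, the limits described by min_value and max_value are the usual two's-complement bounds. The sketch below is a plain-Python model, not a call into TVM: int_dtype_range is a hypothetical helper, and it covers only 'intN' and 'uintN' dtype strings (floating-point dtypes are left out).

```python
from typing import Tuple


def int_dtype_range(dtype: str) -> Tuple[int, int]:
    """Return (min, max) for an 'intN' or 'uintN' dtype string (hypothetical helper)."""
    if dtype.startswith("uint"):
        bits = int(dtype[4:])
        return 0, (1 << bits) - 1
    if dtype.startswith("int"):
        bits = int(dtype[3:])
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    raise ValueError(f"dtype not covered by this sketch: {dtype}")


for name in ("int8", "uint16", "int32"):
    print(name, int_dtype_range(name))
```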

tvm.tir.trace(args, trace_action='tvm.default_trace_action')

Trace tensor data at runtime.

The trace function traces a specific tensor at runtime. The value to trace should be passed as the last argument. A trace action should be specified; by default, tvm.default_trace_action is used.

Parameters
  • args (list of Expr or Buffers) – Positional arguments.

  • trace_action (str) – The name of the trace action.

Returns

call – The call expression.

Return type

PrimExpr

See also

tvm.tir.call_packed

Creates packed function.

tvm.tir.exp(x)

Take exponential of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.exp2(x)

Calculate 2**x

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.exp10(x)

Calculate 10**x

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.log(x)

Take log of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.log2(x)

Take log2 of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.log10(x)

Take log10 of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.log1p(x)

Take log(x + 1) with respect to input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.ldexp(x1, x2)

Returns x1 * (2 ** x2).

Parameters
  • x1 (PrimExpr) – Input argument.

  • x2 (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.clz(x)

Count leading zero bits of an integer x.

Parameters

x (PrimExpr) – Input 32 or 64 bit integer. The result is undefined if the input is 0.

Returns

y – The result.

Return type

PrimExpr
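
As a rough illustration of the result for a non-zero input, the following plain-Python model counts leading zeros within a fixed bit width. The helper name clz_model and its bits parameter are assumptions made for this sketch; it rejects 0 because the entry above leaves that case undefined.

```python
def clz_model(x: int, bits: int = 32) -> int:
    """Count leading zero bits of a positive integer stored in `bits` bits."""
    if not 0 < x < (1 << bits):
        raise ValueError("x must be positive and fit in the given width")
    # bit_length() is the position of the highest set bit, counted from 1.
    return bits - x.bit_length()


print(clz_model(1))            # only the lowest bit is set
print(clz_model(0x80000000))   # the highest of 32 bits is set
print(clz_model(1, bits=64))
```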

tvm.tir.sin(x)

Take sin of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.sinh(x)

Take sinh of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.asin(x)

Take asin of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.asinh(x)

Take asinh of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.cos(x)

Take cos of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.cosh(x)

Take cosh of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.acos(x)

Take acos of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.acosh(x)

Take acosh of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.tan(x)

Take tan of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.tanh(x)

Take hyperbolic tanh of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.atan(x)

Take atan of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.atan2(x1, x2)

Take arctan2(x1, x2).

Parameters
  • x1 (PrimExpr) – Input argument.

  • x2 (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.atanh(x)

Take atanh of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.erf(x)

Take gauss error function of the input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.sigmoid(x)

Quick function to get sigmoid

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.sqrt(x)

Take square root of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.rsqrt(x)

Take reciprocal of square root of input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.floor(x: tvm.tir.expr.PrimExprWithOp, span=None)

Take floor of float input x.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.ceil(x, span=None)

Take ceil of float input x.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.hypot(x1, x2)

Equivalent to sqrt(x1**2 + x2**2), element-wise.

Parameters
  • x1 (PrimExpr) – Input argument.

  • x2 (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.trunc(x, span=None)

Get truncated value of the input.

The truncated value of the scalar x is the nearest integer i which is closer to zero than x is.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr
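
The difference between trunc and the floor and ceil entries above shows up on negative inputs. A short sketch using Python's math module as a stand-in for the TIR operators (an assumption for illustration; no TVM expression is built here):

```python
import math

# Compare the three rounding directions on a positive and a negative value.
results = {x: (math.floor(x), math.ceil(x), math.trunc(x)) for x in (2.7, -2.7)}

for x, (down, up, toward_zero) in results.items():
    print(f"x={x}: floor={down}, ceil={up}, trunc={toward_zero}")
```

For positive inputs trunc agrees with floor, and for negative inputs it agrees with ceil.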

tvm.tir.abs(x, span=None)

Get absolute value of the input element-wise.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.round(x, span=None)

Round elements of the array to the nearest integer.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.nextafter(x1, x2)

Return the next floating-point value after x1 towards x2.

Parameters
  • x1 (PrimExpr) – Input argument.

  • x2 (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.nearbyint(x, span=None)

Round elements of the array to the nearest integer. This intrinsic uses llvm.nearbyint instead of llvm.round, which is faster but can give results that differ from te.round. Notably, nearbyint rounds according to the current rounding mode, whereas te.round (llvm.round) ignores it. For the differences between the two, see https://en.cppreference.com/w/cpp/numeric/math/round and https://en.cppreference.com/w/cpp/numeric/math/nearbyint

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr
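
The two rounding behaviors differ on exact halves. The sketch below models them in plain Python under the assumption that the floating-point environment is in its default round-to-nearest-even mode; round_half_away and nearbyint_default are hypothetical helper names, not TVM functions.

```python
import math


def round_half_away(x: float) -> float:
    """Model of llvm.round: ties go away from zero, whatever the rounding mode."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def nearbyint_default(x: float) -> float:
    """Model of nearbyint in the default mode: ties go to the even neighbor."""
    # Python's built-in round() also rounds ties to even.
    return float(round(x))


for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round_half_away(value), nearbyint_default(value))
```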

tvm.tir.power(x, y, span=None)

x power y

Parameters
  • x (PrimExpr) – Input argument.

  • y (PrimExpr) – The exponent

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

z – The result.

Return type

PrimExpr

tvm.tir.popcount(x)

Count the number of set bits in input x.

Parameters

x (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.fmod(x, y)

Return the remainder of x divided by y with the same sign as x.

Parameters
  • x (PrimExpr) – Input argument.

  • y (PrimExpr) – Input argument.

Returns

z – The result.

Return type

PrimExpr
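
The "same sign as x" rule is what separates this remainder from a floor-based modulo. A small sketch using Python's math.fmod as a stand-in (an assumption for illustration; it follows the same C-style definition but is not the TIR operator):

```python
import math

# C-style remainder: the result takes the sign of the dividend x.
c_style = math.fmod(-7.0, 3.0)

# Python's % operator is floor-based: the result takes the sign of the divisor.
floor_style = -7.0 % 3.0

print(c_style, floor_style)
```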

tvm.tir.if_then_else(cond, t, f, span=None)

Conditional selection expression.

Parameters
  • cond (PrimExpr) – The condition

  • t (PrimExpr) – The result expression if cond is true.

  • f (PrimExpr) – The result expression if cond is false.

  • span (Optional[Span]) – The location of this operator in the source.

Returns

result – The result of conditional expression.

Return type

Node

Note

Unlike Select, if_then_else will not execute the branch that does not satisfy the condition. You can use it to guard against out of bound access. Unlike Select, if_then_else cannot be vectorized if some lanes in the vector have different conditions.
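
The evaluation difference described in the note can be modelled in plain Python. This is only an analogy: select_model and if_then_else_model are hypothetical helpers, with eager argument evaluation standing in for Select and deferred callables standing in for if_then_else.

```python
from typing import Callable, List


def select_model(cond: bool, true_value: float, false_value: float) -> float:
    """Select-like: both operands were already evaluated by the caller."""
    return true_value if cond else false_value


def if_then_else_model(cond: bool, t: Callable[[], float], f: Callable[[], float]) -> float:
    """if_then_else-like: only the branch chosen by cond is evaluated."""
    return t() if cond else f()


data: List[float] = [1.0, 2.0, 3.0]
i = 5  # deliberately out of bounds

# Guarded load: the out-of-bounds read is never executed.
safe = if_then_else_model(i < len(data), lambda: data[i], lambda: 0.0)

# Eager evaluation: data[i] is computed before select_model runs, so it raises.
try:
    select_model(i < len(data), data[i], 0.0)
    eager_failed = False
except IndexError:
    eager_failed = True

print(safe, eager_failed)
```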

tvm.tir.isnan(x, span=None)

Check if input value is NaN.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.isfinite(x, span=None)

Check if input value is finite.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.isinf(x, span=None)

Check if input value is infinite.

Parameters
  • x (PrimExpr) – Input argument.

  • span (Optional[Span]) – The location of this operator in the source code.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.copysign(x1, x2)

Change the sign of x1 to that of x2, element-wise.

Parameters
  • x1 (PrimExpr) – Input argument.

  • x2 (PrimExpr) – Input argument.

Returns

y – The result.

Return type

PrimExpr

tvm.tir.div(a, b, span=None)

Compute a / b as in C/C++ semantics.

Parameters
  • a (PrimExpr) – The left hand operand, known to be non-negative.

  • b (PrimExpr) – The right hand operand, known to be non-negative.

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr

Note

When operands are integers, returns truncdiv(a, b, span).

tvm.tir.indexdiv(a, b, span=None)

Compute floor(a / b) where a and b are non-negative.

Parameters
  • a (PrimExpr) – The left hand operand, known to be non-negative.

  • b (PrimExpr) – The right hand operand, known to be non-negative.

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr

Note

Use this function to split non-negative indices. This function may take advantage of operands’ non-negativeness.

tvm.tir.indexmod(a, b, span=None)

Compute the remainder of indexdiv. a and b are non-negative.

Parameters
  • a (PrimExpr) – The left hand operand, known to be non-negative.

  • b (PrimExpr) – The right hand operand, known to be non-negative.

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr

Note

Use this function to split non-negative indices. This function may take advantage of operands’ non-negativeness.
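
Splitting a non-negative flat index into a row and a column is the typical use of the indexdiv and indexmod pair. A plain-Python sketch of that arithmetic, with Python's // and % standing in for the two operators (split_index is a hypothetical helper, not part of TVM):

```python
from typing import Tuple


def split_index(flat: int, width: int) -> Tuple[int, int]:
    """Split a non-negative flat index into (row, col) for rows of `width` items."""
    if flat < 0 or width <= 0:
        raise ValueError("sketch assumes flat >= 0 and width > 0")
    row = flat // width   # role played by indexdiv
    col = flat % width    # role played by indexmod
    return row, col


row, col = split_index(11, 4)
print(row, col)
```

The original index can always be rebuilt as row * width + col.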

tvm.tir.truncdiv(a, b, span=None)

Compute the truncdiv of two expressions.

Parameters
  • a (PrimExpr) – The left hand operand

  • b (PrimExpr) – The right hand operand

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr

Note

This is the default integer division behavior in C.

tvm.tir.truncmod(a, b, span=None)

Compute the truncmod of two expressions.

Parameters
  • a (PrimExpr) – The left hand operand

  • b (PrimExpr) – The right hand operand

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr

Note

This is the default integer remainder (%) behavior in C.

tvm.tir.floordiv(a, b, span=None)

Compute the floordiv of two expressions.

Parameters
  • a (PrimExpr) – The left hand operand

  • b (PrimExpr) – The right hand operand

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr

tvm.tir.floormod(a, b, span=None)

Compute the floormod of two expressions.

Parameters
  • a (PrimExpr) – The left hand operand

  • b (PrimExpr) – The right hand operand

  • span (Optional[Span]) – The location of this operator in the source.

Returns

res – The result expression.

Return type

PrimExpr
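
The truncating and flooring variants above only disagree when the operands have different signs. The following plain-Python model makes the difference concrete; the *_model functions are hypothetical helpers that work on Python integers, not on TIR expressions.

```python
def truncdiv_model(a: int, b: int) -> int:
    """Quotient rounded toward zero, as in C integer division."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def truncmod_model(a: int, b: int) -> int:
    """Remainder paired with truncdiv_model; it takes the sign of a."""
    return a - b * truncdiv_model(a, b)


def floordiv_model(a: int, b: int) -> int:
    """Quotient rounded toward negative infinity."""
    return a // b


def floormod_model(a: int, b: int) -> int:
    """Remainder paired with floordiv_model; it takes the sign of b."""
    return a % b


for a, b in ((7, 2), (-7, 2), (7, -2)):
    print(a, b,
          truncdiv_model(a, b), truncmod_model(a, b),
          floordiv_model(a, b), floormod_model(a, b))
```

In both pairs the identity a == b * quotient + remainder holds.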

tvm.tir.comm_reducer(fcombine, fidentity, name='reduce')

Create a commutative reducer for reduction.

Parameters
  • fcombine (function(Expr -> Expr -> Expr)) – A binary function which takes two Expr as input to return an Expr.

  • fidentity (function(str -> Expr)) – A function which takes a type string as input to return a const Expr.

Returns

reducer – A function which creates a reduce expression over axis. There are two ways to use it:

  1. accept (expr, axis, where) to produce a Reduce Expr on specified axis;

  2. simply use it with multiple Exprs.

Return type

function

Example

n = te.var("n")
m = te.var("m")
mysum = te.comm_reducer(lambda x, y: x+y,
    lambda t: tvm.tir.const(0, dtype=t), name="mysum")
A = te.placeholder((n, m), name="A")
k = te.reduce_axis((0, m), name="k")
B = te.compute((n,), lambda i: mysum(A[i, k], axis=k), name="B")
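
The contract between fcombine and fidentity can also be seen on concrete values. The sketch below is a plain-Python analogy and does not build TVM expressions: make_reducer is a hypothetical helper, and its where argument is assumed to be a per-value predicate.

```python
from functools import reduce
from typing import Callable, Iterable, Optional


def make_reducer(fcombine: Callable, fidentity: Callable[[str], object]) -> Callable:
    """Build a reducer from a binary combine function and an identity factory."""

    def reducer(values: Iterable, dtype: str = "int32",
                where: Optional[Callable] = None):
        # Keep only the values that pass the optional filtering predicate.
        kept = (v for v in values if where is None or where(v))
        # Start from the identity element so an empty reduction is well defined.
        return reduce(fcombine, kept, fidentity(dtype))

    return reducer


mysum = make_reducer(lambda x, y: x + y, lambda dtype: 0)
row = [1, 2, 3, 4]
total = mysum(row)
even_total = mysum(row, where=lambda v: v % 2 == 0)
print(total, even_total)
```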

tvm.tir.min(expr, axis, where=None, init=None, *args)

Create a min expression over axis.

Parameters
  • expr (PrimExpr) – The source expression.

  • axis (IterVar) – The reduction IterVar axis

  • where (optional, Expr) – Filtering predicate of the reduction.

Returns

value – The result value.

Return type

PrimExpr

Example

m = te.var("m")
n = te.var("n")
A = te.placeholder((m, n), name="A")
k = te.reduce_axis((0, n), name="k")

# there are two ways to use this min reducer:
# mode 1, accept (expr, axis, where) to produce a Reduce Expr
# tvm.min represents tvm.te.min or tvm.tir.min.
B = te.compute((m,), lambda i: tvm.min(A[i, k], axis=k), name="B")

# mode 2, simply use it with multiple Exprs:
min_res = tvm.min(m, n)

tvm.tir.max(expr, axis, where=None, init=None, *args)

Create a max expression over axis.

Parameters
  • expr (PrimExpr) – The source expression.

  • axis (IterVar) – The reduction IterVar axis

  • where (optional, Expr) – Filtering predicate of the reduction.

Returns

value – The result value.

Return type

PrimExpr

Example

m = te.var("m")
n = te.var("n")
A = te.placeholder((m, n), name="A")
k = te.reduce_axis((0, n), name="k")

# there are two ways to use this max reducer:
# mode 1, accept (expr, axis, where) to produce a Reduce Expr
# tvm.max represents tvm.te.max or tvm.tir.max.
B = te.compute((m,), lambda i: tvm.max(A[i, k], axis=k), name="B")

# mode 2, simply use it with multiple Exprs:
max_res = tvm.max(m, n)

tvm.tir.sum(expr, axis, where=None, init=None, *args)

Create a sum expression over axis.

Parameters
  • expr (PrimExpr) – The source expression.

  • axis (IterVar) – The reduction IterVar axis

  • where (optional, Expr) – Filtering predicate of the reduction.

Returns

value – The result value.

Return type

PrimExpr

Example

m = te.var("m")
n = te.var("n")
A = te.placeholder((m, n), name="A")
k = te.reduce_axis((0, n), name="k")

# there are two ways to use this sum reducer:
# mode 1, accept (expr, axis, where) to produce a Reduce Expr
# tvm.sum represents tvm.te.sum or tvm.tir.sum.
B = te.compute((m,), lambda i: tvm.sum(A[i, k], axis=k), name="B")

# mode 2, simply use it with multiple Exprs:
sum_res = tvm.sum(m, n)

tvm.tir.q_multiply_shift(x, y, q, s)

Execute a multiplication between two Q-numbers x and y followed by a right shift s. The mathematical expression is:

out = round(x*y*2^-s)

More about Q-numbers here: https://en.wikipedia.org/wiki/Q_(number_format)

The rounding rule is to the nearest value, rounding half up (i.e., round(x.1) = x and round(x.5) = x+1).

Parameters
  • x (PrimExpr) – First Q-number

  • y (PrimExpr) – Second Q-number

  • q (PrimExpr) – Number of fractional bits in x and y. Needs to be > 0

  • s (PrimExpr) – Integer shift

Returns

y – The result.

Return type

PrimExpr
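
The rounding rule in the formula above can be reproduced with integer arithmetic. The sketch below models only the expression as written, round(x*y*2^-s) with ties rounded up, for s >= 1. It is not TVM's lowering of this intrinsic, it does not use q (which does not appear in that expression), and mul_round_shift is a hypothetical helper name.

```python
def mul_round_shift(x: int, y: int, s: int) -> int:
    """Integer model of round(x * y * 2**-s) with ties rounded up, for s >= 1."""
    if s < 1:
        raise ValueError("this sketch only handles s >= 1")
    product = x * y
    half = 1 << (s - 1)            # the value 0.5 expressed in units of 2**-s
    # Python's >> floors toward negative infinity, so adding half first
    # turns the floor into round-half-up.
    return (product + half) >> s


print(mul_round_shift(3, 5, 1))    # 7.5 rounds up
print(mul_round_shift(-3, 5, 1))   # -7.5 rounds up as well
print(mul_round_shift(5, 5, 2))    # 6.25 rounds to the nearest integer
```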

class tvm.tir.StmtSRef

An object that refers to schedulable elements in the TensorIR, aka “sref”.

Glossary:

  • Block sref: A StmtSRef that points to a TensorIR block.

  • Loop sref: A StmtSRef that points to a TensorIR for loop.

  • Parent sref: The parent sref of an sref is the block/loop sref that points to its closest schedulable statement of its ancestors on the TensorIR AST.

  • Root sref: Sref to the root block. Every sref has exactly one parent sref except for root sref.

  • Sref tree: The parent-children-relationship of srefs that forms a tree, uniquely determined by the TensorIR AST.

Attributes:

stmt

The block/for stmt the object refers to

parent

The parent sref

Methods:

inline_mark()

A special StmtSRef, which doesn’t point to any stmt in the AST, only serving as a “mark” to hint compute-at to do the work of compute-inline

root_mark()

A special StmtSRef, which doesn’t point to any stmt in the AST, only serving as a “mark” to hint compute-at to do nothing

property stmt: Optional[Union[tvm.tir.stmt.Block, tvm.tir.stmt.For]]

The block/for stmt the object refers to

property parent: Optional[tvm.tir.schedule.block_scope.StmtSRef]

The parent sref

static inline_mark() → tvm.tir.schedule.block_scope.StmtSRef

A special StmtSRef, which doesn’t point to any stmt in the AST, only serving as a “mark” to hint compute-at to do the work of compute-inline

static root_mark() → tvm.tir.schedule.block_scope.StmtSRef

A special StmtSRef, which doesn’t point to any stmt in the AST, only serving as a “mark” to hint compute-at to do nothing
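
The parent relationship from the StmtSRef glossary can be pictured with a toy tree. The sketch below is a plain-Python model under stated assumptions: Node, SRef and build_srefs are hypothetical names, and only nodes of kind 'block' or 'for' are treated as schedulable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A toy AST node; kind is 'block', 'for', or anything else (e.g. 'seq')."""
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class SRef:
    """A toy sref: points at a block/for node and at its parent sref."""
    stmt: Node
    parent: Optional["SRef"]


def build_srefs(node: Node, parent: Optional[SRef] = None) -> List[SRef]:
    """Collect srefs in pre-order, skipping non-schedulable nodes."""
    srefs: List[SRef] = []
    if node.kind in ("block", "for"):
        parent = SRef(node, parent)
        srefs.append(parent)
    for child in node.children:
        # A non-schedulable node passes its own parent through unchanged.
        srefs.extend(build_srefs(child, parent))
    return srefs


# root block -> loop i -> seq -> (block B, block C)
ast = Node("block", "root", [
    Node("for", "i", [
        Node("seq", "body", [Node("block", "B"), Node("block", "C")]),
    ]),
])
srefs = build_srefs(ast)
parents = {s.stmt.name: (s.parent.stmt.name if s.parent else None) for s in srefs}
print(parents)
```

Blocks B and C both report loop i as their parent, because the intermediate 'seq' node is not schedulable and is skipped.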

class tvm.tir.BlockScope

An object corresponding to each block sref in the sref tree, which tracks the producer-consumer dependency between blocks.

Glossary:

  • Block scope: A contiguous subtree of the sref tree, rooted at each block sref, whose components are:

    • scope root: a block sref

    • internal srefs: loop srefs

    • scope leaves: block srefs

  • Child block: The scope leaf blocks under the scope root or a specific internal sref

Methods:

get_deps_by_src(block)

Get all dependencies whose src is the target block.

get_deps_by_dst(block)

Get all dependencies whose dst is the target block.

get_deps_by_src(block: tvm.tir.schedule.block_scope.StmtSRef) → List[tvm.tir.schedule.block_scope.Dependency]

Get all dependencies whose src is the target block.

Parameters

block (StmtSRef) – The queried block

Returns

blocks – The dependencies

Return type

List[Dependency]

get_deps_by_dst(block: tvm.tir.schedule.block_scope.StmtSRef) → List[tvm.tir.schedule.block_scope.Dependency]

Get all dependencies whose dst is the target block.

Parameters

block (StmtSRef) – The queried block

Returns

blocks – The dependencies

Return type

List[Dependency]

class tvm.tir.ScheduleState(mod: Union[tvm.tir.function.PrimFunc, tvm.ir.module.IRModule], *, debug_mask: Union[str, int] = 'none')

The state of scheduling, which exposes a Replace method as the primary means for all the scheduling primitives to manipulate the TensorIR.

The data structure contains the following information:

  1. The AST being scheduled (mod)

  2. The sref tree of schedulable statements (indicated by the srefs)

  3. The dependency information of each block scope (block_info)

  4. A reverse mapping from the AST nodes to that in the sref tree (get_sref)

  5. A debug flag, if set, extra checking is enabled (debug_mask)

Parameters
  • mod (IRModule) – The AST of the module being scheduled

  • debug_mask (int) – Do extra correctness checking after the object construction and each time after calling the Replace method.

Methods:

get_sref(stmt)

Return the corresponding sref that points to the stmt

get_block_scope(block_sref)

Get the BlockScope corresponding to the block sref

replace(src_sref, tgt_stmt[, block_sref_reuse])

Replace the part of the AST, as being pointed to by src_sref, with a specific statement tgt_stmt, and maintain the sref tree accordingly.

get_sref(stmt: Union[tvm.tir.stmt.Block, tvm.tir.stmt.For]) → Optional[tvm.tir.schedule.block_scope.StmtSRef]

Return the corresponding sref that points to the stmt

Parameters

stmt (Union[Block, For]) – The schedulable statement in the TensorIR to be retrieved for its sref

Returns

sref – The corresponding sref

Return type

StmtSRef

get_block_scope(block_sref: tvm.tir.schedule.block_scope.StmtSRef) → tvm.tir.schedule.block_scope.BlockScope

Get the BlockScope corresponding to the block sref

Parameters

block_sref (StmtSRef) – The block sref to be retrieved

Returns

scope – The corresponding BlockScope

Return type

BlockScope

replace(src_sref: tvm.tir.schedule.block_scope.StmtSRef, tgt_stmt: Union[tvm.tir.stmt.Block, tvm.tir.stmt.For, tvm.tir.stmt.BlockRealize], block_sref_reuse: Optional[Dict[tvm.tir.stmt.Block, tvm.tir.stmt.Block]] = None) → None

Replace the part of the AST, as being pointed to by src_sref, with a specific statement tgt_stmt, and maintain the sref tree accordingly. Replace will try to perform copy on write as much as possible when the ScheduleState holds the only copy to the IRModule and IR nodes.

Only 3 types of replacements are allowed, from src_sref->stmt to tgt_stmt:

  1. Block -> Block

  2. Loop -> Loop

  3. Loop -> BlockRealize

Parameters
  • src_sref (StmtSRef) – The sref to the statement to be replaced in the TensorIR AST

  • tgt_stmt (Union[Block, For, BlockRealize]) – The replacement statement

  • block_sref_reuse (Optional[Dict[Block, Block]] = None) – Maps an old block (to be replaced in the subtree under src_sref->stmt) to a new block (replaced to, in the subtree under tgt_stmt), and enforces reuse of srefs between them (rather than create new srefs) i.e. after being replaced, the sref that points to the old block will point to the new one

Note

The reuse of loop srefs is detected automatically according to the reuse of loop vars.

class tvm.tir.Schedule(mod: Union[tvm.tir.function.PrimFunc, tvm.ir.module.IRModule], *, seed: Optional[int] = None, debug_mask: Union[str, int] = 'none', error_render_level: str = 'detail')

The user-facing schedule class

A schedule is a set of transformations that change the order of computation but preserve the semantics of computation. Some examples of schedules:

  1. Split a loop into two

  2. Reorder two loops

  3. Inline the computation of a specific buffer into its consumer

The schedule class stores auxiliary information to schedule correctly and efficiently.

Link to tutorial: https://tvm.apache.org/docs/tutorials/language/schedule_primitives.html
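
The idea that a schedule changes the order of computation while preserving its semantics can be shown without TVM. The sketch below uses ordinary Python loops as a stand-in for a loop nest: baseline and split_then_reorder are hypothetical helpers, and the split factor is assumed to divide the loop extent evenly.

```python
from typing import List, Tuple


def baseline(n: int, m: int) -> List[Tuple[int, int]]:
    """Visit every (i, j) point with i as the outer loop."""
    return [(i, j) for i in range(n) for j in range(m)]


def split_then_reorder(n: int, m: int, factor: int) -> List[Tuple[int, int]]:
    """Split loop i by `factor`, then move the inner part inside loop j."""
    if n % factor != 0:
        raise ValueError("sketch assumes factor divides n")
    visited = []
    for i_outer in range(n // factor):
        for j in range(m):
            for i_inner in range(factor):
                # Recover the original loop variable from the two new ones.
                visited.append((i_outer * factor + i_inner, j))
    return visited


original = baseline(8, 3)
transformed = split_then_reorder(8, 3, factor=4)
print(original[:4])
print(transformed[:4])
```

Both versions visit exactly the same set of points, only in a different order.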

Attributes:

mod

Returns the AST of the module being scheduled

state

Returns the ScheduleState in the current schedule class

trace

Returns the internally maintained trace of scheduling program execution

Methods:

copy()

Returns a copy of the schedule, including both the state and the symbol table, guaranteeing that: 1) the SRef tree is completely reconstructed; 2) the IRModule being scheduled is untouched; 3) all the random variables are valid in the copy, pointing to the corresponding reconstructed sref

seed(seed)

Seed the randomness

fork_seed()

Returns a forked random state as seed for new schedules

show(rand_var)

Returns a string representation of the value that the random variable evaluates to

get(rand_var_or_sref)

Returns: the corresponding Block that a BlockRV evaluates to; the corresponding For that a LoopRV evaluates to; the corresponding integer that an ExprRV evaluates to; the corresponding Block that a block sref points to; or the corresponding For that a loop sref points to

get_sref(rand_var_or_stmt)

Returns the corresponding sref to the given 1) LoopRV 2) BlockRV 3) Block 4) For

remove_rv(rand_var)

Remove a random variable from the symbol table

sample_categorical(candidates, probs[, decision])

Sample an integer given the probability distribution

get_block(name[, func_name])

Retrieve a block in a specific function with its name

get_loops(block)

Get the parent loops of the block in its scope, from outer to inner

fuse(*loops)

Fuse a list of consecutive loops into one.

split(loop, factors)

Split a loop into a list of consecutive loops.

reorder(*ordered_loops)

Reorder a list of loops.

parallel(loop)

Parallelize the input loop.

vectorize(loop)

Vectorize the input loop.

bind(loop, thread_axis)

Bind the input loop to the given thread axis.

unroll(loop)

Unroll the input loop.

cache_read(block, read_buffer_index, …)

Create a block that reads a buffer region into a read cache.

cache_write(block, write_buffer_index, …)

Create a block that writes the contents of a write cache back to a buffer region.

compute_at(block, loop[, preserve_unit_loops])

Compute-At.

reverse_compute_at(block, loop[, …])

Reverse-Compute-At.

compute_inline(block)

Inline a block into its consumer(s).

reverse_compute_inline(block)

Inline a block into its only producer.

decompose_reduction(block, loop)

Decompose a reduction block into two separate blocks.

rfactor(loop, factor_axis)

Factorize an associative reduction block by the specified loop.

storage_align(block, buffer_index, axis, …)

Set alignment requirement for specific dimension such that stride[axis] == k * factor + offset for some k.

enter_postproc()

A no-op that marks the start of postprocessing phase of scheduling

property mod: tvm.ir.module.IRModule

Returns the AST of the module being scheduled

property state: tvm.tir.schedule.state.ScheduleState

Returns the ScheduleState in the current schedule class

property trace: Optional[tvm.tir.schedule.trace.Trace]

Returns the internally maintained trace of scheduling program execution

copy() tvm.tir.schedule.schedule.Schedule

Returns a copy of the schedule, including both the state and the symbol table, guaranteeing that: 1) the SRef tree is completely reconstructed; 2) the IRModule being scheduled is untouched; 3) all the random variables are valid in the copy, pointing to the corresponding reconstructed srefs

Returns

copy – A new copy of the schedule

Return type

Schedule

seed(seed: int) None

Seed the randomness

Parameters

seed (int) – The new random seed: -1 to use the device's random source, otherwise a non-negative integer

fork_seed() int

Returns a forked random state as seed for new schedules

Returns

seed – The forked random state, not the same as the current random state

Return type

int

show(rand_var: Union[tvm.ir.expr.PrimExpr, tvm.tir.schedule.schedule.BlockRV, tvm.tir.schedule.schedule.LoopRV]) str

Returns a string representation of the value that the random variable evaluates to

Parameters

rand_var (Union[ExprRV, BlockRV, LoopRV]) – The random variable to be evaluated

Returns

str_repr – The string representation

Return type

str

get(rand_var_or_sref: Union[tvm.ir.expr.PrimExpr, tvm.tir.schedule.schedule.BlockRV, tvm.tir.schedule.schedule.LoopRV, tvm.tir.schedule.block_scope.StmtSRef]) Optional[Union[int, tvm.tir.stmt.Block, tvm.tir.stmt.For]]

Returns:

  • the corresponding Block that a BlockRV evaluates to;

  • the corresponding For that a LoopRV evaluates to;

  • the corresponding integer that an ExprRV evaluates to;

  • the corresponding Block that a block sref points to;

  • the corresponding For that a loop sref points to.

Parameters

rand_var_or_sref (Union[ExprRV, BlockRV, LoopRV, StmtSRef]) – The random variable / sref to be evaluated

Returns

result – The corresponding result

Return type

Optional[Union[int, Block, For]]

get_sref(rand_var_or_stmt: Union[tvm.tir.schedule.schedule.BlockRV, tvm.tir.schedule.schedule.LoopRV, tvm.tir.stmt.Block, tvm.tir.stmt.For]) Optional[tvm.tir.schedule.block_scope.StmtSRef]

Returns the sref corresponding to the given 1) LoopRV, 2) BlockRV, 3) Block, or 4) For

Parameters

rand_var_or_stmt (Union[BlockRV, LoopRV, Block, For]) – The random variable / statement whose sref is requested

Returns

result – The corresponding result

Return type

Optional[StmtSRef]

remove_rv(rand_var: Union[tvm.ir.expr.PrimExpr, tvm.tir.schedule.schedule.BlockRV, tvm.tir.schedule.schedule.LoopRV]) None

Remove a random variable from the symbol table

Parameters

rand_var (Union[BlockRV, LoopRV, ExprRV]) – The random variable to be removed

sample_categorical(candidates: List[int], probs: List[float], decision: Optional[int] = None) tvm.ir.expr.PrimExpr

Sample an integer given the probability distribution

Parameters
  • candidates (List[int]) – The candidates to be sampled from

  • probs (List[float]) – The probability of each candidate

  • decision (Optional[int]) – The sampling decision, if any

Returns

result – The random variable sampled from candidates

Return type

ExprRV
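The sampling step can be illustrated outside TVM. The following is a minimal plain-Python sketch, not the TVM implementation: the helper name sample_categorical_sketch and its rng argument are hypothetical, Python's random module stands in for the schedule's own random state, and decision is assumed to be an index into candidates.

```python
import random


def sample_categorical_sketch(candidates, probs, decision=None, rng=None):
    """Pick one candidate; a given decision (assumed to be an index) skips the draw."""
    if len(candidates) != len(probs):
        raise ValueError("candidates and probs must have the same length")
    if decision is not None:
        return candidates[decision]
    rng = rng or random.Random()
    # random.choices draws one item using probs as relative weights
    return rng.choices(candidates, weights=probs, k=1)[0]


picked = sample_categorical_sketch([2, 4, 8], [0.1, 0.2, 0.7], rng=random.Random(0))
assert picked in (2, 4, 8)
assert sample_categorical_sketch([2, 4, 8], [0.1, 0.2, 0.7], decision=1) == 4
```

Passing a fixed decision makes the result reproducible, which is the role the optional decision parameter plays in the signature above.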

get_block(name: str, func_name: str = 'main') tvm.tir.schedule.schedule.BlockRV

Retrieve a block in a specific function with its name

Parameters
  • name (str) – The name of the block

  • func_name (str = "main") – The name of the function

Returns

block – The block retrieved. IndexError is raised if 0 or multiple blocks exist with the specific name.

Return type

BlockRV

get_loops(block: tvm.tir.schedule.schedule.BlockRV) List[tvm.tir.schedule.schedule.LoopRV]

Get the parent loops of the block in its scope, from outer to inner

Parameters

block (BlockRV) – The query block

Returns

loops – A list of loops above the given block in its scope, from outer to inner

Return type

List[LoopRV]

fuse(*loops: List[tvm.tir.schedule.schedule.LoopRV]) tvm.tir.schedule.schedule.LoopRV

Fuse a list of consecutive loops into one. It requires: 1) The loops can’t have annotations or thread bindings. 2) The (i+1)-th loop must be the only child of the i-th loop. 3) All loops must start with 0. 4) The domain of a loop to be fused cannot depend on another loop to be fused.

Parameters

*loops (List[LoopRV]) – The loops to be fused

Returns

fused_loop – The new loop after fusion

Return type

LoopRV

Examples

Before applying fuse, in TensorIR, the IR is:

@T.prim_func
def before_fuse(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do fuse:

sch = tir.Schedule(before_fuse)
i, j = sch.get_loops(sch.get_block("B"))
sch.fuse(i, j)
print(sch.mod["main"].script())

After applying fuse, the IR becomes:

@T.prim_func
def after_fuse(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    # the 2 loops are fused into 1
    for i_j_fused in T.serial(0, 16384):
        with T.block("B"):
            vi = T.axis.S(128, T.floordiv(i_j_fused, 128))
            vj = T.axis.S(128, T.floormod(i_j_fused, 128))
            B[vi, vj] = A[vi, vj] * 2.0
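The index arithmetic of the fused loop can be checked outside TVM. The following is a minimal plain-Python sketch (the helper name fused_indices is hypothetical and not part of the TVM API) showing that recovering (vi, vj) from one fused loop variable by floor division and floor modulo visits the same index pairs, in the same order, as the original two-level nest.

```python
def fused_indices(extent_i, extent_j):
    """Recover (vi, vj) from a single fused loop variable."""
    pairs = []
    for i_j_fused in range(extent_i * extent_j):
        vi = i_j_fused // extent_j   # plays the role of T.floordiv
        vj = i_j_fused % extent_j    # plays the role of T.floormod
        pairs.append((vi, vj))
    return pairs


nested = [(i, j) for i in range(128) for j in range(128)]
assert fused_indices(128, 128) == nested
```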
split(loop: tvm.tir.schedule.schedule.LoopRV, factors: List[Optional[tvm.ir.expr.PrimExpr]]) List[tvm.tir.schedule.schedule.LoopRV]

Split a loop into a list of consecutive loops. It requires: 1) The loop can’t have annotation or thread binding. 2) The loop must start with 0. Predicates may be added to ensure that the total number of loop iterations remains unchanged. In factors, at most one of the factors can be None, which will be automatically inferred.

Parameters
  • loop (LoopRV) – The loop to be split

  • factors (List[Union[ExprRV, None]]) – The splitting factors. Potential inputs are: None, ExprRV, or nonnegative constant integers

Returns

split_loops – The new loops after split

Return type

List[LoopRV]

Examples

Before split, in TensorIR, the IR is:

@T.prim_func
def before_split(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do split:

sch = tir.Schedule(before_split)
i, j = sch.get_loops(sch.get_block("B"))
sch.split(i, factors=[2, 64])
print(sch.mod["main"].script())

After applying split, the IR becomes:

@T.prim_func
def after_split(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    # the original loop is split into 2 loops
    for i0, i1, j in T.grid(2, 64, 128):
        with T.block("B"):
            vi = T.axis.S(128, i0 * 64 + i1)
            vj = T.axis.S(128, j)
            B[vi, vj] = A[vi, vj] * 2.0
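The roles of the None factor and of the predicate can be sketched in plain Python. This is an illustration only: infer_factors and split_indices are hypothetical helpers, and filling in the None factor by ceiling division is an assumption about how the inference could work, not something stated above.

```python
import math


def infer_factors(extent, factors):
    """Fill in at most one None factor by ceiling division (assumed rule)."""
    if factors.count(None) > 1:
        raise ValueError("at most one factor may be None")
    known = math.prod(f for f in factors if f is not None)
    inferred = (extent + known - 1) // known
    return [inferred if f is None else f for f in factors]


def split_indices(extent, factors):
    """Enumerate the original loop index reached by a two-way split."""
    outer, inner = infer_factors(extent, factors)
    visited = []
    for i0 in range(outer):
        for i1 in range(inner):
            i = i0 * inner + i1
            if i < extent:  # predicate that skips the padded iterations
                visited.append(i)
    return visited


assert split_indices(128, [2, 64]) == list(range(128))
assert split_indices(10, [None, 4]) == list(range(10))
```

In the second call the split nest has 3 * 4 = 12 iterations for an extent of 10, so the predicate is what keeps the number of executed iterations unchanged.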
reorder(*ordered_loops: List[tvm.tir.schedule.schedule.LoopRV]) None

Reorder a list of loops. It doesn’t require the loops to be consecutive. It requires: 1) The loops are in the same chain. That means: the loops can be ordered to [l_1, l_2, … , l_n] where l_i is an ancestor of l_{i+1} and there are only single-branch loops between l_1 and l_n (which also indicates they are under the same scope). 2) After reordering, the domain of an outer loop cannot depend on any of the inner loops. 3) For every block under the loop nests, its block binding must be affine, and the block variables must be either data parallel or reduction. 4) No duplicated loops are allowed in the arguments.

Parameters

*ordered_loops (List[LoopRV]) – The loops in the new order

Examples

Before reorder, in TensorIR, the IR is:

@T.prim_func
def before_reorder(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do reorder:

sch = tir.Schedule(before_reorder)
i, j = sch.get_loops(sch.get_block("B"))
sch.reorder(j, i)
print(sch.mod["main"].script())

After applying reorder, the IR becomes:

@T.prim_func
def after_reorder(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    # Here j and i are reordered
    for j, i in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
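What reorder changes and what it preserves can be shown with a small plain-Python model. The helper visit_order is hypothetical and only mimics the loop nest: the execution order of the block instances changes, while the set of (vi, vj) instances stays the same.

```python
def visit_order(extent_i, extent_j, j_outer=False):
    """Return the (vi, vj) pairs in the order the loop nest executes them."""
    order = []
    if j_outer:
        for j in range(extent_j):
            for i in range(extent_i):
                order.append((i, j))
    else:
        for i in range(extent_i):
            for j in range(extent_j):
                order.append((i, j))
    return order


before = visit_order(2, 3)
after = visit_order(2, 3, j_outer=True)
assert before != after                  # the execution order changes
assert sorted(before) == sorted(after)  # the set of block instances does not
```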
parallel(loop: tvm.tir.schedule.schedule.LoopRV) None

Parallelize the input loop. It requires: 1) The scope block that the loop is in should have stage-pipeline property 2) All the blocks under the loop are complete blocks or reduction blocks, and have affine bindings 3) For each block under the loop, the loop can only be contained in data-parallel block iters’ bindings

Parameters

loop (LoopRV) – The loop to be parallelized

Examples

Before parallel, in TensorIR, the IR is:

@T.prim_func
def before_parallel(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do parallel:

sch = tir.Schedule(before_parallel)
i, j = sch.get_loops(sch.get_block("B"))
sch.parallel(i)

After applying parallel, the IR becomes:

@T.prim_func
def after_parallel(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i in T.parallel(0, 128):
        for j in T.serial(0, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj] * 2.0
vectorize(loop: tvm.tir.schedule.schedule.LoopRV) None

Vectorize the input loop. It requires: 1) The scope block that the loop is in should have stage-pipeline property 2) All the blocks under the loop are complete blocks or reduction blocks, and have affine bindings 3) For each block under the loop, the loop can only be contained in data-parallel block iters’ bindings

Parameters

loop (LoopRV) – The loop to be vectorized

Examples

Before vectorize, in TensorIR, the IR is:

@T.prim_func
def before_vectorize(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do vectorize:

sch = tir.Schedule(before_vectorize)
i, j = sch.get_loops(sch.get_block("B"))
sch.vectorize(j)

After applying vectorize, the IR becomes:

@T.prim_func
def after_vectorize(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i in T.serial(0, 128):
        for j in T.vectorized(0, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj] * 2.0
bind(loop: tvm.tir.schedule.schedule.LoopRV, thread_axis: str) None

Bind the input loop to the given thread axis. It requires: 1) The scope block that the loop is in should have stage-pipeline property 2) All the blocks under the loop are complete blocks or reduction blocks, and have affine bindings 3) For each block under the loop, if the thread axis starts with “threadIdx”, the loop can only be contained in the bindings of data-parallel and reduction block iters. Otherwise, the loop can only be contained in the bindings of data-parallel block iters.

Parameters
  • loop (LoopRV) – The loop to be bound to the thread axis

  • thread_axis (str) – The thread axis to be bound to the loop. Possible candidates: blockIdx.x/y/z, threadIdx.x/y/z, vthread.x/y/z, and vthread (a legacy behavior that will be deprecated; please use vthread.x/y/z instead)

Examples

Before bind, in TensorIR, the IR is:

@T.prim_func
def before_bind(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do bind:

sch = tir.Schedule(before_bind)
i, j = sch.get_loops(sch.get_block("B"))
sch.bind(i, "blockIdx.x")
sch.bind(j, "threadIdx.x")

After applying bind, the IR becomes:

@T.prim_func
def after_bind(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i in T.thread_binding(0, 128, thread = "blockIdx.x"):
        for j in T.thread_binding(0, 128, thread = "threadIdx.x"):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj] * 2.0
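Once both loops are bound, each (blockIdx.x, threadIdx.x) pair executes one block instance, and no order between those instances is implied. The plain-Python model below is a hypothetical sketch (run_bound_kernel is not a TVM function) that runs the instances in shuffled order to show that the result of this data-parallel block does not depend on the order.

```python
import itertools
import random


def run_bound_kernel(A, num_blocks, threads_per_block, seed=0):
    """Run one block instance per (blockIdx.x, threadIdx.x) pair in shuffled order."""
    B = [[0.0] * threads_per_block for _ in range(num_blocks)]
    launch = list(itertools.product(range(num_blocks), range(threads_per_block)))
    random.Random(seed).shuffle(launch)  # model the lack of ordering between threads
    for block_idx_x, thread_idx_x in launch:
        vi, vj = block_idx_x, thread_idx_x  # the bound loop vars feed the block iters
        B[vi][vj] = A[vi][vj] * 2.0
    return B


A = [[float(i * 4 + j) for j in range(4)] for i in range(3)]
assert run_bound_kernel(A, 3, 4) == [[x * 2.0 for x in row] for row in A]
```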
unroll(loop: tvm.tir.schedule.schedule.LoopRV) None

Unroll the input loop. It has no requirements.

Parameters

loop (LoopRV) – The loop to be unrolled

Examples

Before unroll, in TensorIR, the IR is:

@T.prim_func
def before_unroll(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and do unroll:

sch = tir.Schedule(before_unroll)
i, j = sch.get_loops(sch.get_block("B"))
sch.unroll(i)

After applying unroll, the IR becomes:

@T.prim_func
def after_unroll(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i in T.unroll(0, 128):
        for j in T.serial(0, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj] * 2.0
cache_read(block: tvm.tir.schedule.schedule.BlockRV, read_buffer_index: int, storage_scope: str) tvm.tir.schedule.schedule.BlockRV

Create a block that reads a buffer region into a read cache. It requires:

  1. There is at most one block that writes the buffer in the scope.

  2. The scope block has the stage-pipeline property.

Parameters
  • block (BlockRV) – The consumer block of the target buffer.

  • read_buffer_index (int) – The index of the buffer in block’s read region.

  • storage_scope (str) – The target storage scope.

Returns

cached_block – The block of the cache stage

Return type

BlockRV

Examples

Before cache_read, in TensorIR, the IR is:

@T.prim_func
def before_cache_read(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and cache_read:

sch = tir.Schedule(before_cache_read)
block_b = sch.get_block("B")
sch.cache_read(block_b, 0, "local")
print(sch.mod["main"].script())

After applying cache_read, the IR becomes:

@T.prim_func
def after_cache_read(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    A_local = T.alloc_buffer((128, 128), scope="local")
    for i, j in T.grid(128, 128):
        with T.block("A_local"):
            vi, vj = T.axis.remap("SS", [i, j])
            A_local[vi, vj] = A[vi, vj]
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A_local[vi, vj] * 2.0
cache_write(block: tvm.tir.schedule.schedule.BlockRV, write_buffer_index: int, storage_scope: str) tvm.tir.schedule.schedule.BlockRV

Create a block that writes the contents of a write cache back to a buffer region. It requires:

  1. There is only one block that writes the buffer in the scope.

  2. The scope block has the stage-pipeline property.

Parameters
  • block (BlockRV) – The producer block of the target buffer.

  • write_buffer_index (int) – The index of the buffer in block’s write region.

  • storage_scope (str) – The target storage scope.

Returns

cached_block – The block of the cache stage

Return type

BlockRV

Examples

Before cache_write, in TensorIR, the IR is:

@T.prim_func
def before_cache_write(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0

Create the schedule and cache_write:

sch = tir.Schedule(before_cache_write)
block_b = sch.get_block("B")
sch.cache_write(block_b, 0, "local")
print(sch.mod["main"].script())

After applying cache_write, the IR becomes:

@T.prim_func
def after_cache_write(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.match_buffer(b, (128, 128))
    B_local = T.alloc_buffer((128, 128), scope="local")
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B_local[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("B_local"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = B_local[vi, vj]
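The data flow introduced by a write cache can be modeled in plain Python. This is only a sketch with a hypothetical function name: the computation lands in the cache buffer first, and a separate copy stage writes the cache back to the output buffer.

```python
def elementwise_with_write_cache(A):
    """Compute into a cache buffer, then copy the cache back to the output."""
    rows, cols = len(A), len(A[0])
    B_local = [[0.0] * cols for _ in range(rows)]
    B = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):        # compute stage: results land in the cache
        for j in range(cols):
            B_local[i][j] = A[i][j] * 2.0
    for i in range(rows):        # write-back stage: copy the cache to B
        for j in range(cols):
            B[i][j] = B_local[i][j]
    return B


A = [[1.0, 2.0], [3.0, 4.0]]
assert elementwise_with_write_cache(A) == [[2.0, 4.0], [6.0, 8.0]]
```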
compute_at(block: tvm.tir.schedule.schedule.BlockRV, loop: tvm.tir.schedule.schedule.LoopRV, preserve_unit_loops: bool = False) None

Compute-At. Move a producer block under the specific loop, and regenerate the loops induced by the block so that the buffer region produced by the producer block could cover those regions consumed by its consumer blocks under the given loop. It requires:

  1. block and loop are under the same scope, and loop is not an ancestor of block

  2. The scope block has the stage-pipeline property

  3. The subtree of the scope block in which the given block resides satisfies the compact dataflow condition, i.e. all the blocks in the scope block’s subtree must be either complete blocks or reduction blocks

  4. The block is not an output block with regard to the scope block, i.e. the buffers written by the block are allocated under the scope block

  5. All the consumers of the block are under the given loop

Parameters
  • block (BlockRV) – The block to be moved

  • loop (LoopRV) – The loop under which the block is moved

  • preserve_unit_loops (bool) – Whether to keep the trivial loops whose extents are 1

Examples

Before compute-at, in TensorIR, the IR is:

@T.prim_func
def before_compute_at(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.alloc_buffer((128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0

Create the schedule and do compute-at:

sch = tir.Schedule(before_compute_at)
block = sch.get_block("B")
loop, _ = sch.get_loops(sch.get_block("C"))
sch.compute_at(block, loop, preserve_unit_loops=False)
print(sch.mod["main"].script())

After applying compute-at, the IR becomes:

@T.prim_func
def after_compute_at(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.alloc_buffer((128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i in T.serial(0, 128):
        for j in T.serial(0, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj] * 2.0
        for j in T.serial(0, 128):
            with T.block("C"):
                vi, vj = T.axis.remap("SS", [i, j])
                C[vi, vj] = B[vi, vj] + 1.0
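The effect of compute-at on the loop structure can be modeled in plain Python. In this sketch (two_stage and computed_at_i are hypothetical names, not TVM API), the producer's inner loop is regenerated under loop i so that exactly the row of B consumed at iteration i is produced just before it is used, and the final result is unchanged.

```python
def two_stage(A):
    """Original schedule: all of B is produced before any of C."""
    n, m = len(A), len(A[0])
    B = [[A[i][j] * 2.0 for j in range(m)] for i in range(n)]
    return [[B[i][j] + 1.0 for j in range(m)] for i in range(n)]


def computed_at_i(A):
    """After compute-at under loop i: row i of B is produced right before its use."""
    n, m = len(A), len(A[0])
    B = [[0.0] * m for _ in range(n)]
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):       # regenerated producer loop covering row i of B
            B[i][j] = A[i][j] * 2.0
        for j in range(m):       # consumer loop reads only row i of B
            C[i][j] = B[i][j] + 1.0
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
assert computed_at_i(A) == two_stage(A)
```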
reverse_compute_at(block: tvm.tir.schedule.schedule.BlockRV, loop: tvm.tir.schedule.schedule.LoopRV, preserve_unit_loops: bool = False) None

Reverse-Compute-At. Move a consumer block under the specific loop, and regenerate the loops induced by the block so that the buffer region consumed by the consumer block could cover those regions produced by its producer blocks under the given loop. It requires:

  1. block and loop are under the same scope, and loop is not an ancestor of block

  2. The scope block has the stage-pipeline property

  3. The subtree of the scope block in which the given block resides satisfies the compact dataflow condition, i.e. all the blocks in the scope block’s subtree must be either complete blocks or reduction blocks

  4. All the producers of the block are under the given loop

Parameters
  • block (BlockRV) – The block to be moved

  • loop (LoopRV) – The loop under which the block is moved

  • preserve_unit_loops (bool) – Whether to keep the trivial loops whose extents are 1

Examples

Before reverse-compute-at, in TensorIR, the IR is:

@T.prim_func
def before_reverse_compute_at(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.alloc_buffer((128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0

Create the schedule and do reverse-compute-at:

sch = tir.Schedule(before_reverse_compute_at)
block = sch.get_block("C")
loop, _ = sch.get_loops(sch.get_block("B"))
sch.reverse_compute_at(block, loop, preserve_unit_loops=False)
print(sch.mod["main"].script())

After applying reverse-compute-at, the IR becomes:

@T.prim_func
def after_reverse_compute_at(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128), "float32")
    B = T.alloc_buffer((128, 128), "float32")
    C = T.match_buffer(c, (128, 128), "float32")
    for i in T.serial(0, 128):
        for j in T.serial(0, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = A[vi, vj] * 2.0
        for j in T.serial(0, 128):
            with T.block("C"):
                vi, vj = T.axis.remap("SS", [i, j])
                C[vi, vj] = B[vi, vj] + 1.0
compute_inline(block: tvm.tir.schedule.schedule.BlockRV) None

Inline a block into its consumer(s). It requires:

  1. The block is a complete non-root block, which only produces one buffer

  2. The block must not be the only leaf in the scope.

  3. The body of the block must be a BufferStore statement in the form of, A[i, j, k, ...] = ... where the indices of the LHS are all distinct atomic variables, and no variables other than those indexing variables are allowed in the statement.

Parameters

block (BlockRV) – The block to be inlined to its consumer(s)

Examples

Before compute-inline, in TensorIR, the IR is:

@T.prim_func
def before_inline(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.alloc_buffer((128, 128))
    C = T.match_buffer(c, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0

Create the schedule and do compute-inline:

sch = tir.Schedule(before_inline)
sch.compute_inline(sch.get_block("B"))
print(sch.mod["main"].script())

After applying compute-inline, the IR becomes:

@T.prim_func
def after_inline(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    C = T.match_buffer(c, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = A[vi, vj] * 2.0 + 1.0
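That inlining preserves the computed values can be checked with a plain-Python model of the two programs. The function names are hypothetical; the first keeps the intermediate buffer B, and the second substitutes the producer expression directly into the consumer.

```python
def with_intermediate(A):
    """Producer writes B, consumer reads B."""
    B = [[a * 2.0 for a in row] for row in A]
    return [[b + 1.0 for b in row] for row in B]


def inlined(A):
    """Producer expression substituted into the consumer; B is gone."""
    return [[a * 2.0 + 1.0 for a in row] for row in A]


A = [[1.0, 2.0], [3.0, 4.0]]
assert inlined(A) == with_intermediate(A)
```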
reverse_compute_inline(block: tvm.tir.schedule.schedule.BlockRV) None

Inline a block into its only producer. It requires:

  1. The block is a complete non-root block, which only produces and consumes one buffer

  2. The block must not be the only leaf in the scope.

  3. The only producer of the block is a read-after-write producer and a complete non-root block

  4. The body of the block must be a BufferStore statement in the form of, B[f(i, j, k, ...)] = g(i, j, k, A[i, j, k, ...] ...) where the indices of each BufferLoad on the RHS are all distinct atomic variables, and no variables other than those indexing variables are allowed in the statement.

Parameters

block (BlockRV) – The block to be inlined to its producer

Examples

Before reverse-compute-inline, in TensorIR, the IR is:

@T.prim_func
def before_inline(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.alloc_buffer((128, 128))
    C = T.match_buffer(c, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0

Create the schedule and do reverse-compute-inline:

sch = tir.Schedule(before_inline)
sch.reverse_compute_inline(sch.get_block("C"))
print(sch.mod["main"].script())

After applying reverse-compute-inline, the IR becomes:

@T.prim_func
def after_inline(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    C = T.match_buffer(c, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = A[vi, vj] * 2.0 + 1.0
decompose_reduction(block: tvm.tir.schedule.schedule.BlockRV, loop: tvm.tir.schedule.schedule.LoopRV) tvm.tir.schedule.schedule.BlockRV

Decompose a reduction block into two separate blocks.

  1. The init block, which is translated from the init statement of the reduction block;

  2. The update block, which is the original block without init statement.

The init block is inserted right before the given loop.

The schedule primitive requires:

  1. The input block is a reduction block.

  2. The input loop is the ancestor of the block.

  3. The input loop is not lower than all the loops related to reduce block var.

Parameters
  • block (BlockRV) – The reduction block to be decomposed

  • loop (LoopRV) – The loop before which the init block is inserted.

Returns

init_block – The init block

Return type

BlockRV

Examples

Before decompose-reduction, in TensorIR, the IR is:

@T.prim_func
def before_decompose(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, [128, 128])
    B = T.match_buffer(b, [128, 128])
    C = T.match_buffer(c, [128, 128])
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            with T.init():
                C[vi, vj] = 0.0
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vj, vk]

Create the schedule and do decompose-reduction with specified loop:

sch = tir.Schedule(before_decompose)
C = sch.get_block("C")
i, j, k = sch.get_loops(C)
sch.decompose_reduction(C, i)
print(sch.mod["main"].script())

After applying decompose-reduction, the IR becomes:

@T.prim_func
def after_decompose(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, [128, 128])
    B = T.match_buffer(b, [128, 128])
    C = T.match_buffer(c, [128, 128])
    for i in T.serial(0, 128):
        for j in T.serial(0, 128):
            with T.block("C_init"):
                vi, vj = T.axis.remap("SS", [i, j])
                C[vi, vj] = 0.0
    for i, j, k in T.grid(128, 128, 128):
        with T.block("C"):
            vi, vj, vk = T.axis.remap("SSR", [i, j, k])
            C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vj, vk]
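The equivalence of the two forms can be checked with a plain-Python model. This is a sketch with hypothetical function names: in the first version the init statement runs on the first reduction step of each output element, and in the second it is hoisted into its own loop nest placed before loop i.

```python
def fused_init(A, B):
    """Reduction with the init statement inside the reduction nest."""
    n, m, r = len(A), len(B), len(A[0])
    C = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(r):
                if k == 0:  # init runs on the first reduction step
                    C[i][j] = 0.0
                C[i][j] = C[i][j] + A[i][k] * B[j][k]
    return C


def decomposed(A, B):
    """Init block hoisted before loop i, followed by the update block."""
    n, m, r = len(A), len(B), len(A[0])
    C = [[None] * m for _ in range(n)]
    for i in range(n):          # init block
        for j in range(m):
            C[i][j] = 0.0
    for i in range(n):          # update block
        for j in range(m):
            for k in range(r):
                C[i][j] = C[i][j] + A[i][k] * B[j][k]
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert fused_init(A, B) == decomposed(A, B)
```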
rfactor(loop: tvm.tir.schedule.schedule.LoopRV, factor_axis: int) tvm.tir.schedule.schedule.LoopRV

Factorize an associative reduction block by the specified loop.

An associative reduction cannot be parallelized directly, because it leads to a potential race condition during accumulation. Alternatively, the reduction could be factorized on a loop with the following steps:

  • Step 1: evenly slice the reduction into n separate chunks, where n is the loop extent;

  • Step 2: compute the chunks separately and write the results into n intermediate buffers;

  • Step 3: accumulate the n separate buffers into the result buffer.

Note that Step 2 above introduces opportunities for parallelization.

RFactor is a schedule primitive that implements the transformation described above: Given a block that writes to buffer B, it factorizes a loop of extent n.

For example, the pseudocode below accumulates B[i] = sum(A[i, : , : ]):

for i in range(128):                    # loop i is a data parallel loop
    for j in range(128):                # loop j is a reduction loop
        for k in range(128):            # loop k is a reduction loop
            B[i] = B[i] + A[i, j, k]

Suppose RFactor is applied on the innermost loop k and factor_axis = 1. RFactor then creates an intermediate buffer and two blocks.

1. The intermediate buffer, or “rf-buffer” is a buffer of rank ndim(B) + 1 and size size(B) * n, whose shape expands from shape(B) by adding an axis of n at the position specified by factor_axis. For example,

  • shape(B) = [1, 2, 3], factor_axis = 0 => shape(B_rf) = [n, 1, 2, 3]

  • shape(B) = [1, 2, 3], factor_axis = 1 => shape(B_rf) = [1, n, 2, 3]

  • shape(B) = [1, 2, 3], factor_axis = 2 => shape(B_rf) = [1, 2, n, 3]

  • shape(B) = [1, 2, 3], factor_axis = 3 => shape(B_rf) = [1, 2, 3, n]

2. The rfactor block, or “rf-block”, is a block that writes to the rf-buffer without accumulating over the loop k, i.e. the loop k is converted from a reduction loop to a data parallel loop. In our example, the rf-block is:

B_rf = np.zeros((128, 128))     # the rf-buffer
for k in range(128):            # loop k is converted to a data parallel loop
    for i in range(128):        # loop i is a data parallel loop (unchanged)
        for j in range(128):    # loop j is a reduction loop (unchanged)
            B_rf[i, k] = B_rf[i, k] + A[i, j, k]

3. The write-back block, or wb-block, is a block that accumulates the rf-buffer into the result buffer. All the reduction loops are removed except the loop k for accumulation. In our example, the wb-block is:

for i in range(128):            # loop i is a data parallel loop (unchanged)
                                # loop j is removed because it is a reduction loop
    for k in range(128):        # loop k is a reduction loop (unchanged)
        B[i] = B[i] + B_rf[i, k]
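
The factorization described above can be checked in plain Python. The sketch below is only an illustration and does not use TVM; the function names `direct_sum` and `rfactor_sum` and the small loop extents are assumptions made for this example. It mirrors the pseudocode with factor_axis = 1, so the rf-buffer has shape [ni, nk].

```python
def direct_sum(A, ni, nj, nk):
    """Original reduction: B[i] = sum over j and k of A[i][j][k]."""
    B = [0] * ni
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                B[i] = B[i] + A[i][j][k]
    return B


def rfactor_sum(A, ni, nj, nk):
    """Same reduction, factorized over loop k (factor_axis = 1)."""
    # rf-block: loop k is data parallel here, only loop j reduces.
    B_rf = [[0] * nk for _ in range(ni)]
    for k in range(nk):
        for i in range(ni):
            for j in range(nj):
                B_rf[i][k] = B_rf[i][k] + A[i][j][k]
    # wb-block: accumulate the rf-buffer over k into the result buffer.
    B = [0] * ni
    for i in range(ni):
        for k in range(nk):
            B[i] = B[i] + B_rf[i][k]
    return B


ni, nj, nk = 3, 4, 5
A = [[[i * 100 + j * 10 + k for k in range(nk)] for j in range(nj)] for i in range(ni)]
assert direct_sum(A, ni, nj, nk) == rfactor_sum(A, ni, nj, nk)
```

Integer data is used so that both loop orders give bit-identical results; with floating-point data, reassociating the sum may change rounding slightly.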
Parameters
  • loop (LoopRV) – The loop outside block for which we want to do rfactor

  • factor_axis (int) – The position where the new dimension is placed in the new introduced rfactor buffer

Returns

rf_block – The block which computes partial results over each slice (i.e., the first block as described in the above illustration)

Return type

BlockRV

Examples

Before rfactor, in TensorIR, the IR is:

@T.prim_func
def before_rfactor(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (128, 128, 128))
    B = T.match_buffer(b, (128,))
    for ii, i, j in T.grid(128, 128, 128):
        with T.block("B"):
            vii, vi, vj = T.axis.remap("SRR", [ii, i, j])
            with T.init():
                B[vii] = 0.0
            B[vii] = B[vii] + A[vii, vi, vj]

Create the schedule and do rfactor:

sch = tir.Schedule(before_rfactor)
_, _, k = sch.get_loops(sch.get_block("B"))
sch.rfactor(k, 0)
print(sch.mod["main"].script())

After applying rfactor, the IR becomes:

@T.prim_func
def after_rfactor(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, [128, 128, 128])
    B = T.match_buffer(b, [128])
    B_rf = T.alloc_buffer([128, 128])
    for i2, ii, i in T.grid(128, 128, 128):
        with T.block("B_rf"):
            vi2, vii, vi = T.axis.remap("SSR", [i2, ii, i])
            with T.init():
                B_rf[vi2, vii] = 0.0
            B_rf[vi2, vii] = (B_rf[vi2, vii] + A[vii, vi, vi2])
    for ii, i2 in T.grid(128, 128):
        with T.block("B"):
            vii, vi2 = T.axis.remap("SR", [ii, i2])
            with T.init():
                B[vii] = 0.0
            B[vii] = B[vii] + B_rf[vi2, vii]

Note

Rfactor requires:

1) loop has only one child block, and it is a reduction block;
2) loop is a reduction loop, i.e. the loop variable is bound to only reduction variables in the block binding;
3) loop is not parallelized, vectorized, unrolled or bound to any thread axis;
4) the block scope that loop is in is a staged-pipeline;
5) the outermost loop outside the reduction block should have the reduction block as its first child block;
6) the outermost reduction loop should have only one child block;
7) a unit-extent loop that is not bound to any reduction or data parallel variables in the block binding should not appear under a reduction loop;
8) the reduction block should write to only one buffer, its init and body are both simple BufferStores, and the pattern is registered as an associative reducer. The pre-defined patterns include: plus, multiplication, min and max;
9) each of the loops on top of the block cannot be bound to a data parallel and a reduction block binding at the same time;
10) factor_axis should be in range [-ndim(B) - 1, ndim(B)], where B is the buffer that the reduction block writes to. Negative indexing is normalized according to numpy convention.
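
The shape rule for the rf-buffer and the factor_axis range check can be sketched in plain Python. This is an illustration, not TVM's implementation: the helper name `rf_buffer_shape` is hypothetical, and it assumes that a negative factor_axis is normalized by adding ndim(B) + 1, which is how NumPy's expand_dims treats a negative axis.

```python
def rf_buffer_shape(shape, n, factor_axis):
    """Insert an axis of extent n into shape(B) at factor_axis."""
    ndim = len(shape)
    if not -ndim - 1 <= factor_axis <= ndim:
        raise ValueError(
            f"factor_axis {factor_axis} is outside [{-ndim - 1}, {ndim}]"
        )
    if factor_axis < 0:
        # Assumed normalization: count from the end of the (ndim + 1)-rank result.
        factor_axis += ndim + 1
    return list(shape[:factor_axis]) + [n] + list(shape[factor_axis:])


n = 7
assert rf_buffer_shape([1, 2, 3], n, 0) == [n, 1, 2, 3]
assert rf_buffer_shape([1, 2, 3], n, 1) == [1, n, 2, 3]
assert rf_buffer_shape([1, 2, 3], n, 3) == [1, 2, 3, n]
assert rf_buffer_shape([1, 2, 3], n, -1) == [1, 2, 3, n]
```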

storage_align(block: tvm.tir.schedule.schedule.BlockRV, buffer_index: int, axis: int, factor: int, offset: int) None

Set alignment requirement for specific dimension such that stride[axis] == k * factor + offset for some k. This is useful to set memory layout for more friendly memory access pattern. For example, we can set alignment to be factor=2, offset=1 to avoid bank conflict for thread access on higher dimension in GPU shared memory.

Parameters
  • block (BlockRV) – The producer block of the buffer.

  • buffer_index (int) – The index of the buffer in block’s write region.

  • axis (int) – The dimension to be specified for alignment.

  • factor (int) – The factor multiple of alignment.

  • offset (int) – The required offset factor.

Examples

Before storage_align, in TensorIR, the IR is:

@T.prim_func
def before_storage_align(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.alloc_buffer((128, 128))
    C = T.match_buffer(c, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0

Create the schedule and do storage_align:

sch = tir.Schedule(before_storage_align)
sch.storage_align(sch.get_block("B"), buffer_index=0, axis=0, factor=128, offset=1)
print(sch.mod["main"].script())

After applying storage_align, the IR becomes:

@T.prim_func
def after_storage_align(a: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (128, 128))
    B = T.alloc_buffer((128, 128))
    C = T.match_buffer(c, (128, 128))
    for i, j in T.grid(128, 128):
        with T.block("B"):
            T.block_attr({"buffer_dim_align": [[[0, 128, 1]]]})
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0

After lowering passes, buffer B will have strides as [129, 1].

Note

Storage_align requires the buffer to be an intermediate buffer defined via alloc_buffer.
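
The strides [129, 1] quoted above follow from the constraint stride[axis] == k * factor + offset. The sketch below reproduces that arithmetic in plain Python; the helper name `aligned_strides` is hypothetical, and it assumes that lowering picks the smallest stride that is at least the compact row-major stride and satisfies the constraint.

```python
def aligned_strides(shape, axis, factor, offset):
    """Row-major strides with stride[axis] == k * factor + offset for some k."""
    strides = [1] * len(shape)
    for dim in range(len(shape) - 2, -1, -1):
        # Compact stride implied by the (possibly padded) inner dimension.
        stride = strides[dim + 1] * shape[dim + 1]
        if dim == axis:
            # Pad up to the next value congruent to offset modulo factor.
            stride += (offset - stride) % factor
        strides[dim] = stride
    return strides


# The example above: shape (128, 128), axis=0, factor=128, offset=1.
assert aligned_strides((128, 128), axis=0, factor=128, offset=1) == [129, 1]
```

With an odd row stride, consecutive rows start in different shared-memory banks, which is the bank-conflict motivation mentioned in the description.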

enter_postproc() None

A no-op that marks the start of the postprocessing phase of scheduling.

exception tvm.tir.ScheduleError

Error that happens during TensorIR scheduling.

tvm.tir.transform

Namespace of all TIR transformations

Functions:

prim_func_pass([pass_func, opt_level, name, …])

Decorate a function pass.

Apply(ftransform)

Apply ftransform to each function in the Module.

BF16CastElimination()

Eliminate verbose casting between fp32 and bf16. Checks if the AST has the pattern: castto32(castto16(some_fp32_op(…))). The verbose casting is generated by BF16Promote for multiple bf16 Ops in a row.

BF16Legalize()

Legalize bf16 typed Ops.

BF16Promote()

Promote bf16 to fp32.

BF16TypeLowering()

Replace all bf16 type with uint16.

CoProcSync()

Detect and insert sync points to co-processor.

CombineContextCall()

Combine context calls in the host function.

CompactBufferAllocation()

Compact the buffer access region.

ConvertBlocksToOpaque()

Substitute all the block vars with the PrimExprs they are bound to, indicated by the corresponding iter_values in BlockRealize, and then convert the blocks into opaque ones by removing all the iter_values in BlockRealize and iter_vars in Block.

ConvertForLoopsToSerial()

Convert Parallel For Loops to Serial For Loops.

DecorateDeviceScope()

Decorate all the function’s body as device function.

Filter(fcond)

Filter functions by the calling convention attribute.

FlattenBuffer()

Flatten the multi-dimensional BufferLoad and BufferStore to single dimensional Load/Store.

HoistIfThenElse([variant])

Hoist loop-invariant IfThenElse nodes to outside the eligible loops.

InferFragment()

Infer the TensorCore fragment information using tensor intrinsics.

InjectCopyIntrin(pragma_key, fintrin)

Inject copy intrinsics with optional padding.

InjectDoubleBuffer()

Inject double buffer statements.

InjectPrefetch()

Inject prefetch instructions into stmt.

InjectVirtualThread()

Inject virtual thread loops.

InstrumentBoundCheckers()

Instruments bound checkers.

LegalizePackedCalls()

Legalize packed calls to have their arguments wrapped in TVMValues.

LiftAttrScope(attr_key)

Lift common attrs with attr_key to outer scope.

LoopPartition()

Partition loops in the stmt.

LowerCustomDatatypes()

Lower custom datatypes.

LowerDeviceStorageAccessInfo()

Lower attached storage access information on device.

LowerInitBlock()

Lower block init stmt into IfThenElse statements.

LowerIntrin()

Lower target specific intrinsic calls.

LowerMatchBuffer()

Remove match buffers inside the block.

LowerTVMBuiltin()

Lower tvm builtin intrinsics.

LowerThreadAllreduce()

Lower cross thread allreduce.

LowerWarpMemory()

Lower warp memory access to low-level device related function calls.

MakePackedAPI([num_unpacked_params])

Transform the PrimFuncs in the module to a packed func API.

MakeUnpackedAPI()

Transform the PrimFuncs in the module to a C API compatible with internal calls.

MergeDynamicSharedMemoryAllocations()

This pass merges multiple TIR-level dynamic shared memory allocations into one allocation.

NarrowDataType(target_bits)

Narrow down PrimExpr datatype in stmt to target_bits.

PlanAndUpdateBufferAllocationLocation()

Locate the buffer allocation to the exact position (usually the lowest common ancestor (LCA) of the buffer accesses).

RemoveNoOp()

Remove No Op from the Stmt.

RewriteUnsafeSelect()

Detect and rewrite unsafe select that contains memory access.

Simplify()

Run arithmetic simplifications on the statements and expressions.

SkipAssert()

Skip assert stmt.

SplitHostDevice()

Split the function into a host function and device functions.

StorageFlatten(cache_line_size[, …])

Flatten the multi-dimensional read/write to 1D.

StorageRewrite()

Rewrite storage allocation pattern.

TextureFlatten()

Flatten the multi-dimensional read/write to 2D.

ThreadSync(storage_scope)

Insert sync between parallel read/write of shared buffers.

UnifyThreadBinding()

Unify all the thread bindings for “blockIdx.x/y/z”, “threadIdx.x/y/z”, and “vthread.x/y/z”.

UnrollLoop()

Unroll the constant loop marked by unroll.

VectorizeLoop([enable_vectorize])

Lower vectorization loops.

VerifyMemory()

Verify if func contains illegal host side direct memory access.

Classes:

PrimFuncPass

A pass that works on each tvm.tir.PrimFunc() in a module.

tvm.tir.transform.prim_func_pass(pass_func=None, opt_level: Optional[int] = None, name: Optional[str] = None, required: Optional[List[str]] = None) Callable

Decorate a function pass.

This function returns a decorator (callback) when pass_func is not provided. Otherwise, it returns the function pass created from the given transformation function.

Parameters
  • pass_func (Optional[Callable[(tvm.tir.PrimFunc, IRModule, PassContext) -> tvm.tir.PrimFunc]]) – The transformation function or class.

  • opt_level (int) – The optimization level of this module pass.

  • name (Optional[str]) – The name of the function pass. The name could be empty. In this case, the name of the optimization function will be used as the pass name.

  • required (Optional[List[str]]) – The list of passes that the function pass is dependent on.

Returns

create_function_pass – A decorator will be returned if pass_func is not provided, otherwise return the decorated result. The returned decorator has two behaviors depending on the input: A new FunctionPass will be returned when we decorate a pass function. A new FunctionPass class will be returned when we decorate a class type.

Return type

Union[Callable, FunctionPass]
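
The two return behaviors can be illustrated with a plain-Python decorator. This is a sketch of the calling pattern only, not TVM's implementation: `SketchPass` and `sketch_pass` are hypothetical names, a "module" is modeled as a dict from function name to a string body, and the context argument is passed as None.

```python
class SketchPass:
    """Hypothetical stand-in for a function pass object."""

    def __init__(self, fn, opt_level, name):
        self.fn = fn
        self.opt_level = opt_level
        self.name = name

    def __call__(self, mod):
        # Apply the wrapped transformation to every function in the module.
        return {key: self.fn(func, mod, None) for key, func in mod.items()}


def sketch_pass(pass_func=None, opt_level=None, name=None):
    def create(fn):
        return SketchPass(fn, opt_level, name if name else fn.__name__)

    if pass_func is not None:
        return create(pass_func)  # called directly: return the pass itself
    return create                 # called with options only: return a decorator


@sketch_pass(opt_level=2)
def upper(func, mod, ctx):
    return func.upper()


bang = sketch_pass(lambda func, mod, ctx: func + "!", opt_level=1, name="bang")

assert upper({"main": "abc"}) == {"main": "ABC"}
assert bang({"main": "abc"}) == {"main": "abc!"}
```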

Examples

The following code block decorates a function pass class.

@tvm.tir.transform.prim_func_pass(opt_level=1)
class TestReplaceFunc:
    def __init__(self, new_func):
        self.new_func = new_func

    def transform_function(self, func, mod, ctx):
        # just for demo purposes
        # transform func to new_func
        return self.new_func

The following code creates a function pass by decorating a user defined transform function.

@tvm.tir.transform.prim_func_pass(opt_level=2)
def transform(func, mod, ctx):
    # my transformations here.
    return func

function_pass = transform
# Note: the name `transform` now refers to the pass defined above,
# so the pass class is referenced through its full path.
assert isinstance(function_pass, tvm.tir.transform.PrimFuncPass)
assert function_pass.info.opt_level == 2

# Given a module m, the optimization could be invoked as the following:
updated_mod = function_pass(m)
# Now the transformation has been applied to every PrimFunc in
# the provided module m, and the updated module is returned.
class tvm.tir.transform.PrimFuncPass

A pass that works on each tvm.tir.PrimFunc() in a module. A PrimFunc pass should be created through tvm.tir.transform.prim_func_pass.

tvm.tir.transform.Apply(ftransform)

Apply ftransform to each function in the Module.

This function is a thin wrapper around tvm.tir.transform.prim_func_pass

Parameters

ftransform (tvm.tir.PrimFunc -> tvm.tir.PrimFunc) – The transformation pass.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.BF16CastElimination()

Eliminate verbose casting between fp32 and bf16. Checks if the AST has the pattern: castto32(castto16(some_fp32_op(…))). The verbose casting is generated by BF16Promote for multiple bf16 Ops in a row. For example, X[i] + Y[i] + T[i] is promoted to:

bf16((float32(bf16((float32(X[i]) + float32(Y[i])))) + float32(T[i])))

After this pass:

bf16(float32(X[i]) + float32(Y[i]) + float32(T[i]))

Returns

fpass – The result pass

Return type

tvm.transform.Pass
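
The rewrite pattern can be sketched on a toy expression tree in plain Python. This is an illustration of the castto32(castto16(fp32_expr)) pattern only, not the TVM pass: expressions are modeled as tuples, and the helpers `f32`, `bf16`, `dtype_of` and `eliminate` are hypothetical names.

```python
def f32(e):
    return ("cast", "float32", e)


def bf16(e):
    return ("cast", "bf16", e)


def dtype_of(e):
    """Infer the dtype of a toy expression; loads are assumed to be bf16."""
    kind = e[0]
    if kind == "load":
        return "bf16"
    if kind == "cast":
        return e[1]
    return dtype_of(e[1])  # "add": both operands share one dtype


def eliminate(e):
    """Rewrite float32(bf16(x)) to x whenever x is already float32."""
    kind = e[0]
    if kind == "load":
        return e
    if kind == "add":
        return ("add", eliminate(e[1]), eliminate(e[2]))
    inner = eliminate(e[2])
    if (e[1] == "float32" and inner[0] == "cast" and inner[1] == "bf16"
            and dtype_of(inner[2]) == "float32"):
        return inner[2]
    return ("cast", e[1], inner)


X, Y, T = ("load", "X"), ("load", "Y"), ("load", "T")
before = bf16(("add", f32(bf16(("add", f32(X), f32(Y)))), f32(T)))
after = bf16(("add", ("add", f32(X), f32(Y)), f32(T)))
assert eliminate(before) == after
```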

tvm.tir.transform.BF16Legalize()

Legalize bf16 typed Ops. Runs BF16Promote, BF16CastElimination and BF16TypeLowering

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.BF16Promote()

Promote bf16 to fp32. Add a cast to fp32 before Ops, then add a cast back to bf16.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.BF16TypeLowering()

Replace all bf16 type with uint16. Also lower the casting between fp32 and bf16

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.CoProcSync()

Detect and insert sync points to co-processor.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.CombineContextCall()

Combine context calls in the host function.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.CompactBufferAllocation()

Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.

Example

Before narrowing, B is a [16, 16] buffer, but only a skinny vector B[i, 0:16] is accessed.

for i in range(0, 16):
    with T.block():
        B = T.alloc_buffer(16, 16)
        for j in range(0, 16):
            B[i, j] = A[i, j] + 1
        for j in range(0, 16):
            C[i, j] = B[i, j] + 1

This pass narrows the buffer shape and adjusts its accessed region accordingly. In this particular case, because only a 1 * 16 vector of B is accessed, the pass narrows B to shape [1, 16], and changes the access to B[i, j] to B[0, j].

for i in range(0, 16):
    with T.block():
        B = T.alloc_buffer(1, 16)
        for j in range(0, 16):
            B[0, j] = A[i, j] + 1
        for j in range(0, 16):
            C[i, j] = B[0, j] + 1
Returns

fpass – The result pass

Return type

tvm.transform.Pass
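
The index arithmetic behind the narrowing can be sketched in plain Python. This is a conceptual illustration rather than the pass itself: the helper name `compact_region` is hypothetical, and the accessed region is assumed to be given as one (min, extent) pair per buffer dimension.

```python
def compact_region(region):
    """Return the narrowed shape and an index remapping for an accessed region."""
    new_shape = [extent for _, extent in region]

    def remap(index):
        # Shift every index so that the accessed region starts at 0.
        return tuple(idx - lo for idx, (lo, _) in zip(index, region))

    return new_shape, remap


# Inside iteration i of the example, B[i, 0:16] is the only region accessed.
i = 5
new_shape, remap = compact_region([(i, 1), (0, 16)])
assert new_shape == [1, 16]
assert remap((i, 3)) == (0, 3)  # B[i, j] becomes B[0, j]
```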

tvm.tir.transform.ConvertBlocksToOpaque()

Substitute all the block vars with the PrimExprs they are bound to, indicated by the corresponding iter_values in BlockRealize, and then convert the blocks into opaque ones by removing all the iter_values in BlockRealize and iter_vars in Block.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.ConvertForLoopsToSerial()

Convert Parallel For Loops to Serial For Loops.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.DecorateDeviceScope()

Decorate all the function’s body as device function.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.Filter(fcond)

Filter functions by the calling convention attribute.

Parameters

fcond (tvm.tir.PrimFunc -> bool) – The condition of the filtering.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.FlattenBuffer()

Flatten the multi-dimensional BufferLoad and BufferStore to single dimensional Load/Store. Also remove Block to ensure that the flattened TIR can not be scheduled again.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.HoistIfThenElse(variant: Optional[str] = None)

Hoist loop-invariant IfThenElse nodes to outside the eligible loops.

Parameters

variant (Optional[String]) –

The variant of the pass. variant can have any one of following values [“basic”, None(Default)].

The basic variant supports basic hoisting scenarios where it expects the For & If Nodes are in place consecutively and does not involve global scope variables or more advanced scenarios.

The default variant supports all hoisting scenarios, i.e., both basic and advanced, and is controlled through PassContext configs such as:

config={"tir.HoistIfThenElse": {"support_block_scope_hosting": True}}

Returns

fpass – The result pass

Return type

tvm.transform.Pass
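
What hoisting a loop-invariant condition means can be shown with two equivalent plain-Python functions. This is a conceptual sketch only; the function names and the doubling computation are made up for the illustration and do not correspond to TVM code.

```python
def before_hoist(data, flag):
    out = []
    for x in data:
        if flag:  # condition does not depend on the loop variable
            out.append(x * 2)
    return out


def after_hoist(data, flag):
    out = []
    if flag:  # the same condition, evaluated once outside the loop
        for x in data:
            out.append(x * 2)
    return out


for flag in (True, False):
    assert before_hoist([1, 2, 3], flag) == after_hoist([1, 2, 3], flag)
```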

tvm.tir.transform.InferFragment()

Infer the TensorCore fragment information using tensor intrinsics.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.InjectCopyIntrin(pragma_key: str, fintrin)

Inject copy intrinsics with optional padding.

Parameters
  • pragma_key (str) – The pragma key for hint of copy.

  • fintrin (function) – The function with signature copyintrin(src, dst, pad_before, pad_after, pad_value)

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.InjectDoubleBuffer()

Inject double buffer statements.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.InjectPrefetch()

Inject prefetch instructions into stmt.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.InjectVirtualThread()

Inject virtual thread loops.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.InstrumentBoundCheckers()

Instruments bound checkers.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LegalizePackedCalls()

Legalize packed calls to have their arguments wrapped in TVMValues.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LiftAttrScope(attr_key: str)

Lift common attrs with attr_key to outer scope.

Parameters

attr_key (str) – The attribute key to be checked.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LoopPartition()

Partition loops in the stmt.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LowerCustomDatatypes()

Lower custom datatypes.

See tvm::datatypes::Registry for more information on adding custom datatypes.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LowerDeviceStorageAccessInfo()

Lower attached storage access information on device.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

Note

Run this pass after all storage access analysis finish.

tvm.tir.transform.LowerInitBlock()

Lower block init stmt into IfThenElse statements.

Returns

fpass – The result pass

Return type

tvm.transform.Pass
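
The effect of this lowering can be pictured in plain Python. The sketch below is a conceptual illustration and not the pass output: it assumes that the init statement is guarded by a condition that holds on the first iteration of the reduction loop, and the function name `reduce_with_init` is hypothetical.

```python
def reduce_with_init(A):
    """Row sums where initialization is an if-statement inside the loop nest."""
    B = [None] * len(A)
    for i in range(len(A)):
        for k in range(len(A[i])):
            if k == 0:  # lowered form of the block's init statement
                B[i] = 0.0
            B[i] = B[i] + A[i][k]
    return B


assert reduce_with_init([[1.0, 2.0], [3.0, 4.0]]) == [3.0, 7.0]
```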

tvm.tir.transform.LowerIntrin()

Lower target specific intrinsic calls.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LowerMatchBuffer()

Remove match buffers inside the block. Also, it will validate the binding.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LowerTVMBuiltin()

Lower tvm builtin intrinsics.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LowerThreadAllreduce()

Lower cross thread allreduce.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.LowerWarpMemory()

Lower warp memory access to low-level device related function calls.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.MakePackedAPI(num_unpacked_params: int = -1)

Transform the PrimFuncs in the module to a packed func API.

Parameters

num_unpacked_params (int) – Number of parameters that we hope to directly pass via normal arguments following the PackedFunc input signature. If it is specified as -1 or it is less than the number of arguments, the pass will still pack the arguments.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.MakeUnpackedAPI()

Transform the PrimFuncs in the module to a C API compatible with internal calls.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.MergeDynamicSharedMemoryAllocations()

This pass merges multiple TIR-level dynamic shared memory allocations into one allocation.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.NarrowDataType(target_bits: int)

Narrow down PrimExpr datatype in stmt to target_bits.

Parameters

target_bits (int) – The target bit configuration.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

Note

Run this pass after StorageFlatten.

tvm.tir.transform.PlanAndUpdateBufferAllocationLocation()

Locate the buffer allocation to the exact position (usually the lowest common ancestor (LCA) of the buffer accesses). This pass will inject an opaque block with alloc_buffers at the allocation site.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.RemoveNoOp()

Remove No Op from the Stmt.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.RewriteUnsafeSelect()

Detect and rewrite unsafe select that contains memory access.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.Simplify()

Run arithmetic simplifications on the statements and expressions.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.SkipAssert()

Skip assert stmt.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.SplitHostDevice()

Split the function into a host function and device functions.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.StorageFlatten(cache_line_size, create_bound_attribute: bool = False)

Flatten the multi-dimensional read/write to 1D.

Parameters
  • cache_line_size (int) – The size of CPU cache line.

  • create_bound_attribute – Whether to create bound attributes.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.StorageRewrite()

Rewrite storage allocation pattern.

Moves each allocation to the outermost possible scope, and tries to share space between allocations to make a static allocation plan when possible.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.TextureFlatten()

Flatten the multi-dimensional read/write to 2D.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.ThreadSync(storage_scope: str)

Insert sync between parallel read/write of shared buffers.

Parameters

storage_scope (str) – The target storage scope.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.UnifyThreadBinding()

Unify all the thread bindings for “blockIdx.x/y/z”, “threadIdx.x/y/z”, and “vthread.x/y/z”. Before the unification, two vars that are bound to a thread axis (e.g., “threadIdx.x”) use different IterVars and variables in their AttrStmts. After the unification, we use a consolidated IterVar and a variable for them.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

Note

The bare vthread tag is a legacy behavior that will be deprecated, although thread bindings of vthread are still unified in this pass. Please use vthread.x, vthread.y and vthread.z instead.

tvm.tir.transform.UnrollLoop()

Unroll the constant loop marked by unroll.

This pass also automatically attaches the pragma unroll tag to loops that meet the criteria.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.VectorizeLoop(enable_vectorize: bool = True)

Lower vectorization loops.

Parameters

enable_vectorize (bool) – Whether vectorization is enabled. Will lower to scalar loop when it is turned off.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.transform.VerifyMemory()

Verify if func contains illegal host side direct memory access.

Returns

fpass – The result pass

Return type

tvm.transform.Pass

tvm.tir.analysis

Namespace of all TIR analysis utils.

Classes:

Block(iter_vars, reads, writes, name_hint, …)

Block node.

Buffer

Symbolic data buffer in TVM.

BufferRegion(buffer, region)

BufferRegion node.

Dict(*args, **kwds)

List(*args, **kwds)

PrimExpr

Base class of all primitive expressions.

PrimFunc(params, body[, ret_type, …])

A function declaration expression.

Stmt

Base class of all the statements.

Var(name, dtype, tvm.ir.type.Type], span)

Symbolic variable.

Functions:

calculate_workspace_bytes(func, …)

Calculate the workspace size in bytes needed by the TIR allocates inside the TIR PrimFunc.

detect_buffer_access_lca(func)

Detect the lowest common ancestor(LCA) of buffer access, including both high-level access(BufferLoad, BufferStore) and low-level access(Load, Store and opaque access).

expr_deep_equal(lhs, rhs)

Deeply compare two nested expressions.

get_block_access_region(block, buffer_var_map)

Detect which regions of tensors in this block are read or written to.

get_block_read_write_region(block, …)

Auto detect the block read/write region according to its body stmt.

verify_gpu_code(func, constraints)

Verify that the GPU code in func satisfies the given constraints.

verify_memory(func)

Verify if func contains illegal host side direct memory access.

verify_ssa(func)

Verify if the func is in SSA form.

class tvm.tir.analysis.Block(iter_vars: List[tvm.tir.expr.IterVar], reads: List[tvm.tir.stmt.BufferRegion], writes: List[tvm.tir.stmt.BufferRegion], name_hint: str, body: tvm.tir.stmt.Stmt, init: Optional[tvm.tir.stmt.Stmt] = None, alloc_buffers: Optional[List[tvm.tir.buffer.Buffer]] = None, match_buffers: Optional[List[tvm.tir.stmt.MatchBufferRegion]] = None, annotations: Optional[Mapping[str, tvm.runtime.object.Object]] = None, span: Optional[tvm.ir.base.Span] = None)

Block node.

Parameters
  • iter_vars (List[IterVar]) – The block Variable.

  • reads (List[BufferRegion]) – The read buffer regions of the block.

  • writes (List[BufferRegion]) – The write buffer regions of the block.

  • name_hint (str) – the name_hint of the block.

  • body (Stmt) – The body of the block.

  • init (Optional[Stmt]) – The init block of the reduction block

  • alloc_buffers (Optional[list[Buffer]]) – The buffer allocations

  • match_buffers (Optional[List[MatchBufferRegion]]) – The subregion buffer match

  • annotations (Optional[Mapping[str, Object]]) – Additional annotation hints.

  • span (Optional[Span]) – The location of this block in the source code.

class tvm.tir.analysis.Buffer

Symbolic data buffer in TVM.

Buffer provides a way to represent data layout specialization of data structure in TVM.

Do not construct directly, use decl_buffer() instead. See the documentation of decl_buffer() for more details.

See also

decl_buffer

Declare a buffer

Methods:

access_ptr(access_mask[, ptr_type, …])

Get an access pointer to the head of buffer.

vload(begin[, dtype])

Generate an Expr that loads dtype from begin index.

vstore(begin, value)

Generate a Stmt that stores value into begin index.

scope()

Return the storage scope associated with this buffer.

access_ptr(access_mask, ptr_type='handle', content_lanes=1, offset=0)

Get an access pointer to the head of buffer.

This is the recommended method to get the buffer data address when interacting with external functions.

Parameters
  • access_mask (int) – The access pattern MASK. Indicates whether the access will read or write to the data content.

  • ptr_type (str, optional) – The data type of the result pointer. Do not specify unless we want to cast pointer to specific type.

  • content_lanes (int, optional) – The number of lanes for the data type. This value is greater than one for vector types.

  • offset (Expr, optional) – The offset of pointer. We can use it to offset by the number of elements from the address of ptr.

Examples

# Get access ptr for read
buffer.access_ptr("r")
# Get access ptr for read/write with bitmask
buffer.access_ptr(Buffer.READ | Buffer.WRITE)
# Get access ptr for read/write with str flag
buffer.access_ptr("rw")
# Get access ptr for read with offset
buffer.access_ptr("r", offset = 100)
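
The string flags and the bitmask form in the example above are two spellings of the same mask. The sketch below shows one way such flags could be folded into a bitmask in plain Python; it is not TVM's code, the helper name `parse_access_mask` is hypothetical, and the values READ = 1 and WRITE = 2 are assumptions made for the illustration.

```python
READ = 1   # assumed bit value for read access
WRITE = 2  # assumed bit value for write access


def parse_access_mask(access_mask):
    """Convert "r", "w" or "rw" into a bitmask; pass integers through."""
    if isinstance(access_mask, str):
        mask = 0
        for flag in access_mask:
            if flag == "r":
                mask |= READ
            elif flag == "w":
                mask |= WRITE
            else:
                raise ValueError(f"Unknown access flag {flag!r}")
        return mask
    return access_mask


assert parse_access_mask("r") == READ
assert parse_access_mask("rw") == READ | WRITE
assert parse_access_mask(READ | WRITE) == parse_access_mask("rw")
```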
vload(begin, dtype=None)

Generate an Expr that loads dtype from begin index.

Parameters
  • begin (Array of Expr) – The beginning index in unit of Buffer.dtype

  • dtype (str) – The data type to be loaded. It can be a vector type whose lanes are a multiple of those of Buffer.dtype

Returns

load – The corresponding load expression.

Return type

Expr

vstore(begin, value)

Generate a Stmt that stores value into begin index.

Parameters
  • begin (Array of Expr) – The beginning index in unit of Buffer.dtype

  • value (Expr) – The value to be stored.

Returns

store – The corresponding store stmt.

Return type

Stmt

scope()

Return the storage scope associated with this buffer.

Returns

scope – The storage scope associated with this buffer.

Return type

str

class tvm.tir.analysis.BufferRegion(buffer: tvm.tir.buffer.Buffer, region: List[tvm.ir.expr.Range])

BufferRegion node.

Parameters
  • buffer (Buffer) – The buffer of the buffer region

  • region (List[Range]) – The region array of the buffer region

class tvm.tir.analysis.Dict(*args, **kwds)
class tvm.tir.analysis.List(*args, **kwds)
class tvm.tir.analysis.PrimExpr

Base class of all primitive expressions.

PrimExpr is used in the low-level code optimizations and integer analysis.

class tvm.tir.analysis.PrimFunc(params, body, ret_type=None, buffer_map=None, attrs=None, span=None)

A function declaration expression.

Parameters
  • params (List[Union[tvm.tir.Var, tvm.tir.Buffer]]) – List of input parameters to the function.

  • body (tvm.tir.Stmt) – The body of the function.

  • ret_type (tvm.ir.Type) – The return type annotation of the function.

  • buffer_map (Map[tvm.tir.Var, tvm.tir.Buffer]) – The buffer binding map.

  • attrs (Optional[tvm.Attrs]) – Attributes of the function, can be None

  • span (Optional[Span]) – The location of this function in the source code.

Methods:

with_body(new_body[, span])

Create a new PrimFunc with the same signature but a new body.

specialize(param_map)

Specialize parameters of PrimFunc

script([tir_prefix, show_meta])

Print the PrimFunc as TVMScript

with_body(new_body, span=None)

Create a new PrimFunc with the same signature but a new body.

Parameters
  • new_body (Stmt) – The new body.

  • span (Optional[Span]) – The location of this itervar in the source code.

Returns

new_func – The created new function.

Return type

PrimFunc

specialize(param_map: Mapping[tvm.tir.expr.Var, Union[tvm.ir.expr.PrimExpr, tvm.tir.buffer.Buffer]])

Specialize parameters of PrimFunc

Parameters

param_map (Mapping[Var, Union[PrimExpr, Buffer]]) – The mapping from function params to the instance

Examples

We can define a Meta TIR function with symbolic shape:

@T.prim_func
def mem_copy(a: T.handle, b: T.handle, m: T.int32, n: T.int32) -> None:
    A = T.match_buffer(a, (m, n), "float32")
    B = T.match_buffer(b, (m, n), "float32")

    for i, j in T.grid(m, n):
        with T.block():
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

Then we can make it specialized with given shapes or buffers.

a, _, m, n = mem_copy.params
func = mem_copy.specialize({a: tir.decl_buffer((16, 16))})
# or
func = mem_copy.specialize({n: 16, m: 16})

The specialized function:

@T.prim_func
def mem_copy_16_16(a: T.handle, b: T.handle) -> None:
    A = T.match_buffer(a, (16, 16), "float32")
    B = T.match_buffer(b, (16, 16), "float32")

    for i, j in T.grid(16, 16):
        with T.block():
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]
Returns

func – The new function with parameter specialized

Return type

PrimFunc

script(tir_prefix: str = 'tir', show_meta: bool = False) -> str

Print the PrimFunc as TVMScript.

Parameters
  • tir_prefix (str) – The tir namespace prefix

  • show_meta (bool) – Whether to show meta information

Returns

script – The TVM Script of the PrimFunc

Return type

str

class tvm.tir.analysis.Stmt

Base class of all the statements.

class tvm.tir.analysis.Var(name: str, dtype: Union[str, tvm.ir.type.Type], span: Optional[tvm.ir.base.Span] = None)

Symbolic variable.

Parameters
  • name (str) – The name

  • dtype (Union[str, tvm.ir.Type]) – The data type

  • span (Optional[Span]) – The location of this variable in the source code.

tvm.tir.analysis.calculate_workspace_bytes(func: tvm.tir.function.PrimFunc, workspace_byte_alignment: int) -> int

Calculate the workspace size in bytes needed by the TIR allocates inside the TIR PrimFunc.

Parameters
  • func (tvm.tir.PrimFunc) – The function to be detected.

  • workspace_byte_alignment (int) – The byte alignment required for each tensor

Returns

result – Workspace size in bytes.

Return type

int
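The alignment parameter can be pictured with a small sketch of rounding a byte size up to a multiple of the alignment. This is a plain-Python illustration only: the helper name `align_up` is hypothetical, and the sketch does not claim to reproduce how TVM combines the individual allocations into the final workspace size.

```python
def align_up(nbytes: int, alignment: int) -> int:
    """Round nbytes up to the next multiple of alignment (hypothetical helper)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (nbytes + alignment - 1) // alignment * alignment


# A 10x10 float32 tensor occupies 400 bytes. That is already a multiple of 16,
# while a 64-byte alignment pads it to the next multiple of 64.
raw_bytes = 10 * 10 * 4
aligned_16 = align_up(raw_bytes, 16)
aligned_64 = align_up(raw_bytes, 64)
```

A larger alignment can therefore only increase, never decrease, the size attributed to a tensor.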

tvm.tir.analysis.detect_buffer_access_lca(func: tvm.tir.function.PrimFunc) -> Dict[tvm.tir.buffer.Buffer, tvm.tir.stmt.Stmt]

Detect the lowest common ancestor (LCA) of buffer accesses, including both high-level accesses (BufferLoad, BufferStore) and low-level accesses (Load, Store, and opaque access). The LCA may be a For loop or a Block.

Parameters

func (tvm.tir.PrimFunc) – The function to be detected.

Returns

result – Map from buffer to the LCA of all access to it.

Return type

Dict[Buffer, Stmt]
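To make the notion of an LCA of buffer accesses concrete, the following is a toy model in plain Python, not TVM code. The statement tree, the node names, and the helpers `path_to_root` and `lca` are all hypothetical; each buffer is mapped to the deepest loop or block that encloses every access to it.

```python
from typing import Dict, List, Optional

# Hypothetical scope tree: each loop/block maps to its parent (root -> None).
PARENT: Dict[str, Optional[str]] = {
    "root_block": None,
    "loop_i": "root_block",
    "loop_j": "loop_i",
    "loop_k": "root_block",
}


def path_to_root(node: str) -> List[str]:
    """Return the chain node, parent, ..., root."""
    path = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def lca(nodes: List[str]) -> str:
    """Lowest common ancestor of all nodes (a node counts as its own ancestor)."""
    common = path_to_root(nodes[0])
    for node in nodes[1:]:
        ancestors = set(path_to_root(node))
        # Keep only the ancestors shared so far, deepest first.
        common = [n for n in common if n in ancestors]
    return common[0]


# Hypothetical access sites: the scope that directly encloses each access.
ACCESSES = {"A": ["loop_j"], "B": ["loop_j", "loop_k"]}
result = {buf: lca(sites) for buf, sites in ACCESSES.items()}
```

Here buffer A is only touched inside `loop_j`, so that loop is its LCA, while buffer B is touched under two sibling subtrees and its LCA is the root block.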

tvm.tir.analysis.expr_deep_equal(lhs: tvm.ir.expr.PrimExpr, rhs: tvm.ir.expr.PrimExpr) -> bool

Deeply compare two nested expressions.

Parameters
  • lhs (PrimExpr) – The left operand.

  • rhs (PrimExpr) – The right operand.

Returns

result – The comparison result

Return type

bool

Note

This function does not remap variable bindings: it will not return true for (let x = 1 in x + 1) vs (let y = 1 in y + 1) unless x.same_as(y). Use tvm.ir.structural_equal to handle structural variable remapping.

Due to the restriction of not remapping variables, this function can run faster than StructuralEqual and can be used as a utility function during arithmetic simplifications.

Always consider tvm.ir.structural_equal first, which handles the structural remapping.
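The difference described in the note can be sketched with a toy expression model in plain Python. Nothing here is TVM API: `ToyVar`, `deep_equal`, and `structural_equal` are hypothetical stand-ins, expressions are nested tuples, and a let binding is written as `("let", var, value, body)`.

```python
class ToyVar:
    """Hypothetical variable compared by object identity, like Var.same_as."""

    def __init__(self, name: str):
        self.name = name


def deep_equal(lhs, rhs) -> bool:
    """No remapping: two variables match only if they are the same object."""
    if isinstance(lhs, ToyVar) or isinstance(rhs, ToyVar):
        return lhs is rhs
    if isinstance(lhs, tuple) and isinstance(rhs, tuple):
        return len(lhs) == len(rhs) and all(
            deep_equal(a, b) for a, b in zip(lhs, rhs)
        )
    return lhs == rhs


def structural_equal(lhs, rhs, var_map=None) -> bool:
    """With remapping: a let-bound variable on the left maps to the one on the right."""
    var_map = {} if var_map is None else var_map
    if isinstance(lhs, ToyVar) and isinstance(rhs, ToyVar):
        return var_map.get(lhs, lhs) is rhs
    if isinstance(lhs, ToyVar) or isinstance(rhs, ToyVar):
        return False
    if isinstance(lhs, tuple) and isinstance(rhs, tuple):
        if len(lhs) != len(rhs):
            return False
        if len(lhs) == 4 and lhs[0] == "let" and rhs[0] == "let":
            _, lvar, lval, lbody = lhs
            _, rvar, rval, rbody = rhs
            # The binding is visible only inside the body.
            inner = {**var_map, lvar: rvar}
            return structural_equal(lval, rval, var_map) and structural_equal(
                lbody, rbody, inner
            )
        return all(structural_equal(a, b, var_map) for a, b in zip(lhs, rhs))
    return lhs == rhs


x, y = ToyVar("x"), ToyVar("y")
e1 = ("let", x, 1, ("add", x, 1))
e2 = ("let", y, 1, ("add", y, 1))
```

In this model `deep_equal(e1, e2)` is false because `x` and `y` are distinct objects, whereas `structural_equal(e1, e2)` is true because the bound variables are remapped. Free variables are never remapped in either function.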

tvm.tir.analysis.get_block_access_region(block: tvm.tir.stmt.Block, buffer_var_map: Dict[tvm.tir.expr.Var, tvm.tir.buffer.Buffer]) -> List[List[tvm.tir.stmt.BufferRegion]]

Detect which regions of tensors in this block are read or written to.

Regions are sorted by order of appearance in the AST.

Parameters
  • block (tvm.tir.Block) – The block in which we are detecting read/write regions.

  • buffer_var_map (Dict[Var, Buffer]) – The outside buffers which may be accessed by the block. Mapping from buffer var to the buffer.

Returns

result

Array of access regions. There are three arrays of BufferRegion:
  • first: read regions

  • second: write regions

  • third: opaque regions

Return type

List[List[BufferRegion]]

tvm.tir.analysis.get_block_read_write_region(block: tvm.tir.stmt.Block, buffer_var_map: Dict[tvm.tir.expr.Var, tvm.tir.buffer.Buffer]) -> List[List[tvm.tir.stmt.BufferRegion]]

Auto detect the block read/write region according to its body stmt.

An opaque access will be counted as both a read and a write access.

Parameters
  • block (tvm.tir.Block) – The block in which we are detecting read/write regions.

  • buffer_var_map (Dict[Var, Buffer]) – The outside buffers which may be accessed by the block. Mapping from buffer var to the buffer.

Returns

result – An array only consisting of the read regions and write regions of the input block

Return type

List[List[BufferRegion]]
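The rule that an opaque access counts as both a read and a write can be sketched as a mapping from a three-list result (read, write, opaque) to a two-list result (read, write). This is a plain-Python toy: the `Region` tuple and the helper `to_read_write` are hypothetical, and the real function may additionally merge overlapping regions.

```python
from typing import List, Tuple

# Hypothetical region: (buffer name, (min, extent)).
Region = Tuple[str, Tuple[int, int]]


def to_read_write(access: List[List[Region]]) -> List[List[Region]]:
    """Fold opaque regions into both the read list and the write list."""
    reads, writes, opaques = access
    return [reads + opaques, writes + opaques]


access_regions = [
    [("A", (0, 16))],  # read regions
    [("B", (0, 16))],  # write regions
    [("C", (0, 4))],   # opaque regions
]
read_write = to_read_write(access_regions)
```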

tvm.tir.analysis.verify_gpu_code(func: tvm.tir.function.PrimFunc, constraints: Dict[str, int]) -> None

Verify that the GPU code in func satisfies the given resource constraints, such as limits on memory usage and on the number of threads per block.

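Parameters
  • func (tvm.tir.PrimFunc) – The function to be verified.

  • constraints (Dict[str, int]) – The attribute constraints.

Returns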

result – The result of verification.

Return type

bool

tvm.tir.analysis.verify_memory(func: tvm.tir.function.PrimFunc) -> bool

Verify if func contains illegal host side direct memory access.

Parameters

func (tvm.tir.PrimFunc) – The function to be verified.

Returns

result – The result of verification.

Return type

bool

tvm.tir.analysis.verify_ssa(func: tvm.tir.function.PrimFunc) -> bool

Verify if the func is in SSA form.

Parameters

func (tvm.tir.PrimFunc) – The function to be verified.

Returns

result – The result of verification.

Return type

bool
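As a reminder of what SSA (static single assignment) form requires, here is a minimal plain-Python sketch: every variable is defined at most once. The helper `is_ssa` and the list-of-names representation are hypothetical and far simpler than a check over real TIR.

```python
from typing import Iterable


def is_ssa(defined_names: Iterable[str]) -> bool:
    """Return True if no variable name is defined more than once."""
    seen = set()
    for name in defined_names:
        if name in seen:
            return False
        seen.add(name)
    return True


# Each entry is the variable defined by one binding, in program order.
ok = is_ssa(["x", "y", "z"])
redefined = is_ssa(["x", "y", "x"])
```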

tvm.tir.stmt_functor

Statement functor utilities for IR transformations

Functions:

ir_transform(stmt, preorder, postorder[, …])

Recursively visit and transform IR nodes in post-order DFS.

post_order_visit(stmt, fvisit)

Recursively visit the IR nodes in post-order DFS, applying fvisit to each node.

substitute(node, vmap)

Substitute the var specified by vmap.

tvm.tir.stmt_functor.ir_transform(stmt, preorder, postorder, only_enable=None)

Recursively visit and transform IR nodes in post-order DFS.

Parameters
  • stmt (tvm.tir.Stmt) – The input to be transformed.

  • preorder (function) – The function called before recursive mutation. If preorder returns None, the transform proceeds to the recursive call. If preorder returns a non-None tvm.tir.Stmt/Expr, the transformer returns it directly and does no further recursion.

  • postorder (function) – The function called after recursive mutation.

  • only_enable (Optional[List[str]]) – List of node types to which the transformation is restricted.

Returns

result – The result.

Return type

tvm.tir.Stmt
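The interplay of preorder and postorder can be sketched on a toy tree in plain Python. This is not TVM code: nodes are tuples of the form `(kind, *children)`, and `toy_transform`, `keep_opaque`, and `fold_add` are hypothetical names. The sketch assumes that a postorder callback returning None leaves the node unchanged.

```python
def toy_transform(node, preorder, postorder):
    """Transform a tuple tree with optional preorder/postorder callbacks."""
    if preorder is not None:
        replaced = preorder(node)
        if replaced is not None:
            # Short-circuit: do not recurse into this subtree.
            return replaced
    if isinstance(node, tuple):
        kind, *children = node
        node = (kind, *[toy_transform(c, preorder, postorder) for c in children])
    if postorder is not None:
        result = postorder(node)
        if result is not None:
            node = result
    return node


def keep_opaque(node):
    """Preorder: return 'opaque' subtrees as-is so they are not visited."""
    if isinstance(node, tuple) and node[0] == "opaque":
        return node
    return None


def fold_add(node):
    """Postorder: fold an 'add' whose operands are all integers."""
    if (
        isinstance(node, tuple)
        and node[0] == "add"
        and all(isinstance(c, int) for c in node[1:])
    ):
        return sum(node[1:])
    return None


tree = ("mul", ("add", 1, 2), ("opaque", ("add", 3, 4)))
out = toy_transform(tree, keep_opaque, fold_add)
```

The first `add` is folded after its children have been processed, while the `add` under `opaque` is left untouched because preorder stopped the recursion there.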

tvm.tir.stmt_functor.post_order_visit(stmt, fvisit)

Recursively visit the IR nodes in post-order DFS, applying fvisit to each node.

Each node is guaranteed to be visited only once.

Parameters

fvisit (function) – The visitor function.
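A post-order visit that touches each node only once can be sketched in plain Python as follows. The `Node` class and `toy_post_order_visit` are hypothetical; the point is that children are visited before their parent and that a node object shared by two parents is visited a single time.

```python
class Node:
    """Hypothetical IR node with a name and child nodes."""

    def __init__(self, name, *children):
        self.name = name
        self.children = children


def toy_post_order_visit(node, fvisit, _seen=None):
    """Call fvisit on every reachable node once, children before parents."""
    seen = set() if _seen is None else _seen
    if id(node) in seen:
        return
    seen.add(id(node))
    for child in node.children:
        toy_post_order_visit(child, fvisit, seen)
    fvisit(node)


shared = Node("x")
tree = Node("add", Node("mul", shared, Node("c")), shared)

order = []
toy_post_order_visit(tree, lambda n: order.append(n.name))
```

With this tree the recorded order is x, c, mul, add: the shared `x` node appears once even though it is reachable through two parents.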

tvm.tir.stmt_functor.substitute(node, vmap)

Substitute the var specified by vmap.

Parameters
  • node (ObjectRef) – The input.

  • vmap (Dict[Var, PrimExpr]) – The variable mapping.

Returns

result – The result.

Return type

tvm.tir.Stmt
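Substitution by a variable map can be sketched on a toy expression tree in plain Python. This is an illustration only: expressions are tuples `(op, *args)`, variables are plain strings, and `toy_substitute` is a hypothetical name rather than the TVM function.

```python
def toy_substitute(node, vmap):
    """Replace every variable (a string leaf) found in vmap by its mapped value."""
    if isinstance(node, str):
        return vmap.get(node, node)
    if isinstance(node, tuple):
        op, *args = node
        # The operator name is kept; only the arguments are rewritten.
        return (op, *[toy_substitute(a, vmap) for a in args])
    return node


expr = ("add", ("mul", "n", 4), "m")
new_expr = toy_substitute(expr, {"n": 16, "m": ("add", "k", 1)})
```

Variables absent from the map (such as `k` here) are left as they are, and a substituted value is inserted as-is rather than being substituted again.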