tvm.tir.transform¶
Namespace of all TIR transformations
- tvm.tir.transform.prim_func_pass(pass_func=None, opt_level: Optional[int] = None, name: Optional[str] = None, required: Optional[List[str]] = None, traceable=False) → Union[Callable, tvm.tir.transform.function_pass.PrimFuncPass]¶
Decorate a function pass.
This function returns a callback (a decorator) when pass_func is not provided. Otherwise, it returns the function pass created from the given optimization function.
- Parameters
pass_func (Optional[Callable[(tvm.tir.PrimFunc, IRModule, PassContext) -> tvm.tir.PrimFunc]]) – The transformation function or class.
opt_level (int) – The optimization level of this function pass.
name (Optional[str]) – The name of the function pass. The name could be empty. In this case, the name of the optimization function will be used as the pass name.
required (Optional[List[str]]) – The list of passes that the function pass is dependent on.
- Returns
create_function_pass – A decorator is returned if pass_func is not provided; otherwise the decorated result is returned. The returned decorator behaves differently depending on its input: decorating a pass function returns a new PrimFuncPass, and decorating a class type returns a new PrimFuncPass class.
- Return type
Union[Callable, PrimFuncPass]
Examples
The following code block decorates a function pass class.
@tvm.tir.transform.prim_func_pass(opt_level=1)
class TestReplaceFunc:
    def __init__(self, new_func):
        self.new_func = new_func

    def transform_function(self, func, mod, ctx):
        # just for demo purposes
        # transform func to new_func
        return self.new_func
The following code creates a function pass by decorating a user defined transform function.
@tvm.tir.transform.prim_func_pass(opt_level=2)
def transform(func, mod, ctx):
    # my transformations here.
    return func

# The decorator replaces `transform` with a pass object.
function_pass = transform
assert isinstance(function_pass, tvm.tir.transform.PrimFuncPass)
assert function_pass.info.opt_level == 2

# Given a module m, the optimization could be invoked as the following:
updated_mod = function_pass(m)
# Now the transformation should have been applied to every PrimFunc in
# the provided module m. And the updated module will be returned.
- class tvm.tir.transform.PrimFuncPass¶
A pass that works on each tvm.tir.PrimFunc() in a module. A function pass class should be created through tvm.tir.transform.prim_func_pass.
- tvm.tir.transform.AnnotateDeviceRegions()¶
Annotate locations that should be run on the device
Insert AttrStmt nodes specifying a target on which regions within the PrimFunc should be executed. Only modifies functions that have a tvm::attr::kTarget attribute, and where that target defines a host.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.AnnotateEntryFunc()¶
Set a PrimFunc as the entry point if it is the only function in the IRModule.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.Apply(ftransform)¶
Apply ftransform to each function in the Module.
This function is a thin wrapper around tvm.tir.transform.prim_func_pass
- Parameters
ftransform (tvm.tir.PrimFunc -> tvm.tir.PrimFunc) – The transformation pass.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ApplyLayoutTransforms()¶
Reshape buffers that appear in the “layout_transform_map” function attribute.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.BF16ComputeLegalize()¶
Legalize bf16 compute Ops.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.BF16StorageLegalize()¶
Legalize bf16 storage types to u16.
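As an illustration of why bf16 data fits in u16 storage: a bfloat16 value shares its sign and exponent layout with the upper 16 bits of an IEEE float32. The sketch below is a standalone Python model, not TVM code. The helper names are hypothetical, and it truncates the low bits instead of rounding, which may differ from what the pass actually generates.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the upper 16 bits of x's float32 encoding (truncation, no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(b: int) -> float:
    """Widen a 16-bit bf16 pattern to float32 by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


# The 16-bit pattern is what would sit in a u16 buffer element.
stored = f32_to_bf16_bits(1.0)
restored = bf16_bits_to_f32(stored)
```

Values that need more than 8 bits of significand lose precision on the way through, which is inherent to bf16 rather than to this sketch.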
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.BindTarget(target)¶
Annotate a PrimFunc with a given target.
- Parameters
target (tvm.target.Target) – The target.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.CoProcSync()¶
Detect and insert sync points to co-processor.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.CombineContextCall()¶
Combine context calls in the host function.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.CommonSubexprElimTIR(enable_cse_tir: bool = True, identify_equiv_terms: bool = False)¶
Replace redundant computations by new variables.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.CompactBufferAllocation(is_strict: bool = True)¶
Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.
Example
Before narrowing, B is a [16, 16] buffer, but only a skinny vector B[i, 0:16] is accessed.
for i in range(0, 16):
    with T.block():
        B = T.alloc_buffer(16, 16)
        for j in range(0, 16):
            B[i, j] = A[i, j] + 1
        for j in range(0, 16):
            C[i, j] = B[i, j] + 1
This pass narrows the buffer shape and adjusts its accessed region accordingly. In this particular case, because only a 1 * 16 vector of B is accessed, the pass narrows B to shape [1, 16], and changes each access to B[i, j] into B[0, j].
for i in range(0, 16):
    with T.block():
        B = T.alloc_buffer(1, 16)
        for j in range(0, 16):
            B[0, j] = A[i, j] + 1
        for j in range(0, 16):
            C[i, j] = B[0, j] + 1
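The equivalence of the two loop nests can be checked with a small standalone model. This is plain Python with nested lists standing in for buffers, not TVMScript; the function names and the input data are made up for illustration.

```python
N = 16
# Arbitrary input data for the check.
A = [[i * N + j for j in range(N)] for i in range(N)]


def run_original(a):
    """Model of the loop nest before compaction: B is allocated as [16, 16]."""
    c = [[0] * N for _ in range(N)]
    for i in range(N):
        b = [[0] * N for _ in range(N)]  # full scratch buffer, only row i is used
        for j in range(N):
            b[i][j] = a[i][j] + 1
        for j in range(N):
            c[i][j] = b[i][j] + 1
    return c


def run_compacted(a):
    """Model of the loop nest after compaction: B is narrowed to [1, 16]."""
    c = [[0] * N for _ in range(N)]
    for i in range(N):
        b = [[0] * N]  # single-row scratch buffer
        for j in range(N):
            b[0][j] = a[i][j] + 1
        for j in range(N):
            c[i][j] = b[0][j] + 1
    return c


same_result = run_original(A) == run_compacted(A)
```

Both versions compute C[i, j] = A[i, j] + 2; only the size of the scratch buffer changes, from 256 elements to 16.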
- Parameters
is_strict (bool) – Ensure that the compacted shape is always smaller than the original shape. Otherwise, the shape is allowed to grow to match the buffer regions that are actually accessed.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ConvertBlocksToOpaque()¶
Substitute all the block vars with the PrimExprs they are bound to, indicated by the corresponding iter_values in BlockRealize, and then convert the blocks into opaque ones by removing all the iter_values in BlockRealize and iter_vars in Block.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ConvertForLoopsToSerial()¶
Convert Parallel For Loops to Serial For Loops.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ConvertSSA()¶
Convert an IRModule to SSA form.
This pass handles cases where the same tir.Var appears in multiple functions within the same module. For example, after extracting a fragment from one function into another, the same tir.Var may be defined both within the body of the original function and as a parameter of the hoisted function.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.DecorateDeviceScope()¶
Decorate the entire body of each function as a device function.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.DefaultGPUSchedule()¶
The pass sets default thread bindings for PrimFuncs, including symbolic shape functions, allowing their build and execution on GPU devices. It examines all the blocks within the PrimFunc and conducts loop fusion, splitting, and reordering operation based on the loop extent and target information, such as the maximum thread block number and maximum thread per block.
The primary objective of this pass is not to optimize performance, but rather to generate a valid GPU kernel for unscheduled or symbolic shape PrimFuncs. The pass is currently only working for CUDA targets.
- Returns
ret
- Return type
- tvm.tir.transform.ExtractPrimFuncConstants()¶
Collect and unify TIR non-scalar constants into the module’s ‘Constants’ attribute array.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.FP8ComputeLegalize(promote_dtype_str: str = 'float32')¶
Legalize fp8 compute Ops.
- Parameters
promote_dtype_str (str) – The data type that fp8 is promoted to. Options: float16 or float32.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.FP8StorageLegalize()¶
Legalize fp8 storage types to u8.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.Filter(fcond: Callable)¶
Filter out PrimFuncs that do not satisfy the given condition. fcond should be a function that takes a PrimFunc and returns a boolean.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.FlattenBuffer()¶
Flatten multi-dimensional BufferLoad and BufferStore nodes into single-dimensional BufferLoad/BufferStore nodes, for TIR that does not contain opaque blocks.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ForceNarrowIndexToInt32()¶
Force narrow down indexing expressions and integer buffers to int32 dtype.
- Returns
fpass – The result pass
- Return type
Note
This pass should not be used in default cases.
- tvm.tir.transform.HoistExpression()¶
Generalized version of HoistIfThenElse.
Hoist loop-invariant expressions to outside the eligible loops. Searches for expressions in:
LetStmt bindings
IfThenElse conditions
Boolean operators
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.HoistIfThenElse(variant: Optional[str] = None)¶
Hoist loop-invariant IfThenElse nodes to outside the eligible loops.
- Parameters
variant (Optional[str]) –
The variant of the pass. variant can take one of the following values: ["basic", None (default)].
The basic variant supports basic hoisting scenarios, where the For and If nodes are expected to appear consecutively, and does not handle global scope variables or more advanced scenarios.
The default variant supports all hoisting scenarios, i.e., {"Basic" + "Advanced"}, and can be controlled through PassContext configs such as:
config={"tir.HoistIfThenElse": {"support_block_scope_hosting": True}}
- Returns
fpass – The result pass
- Return type
- class tvm.tir.transform.HoistedConditionals(value)¶
Flags for use in HoistExpressionConfig.conditional_types
Each bitflag represents a type of expression that should be hoisted to the outermost loop possible.
- Never = 0¶
No hoisting of conditionals
- IfElseStmt = 1¶
If set, look for hoist candidates in IfElseStmt
- IfElseExpr = 2¶
If set, look for hoist candidates in tir.if_then_else
- BooleanExpression = 4¶
If set, look for hoist candidates in all boolean expressions
- UsingBlockVar = 8¶
If set, allow hoisting of conditionals that use a block variable (e.g. threadIdx.x)
- All = 15¶
Enable all hoisting of conditionals
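Because each value above is a distinct bit, flags can be combined with bitwise OR, and All is the union of the four individual flags. The sketch below uses a stand-in enum with the same numeric values; the class name HoistedConditionalsModel is hypothetical and is not the TVM class itself.

```python
from enum import IntFlag


class HoistedConditionalsModel(IntFlag):
    """Stand-in for the flag values documented above (not the TVM class)."""

    Never = 0
    IfElseStmt = 1
    IfElseExpr = 2
    BooleanExpression = 4
    UsingBlockVar = 8
    All = 15


# Hoist candidates from IfElseStmt and boolean expressions only.
selected = HoistedConditionalsModel.IfElseStmt | HoistedConditionalsModel.BooleanExpression

# Membership tests check whether a given bit is set.
hoists_if_stmt = HoistedConditionalsModel.IfElseStmt in selected
hoists_if_expr = HoistedConditionalsModel.IfElseExpr in selected
```

Here selected has the integer value 5 (1 | 4), so the IfElseStmt bit is set and the IfElseExpr bit is not.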
- class tvm.tir.transform.HoistedLetBindings(value)¶
Flags for use in HoistExpressionConfig.let_binding_types
Each bitflag represents a type of let binding expression that should be hoisted to the outermost loop possible.
- Never = 0¶
No hoisting of let bindings
- RequiredByConditional = 1¶
Bindings that are used by a hoisted conditional
- LetStmt = 2¶
Bindings occurring in LetStmt
- LetExpr = 4¶
Bindings occurring in Let expressions
- All = 7¶
Enable all hoisting of let bindings
- tvm.tir.transform.InferFragment()¶
Infer the TensorCore fragment information using tensor intrinsics.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectCopyIntrin(pragma_key: str, fintrin)¶
Inject copy intrinsics with optional padding.
- Parameters
pragma_key (str) – The pragma key for hint of copy.
fintrin (function) – The function with signature copyintrin(src, dst, pad_before, pad_after, pad_value)
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectDoubleBuffer()¶
Inject double buffer statements.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectPTXAsyncCopy()¶
Rewrite global to shared memory copies on CUDA to use asynchronous copy.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectPermutedLayout()¶
Inject permuted layout in mma
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectPrefetch()¶
Inject prefetch instructions into stmt.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectRollingBuffer()¶
Inject rolling buffer statements.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectSoftwarePipeline()¶
Transform annotated loops into pipelined ones that parallelize producers and consumers.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InjectVirtualThread()¶
Inject virtual thread loops.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InlinePrivateFunctions()¶
Inline calls to private functions
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InstallDebugSpans()¶
Add line information from the TIR printer as spans on each statement and expression.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InstrumentBoundCheckers()¶
Instruments bound checkers.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.InstrumentProfileIntrinsics()¶
Insert intrinsic calls to instrument function and loop level profiling.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LegalizePackedCalls()¶
Legalize packed calls so that their arguments are wrapped in TVMValues.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LiftAttrScope(attr_key: str)¶
Lift common attrs with attr_key to outer scope.
- Parameters
attr_key (str) – The attribute key to be checked.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LiftThreadBinding()¶
Lift the same thread bindings to their LCA loops.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LoopPartition()¶
Partition loops in the statement.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerAutoCopy()¶
Automatically do memory optimizations for auto copy blocks
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerCrossThreadReduction()¶
Lower cross-thread reduction from thread bindings to intrinsic function calls.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerCustomDatatypes()¶
Lower custom datatypes.
See tvm::datatypes::Registry for more information on adding custom datatypes.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerDeviceKernelLaunch()¶
Lower cross-device function calls.
Prior to this pass, host to device calls are represented as subroutine calls, with environment parameters (e.g. env_thread) specified internally. The device function is an internal function, without a tvm::attr::kGlobalSymbol attribute.
After this pass, host to device calls are represented as tvm_call_packed built-in. The device function is an externally-exposed function, with a non-empty tvm::attr::kGlobalSymbol attribute.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerDeviceStorageAccessInfo()¶
Lower attached storage access information on device.
- Returns
fpass – The result pass
- Return type
Note
Run this pass after all storage access analysis finish.
- tvm.tir.transform.LowerInitBlock()¶
Lower block init stmt into IfThenElse statements.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerIntrin()¶
Lower target specific intrinsic calls.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerMatchBuffer()¶
Remove match buffers inside the block. Also, it will validate the binding.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerOpaqueBlock()¶
Remove the block to ensure that the TIR cannot be scheduled again.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerTVMBuiltin()¶
Lower tvm builtin intrinsics.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerThreadAllreduce()¶
Lower cross-thread allreduce.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.LowerWarpMemory()¶
Lower warp memory access to low-level device related function calls.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.MakePackedAPI()¶
Transform the PrimFuncs in the module to a packed func API.
Prior to this pass, the PrimFunc may have Buffer arguments defined in the PrimFuncNode::buffer_map. This pass consumes the buffer_map, using it to generate TVMArgs and TVMRetValue* arguments that implement the PackedFunc API.
For static shapes, the BufferNode::shape, BufferNode::strides, and BufferNode::elem_offset member variables are used to generate runtime checks on the corresponding member variables in the user-provided DLTensor* or tvm.nd.array argument. (e.g. A PrimFunc that accepts a buffer of shape [16,32] validates that the DLTensor::shape array is [16,32].)
For dynamic Buffers, in which one or more of these BufferNode member variables use tir.Var that are not defined by other PrimFunc parameters, these are instead used to define the variables based on the corresponding DLTensor members. (e.g. A PrimFunc that accepts a buffer of shape [tir.Var(“n”), tir.Var(“m”)], when passed a DLTensor of shape [16,32], will define n = 16 and m = 32, based on the argument’s shape.)
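The difference between the static and dynamic cases can be sketched in plain Python. This is a simplified model of the shape handling only, not the code that the pass generates. The function bind_shape is hypothetical, symbolic dimensions are represented as strings, and the consistency check for a repeated symbol is an assumption of this sketch.

```python
def bind_shape(expected, actual):
    """Validate static dims and bind symbolic dims (str names) to actual extents.

    expected: list of int (static extent) or str (symbolic variable name)
    actual:   list of int, the shape of the tensor passed at runtime
    Returns a dict mapping each symbolic name to the extent it was bound to.
    """
    if len(expected) != len(actual):
        raise ValueError(f"rank mismatch: expected {len(expected)}, got {len(actual)}")
    bindings = {}
    for dim, (exp, act) in enumerate(zip(expected, actual)):
        if isinstance(exp, int):
            # Static extent: behaves like a runtime check on the argument.
            if exp != act:
                raise ValueError(f"dim {dim}: expected extent {exp}, got {act}")
        elif exp in bindings:
            # Assumed behavior: a repeated symbol must agree with its first binding.
            if bindings[exp] != act:
                raise ValueError(f"dim {dim}: {exp} already bound to {bindings[exp]}, got {act}")
        else:
            # Symbolic extent: defined from the argument's shape.
            bindings[exp] = act
    return bindings


static_case = bind_shape([16, 32], [16, 32])
dynamic_case = bind_shape(["n", "m"], [16, 32])
```

In the static case nothing is bound and a mismatched shape raises an error; in the dynamic case n and m take their values from the argument.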
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.MakeUnpackedAPI()¶
Transform the PrimFuncs in the module to a C API compatible with internal calls.
Prior to this pass, the PrimFunc may have Buffer arguments defined in the PrimFuncNode::buffer_map. This pass consumes the buffer_map, using it to generate T* arguments (e.g. float32*) that can be directly called by a C API.
For static shapes, no runtime validation is performed to confirm that the argument buffer’s shape matches the expected shape. For dynamic shapes, MakeUnpackedAPI requires that the dynamic parameters be passed as separate tir.Var parameters.
- Returns
fpass – The result pass
- Return type
Add the explicit local stage for the shared memory access on GPU.
- Returns
fpass – The result pass
- Return type
This pass merges multiple TIR-level shared memory allocations into one allocation.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.NarrowDataType(target_bits: int)¶
Narrow down PrimExpr datatype in stmt to target_bits.
- Parameters
target_bits (int) – The target bit configuration.
- Returns
fpass – The result pass
- Return type
Note
Run this pass after StorageFlatten.
- tvm.tir.transform.PlanAndUpdateBufferAllocationLocation()¶
Move each buffer allocation to its exact position (usually the LCA of the buffer accesses). This pass injects an opaque block with alloc_buffers at the allocation site.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.PointerValueTypeRewrite()¶
Rewrite the pointer content type of arguments, as well as Alloc internal to the function to use the most frequently accessed type for load/store to avoid pointer casting in backend when possible.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ReduceBranchingThroughOvercompute()¶
Reduce branching by introducing overcompute
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.RemoveAssume()¶
Remove all instances of builtin::assume
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.RemoveNoOp()¶
Remove No Op from the Stmt.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.RemoveStoreUndef()¶
Remove stores of undefined values from the Stmt.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.RemoveWeightLayoutRewriteBlock(skip_ndarray_rewrite=False)¶
Remove weight layout rewrite block before benchmarking during tuning stage.
- Parameters
skip_ndarray_rewrite (bool) –
If True, exact rewrite of NDArray, according to the given index map, will be skipped. Only the shape of the NDArray is transformed correctly, and the content of the destination array will be filled with random values.
When this pass is called many times during MetaSchedule tuning, the raw data of NDArray, before and after rewrite, does not matter. Since NDArray layout rewrite, using IndexMap’s MapNDArray, is currently slow, skipping the exact rewrite is sometimes necessary.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.RenormalizeSplitPattern()¶
Renormalize the split pattern from floordiv(floormod()) to floormod(floordiv())
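The two forms are interchangeable because of an arithmetic identity on floor division and floor modulo. The sketch below checks one such identity numerically; the exact pattern with constants c1 and c2 is an assumption made here for illustration, and the function names are hypothetical. Python's // and % follow floor semantics, so they can stand in for floordiv and floormod.

```python
def split_before(x: int, c1: int, c2: int) -> int:
    """floordiv(floormod(x, c1 * c2), c2)"""
    return (x % (c1 * c2)) // c2


def split_after(x: int, c1: int, c2: int) -> int:
    """floormod(floordiv(x, c2), c1)"""
    return (x // c2) % c1


# Exhaustive check over a small range of positive constants and signed x.
identity_holds = all(
    split_before(x, c1, c2) == split_after(x, c1, c2)
    for c1 in range(1, 6)
    for c2 in range(1, 6)
    for x in range(-50, 200)
)
```

The identity follows from writing x = q * (c1 * c2) + r with 0 <= r < c1 * c2: both sides reduce to r // c2.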
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.RewriteUnsafeSelect()¶
Detect and rewrite unsafe select that contains memory access.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.Simplify()¶
Run arithmetic simplifications on the statements and expressions.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.SkipAssert()¶
Skip assert stmt.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.SplitHostDevice()¶
Split the function into a host function and device functions.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.StorageFlatten(cache_line_size, create_bound_attribute: bool = False)¶
Flatten the multi-dimensional read/write to 1D.
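The core index arithmetic behind flattening can be sketched as follows. This assumes a compact row-major layout and ignores strides, element offsets, and alignment, all of which the real pass may take into account; flatten_index is a hypothetical helper, not a TVM function.

```python
def flatten_index(indices, shape):
    """Map a multi-dimensional index to a 1D offset, assuming row-major layout."""
    if len(indices) != len(shape):
        raise ValueError("indices and shape must have the same rank")
    offset = 0
    for idx, extent in zip(indices, shape):
        if not 0 <= idx < extent:
            raise IndexError(f"index {idx} out of range for extent {extent}")
        # Horner-style accumulation: earlier dimensions vary slowest.
        offset = offset * extent + idx
    return offset


# A[2, 3] in a [16, 16] buffer lands at element 2 * 16 + 3.
offset_2d = flatten_index((2, 3), (16, 16))
offset_3d = flatten_index((1, 2, 3), (4, 5, 6))
```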
- Parameters
cache_line_size (int) – The size of CPU cache line.
create_bound_attribute – Whether to create bound attributes.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.StorageRewrite()¶
Rewrite storage allocation pattern.
Moves each allocation to the outermost possible scope, and tries to share space between allocations to make a static allocation plan when possible.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.TextureFlatten()¶
Flatten the multi-dimensional read/write to 2D.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.ThreadSync(storage_scope: str)¶
Insert sync between parallel read/write of shared buffers.
- Parameters
storage_scope (str) – The target storage scope.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.TransformMmaBufferLayout()¶
Transform mma buffer layout
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.UnifyThreadBinding()¶
Unify all the thread bindings for “blockIdx.x/y/z”, “threadIdx.x/y/z”, and “vthread.x/y/z”. Before the unification, two vars that are bound to a thread axis (e.g., “threadIdx.x”) use different IterVars and variables in their AttrStmts. After the unification, we use a consolidated IterVar and a variable for them.
- Returns
fpass – The result pass
- Return type
Note
vthread is a legacy behavior that will be deprecated, though thread bindings of vthread are still also unified in this pass. Please use vthread.x, vthread.y and vthread.z instead.
- tvm.tir.transform.UnrollLoop()¶
Unroll the constant loop marked by unroll.
This pass also automatically attaches the pragma unroll tag to loops that meet the criteria.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.UseAssumeToReduceBranches()¶
This pass attempts to eliminate layout-specific pad branches by overcomputing the values for the padded region. Eliminating the branch helps vectorize the code and improves the performance of element-wise ops.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.VectorizeLoop(enable_vectorize: bool = True)¶
Lower vectorization loops.
- Parameters
enable_vectorize (bool) – Whether vectorization is enabled. Will lower to scalar loop when it is turned off.
- Returns
fpass – The result pass
- Return type
- tvm.tir.transform.VerifyMemory()¶
Verify if func contains illegal host side direct memory access.
- Returns
fpass – The result pass
- Return type