tvm::tir::transform Namespace Reference

Functions

Pass VerifySSA ()
 Pass variant of VerifySSA. More...
 
Pass VerifyMemory ()
 Pass variant of VerifyMemory. More...
 
Pass VerifyGPUCode (ffi::Map< ffi::String, PrimExpr > constraints)
 Pass variant of VerifyGPUCode. More...
 
Pass VerifyVTCMLimit (ffi::Optional< Target > target=std::nullopt)
 Pass to check whether the size of the allocated VTCM memory satisfies the limit. More...
 
Pass OOBChecker ()
 Statically check TIR code for out of bounds array access. More...
 
Pass CreatePrimFuncPass (std::function< PrimFunc(PrimFunc, IRModule, PassContext)> pass_func, int opt_level, ffi::String name, tvm::ffi::Array< ffi::String > required, bool traceable=false)
 
Pass VectorizeLoop (bool enable_vectorize=true)
 Lower vectorization loops. More...
 
Pass StorageRewrite ()
 Rewrite the storage allocation pattern. Moves allocations to the outermost possible scope and tries to share space between allocations to produce a static allocation plan when possible. More...
 
Pass UnrollLoop ()
 Unroll constant loops marked by unroll. This pass also automatically attaches a pragma unroll tag to loops that meet the standard. More...
 
Pass RemoveNoOp ()
 Remove no-ops from the Stmt. More...
 
Pass RewriteUnsafeSelect ()
 Detect and rewrite unsafe select that contains memory access. More...
 
Pass Simplify ()
 Run arithmetic simplifications on the statements and expressions. More...
 
Pass ConvertSSA ()
 Convert an IRModule to SSA form. More...
 
Pass InstrumentBoundCheckers ()
 Instruments bound checkers. More...
 
Pass MakePackedAPI ()
 Transform the high-level PrimFunc to a low-level version that can be used as an API function. More...
 
Pass MakeUnpackedAPI ()
 Transform the high-level PrimFunc to a C signature that can be used to call the operator directly. More...
 
Pass RemapThreadAxis (ffi::Map< ffi::String, IterVar > axis_map)
 Remap the thread axis. More...
 
Pass LowerCustomDatatypes ()
 Lower custom datatypes. More...
 
Pass DecorateDeviceScope ()
 Decorate the function's body as a device function. More...
 
Pass AnnotateDeviceRegions ()
 Annotate locations that should be run on the device. More...
 
Pass SplitHostDevice ()
 Split the function into a host function and device functions. More...
 
Pass LowerDeviceKernelLaunch ()
 Lower cross-device function calls. More...
 
Pass SkipAssert ()
 Skip assert statements. More...
 
Pass ThreadSync (ffi::String storage_scope)
 Insert sync between parallel read/write of shared buffers. More...
 
Pass LowerThreadAllreduce ()
 Lower cross-thread allreduce. More...
 
Pass InferFragment ()
 Infer the TensorCore fragment information using tensor intrinsics. More...
 
Pass LowerTVMBuiltin ()
 Lower builtin intrinsics. More...
 
Pass LowerIntrin ()
 Lower the target-specific function intrinsics in each function. More...
 
Pass LowerWarpMemory ()
 Lower warp memory access to low-level device related function calls. More...
 
Pass LowerDeviceStorageAccessInfo ()
 Lower attached storage access information on device. More...
 
Pass CombineContextCall ()
 Combine context calls in the host function. More...
 
Pass NarrowDataType (int target_bits)
 Narrow down PrimExpr datatype in stmt to target_bits. More...
 
Pass ForceNarrowIndexToInt32 ()
 Force narrowing of indexing expressions and integer buffers to the int32 dtype. More...
 
Pass BF16ComputeLegalize ()
 Legalize bf16 compute Ops. Add a cast to fp32 before Ops, then add a cast back to bf16. More...
 
Pass FP8ComputeLegalize (ffi::String promote_dtype="float16")
 Legalize fp8 compute Ops. Add a cast to fp16/fp32 before Ops, then add a cast back to fp8. More...
 
Pass BF16StorageLegalize ()
 Legalize bf16 storage types to u16. More...
 
Pass FP8StorageLegalize ()
 Legalize fp8 storage types to u8. More...
 
Pass InlinePrivateFunctions ()
 Inline calls to private functions. More...
 
Pass PointerValueTypeRewrite ()
 Rewrite the pointer content type of arguments, as well as Allocs internal to the function, to use the most frequently accessed type for loads/stores, avoiding pointer casting in the backend when possible. More...
 
Pass HoistIfThenElse ()
 Hoist loop-invariant IfThenElse nodes outside the eligible loops. More...
 
Pass HoistExpression ()
 Hoist loop-invariant expression nodes outside the eligible loops. More...
 
Pass FlattenBuffer ()
 Flatten multi-dimensional BufferLoad and BufferStore nodes to single-dimensional BufferLoad/BufferStore, for TIR that does not contain opaque blocks. More...
 
Pass LowerVtcmAlloc ()
 
Pass LowerAsyncDMA ()
 Lower Async TIR primitives to DMA copy and wait builtins. More...
 
Pass CommonSubexprElimTIR (bool enable_cse_tir=true, bool identify_equiv_terms=false)
 Implements a Common Subexpression Elimination (CSE) for TIR which introduces let-in bindings for duplicated sub-expressions. More...
 
Pass MergeSharedMemoryAllocations ()
 
Pass ConvertForLoopsToSerial ()
 A post-scheduling pass that converts all parallel For loops to serial ones. This is run to reduce memory use, or when the executor/backend does not support parallel launch of For loops. More...
 
Pass UnifiedStaticMemoryPlanner ()
 This is the unified static memory planner pass that plans memory intra- and inter-PrimFunc together. The pass requires all functions to be PrimFuncs, including the main function. More...
 
Pass BindParams (const ffi::Array< runtime::Tensor > &constants)
 
Pass ExtractPrimFuncConstants ()
 Pass to collect tir non-scalar constants into module's 'Constants' attribute. More...
 
Pass RenormalizeSplitPattern ()
 Renormalize the split pattern from floordiv(floormod()) to floormod(floordiv()). More...
 
Pass BindTarget (Target target)
 Annotate a PrimFunc with a given target. More...
 
Pass AnnotateEntryFunc ()
 Set a PrimFunc as the entry point if it is the only function in the IRModule. More...
 
Pass Filter (ffi::TypedFunction< bool(PrimFunc)> fcond)
 Filter PrimFuncs with a given condition. More...
 
Pass InjectPTXAsyncCopy ()
 Pass to rewrite global-to-shared memory copies on CUDA with asynchronous copy. More...
 
Pass InjectPTXLDG32 (bool enable_ptx_ldg32=true)
 Pass to rewrite global-to-local memory copies on CUDA with the ldg32 instruction. More...
 
Pass RemoveWeightLayoutRewriteBlock (bool skip_tensor_rewrite=false)
 Remove the weight layout rewrite block. More...
 
Pass InstrumentProfileIntrinsics ()
 Insert intrinsic calls to instrument function and loop level profiling. More...
 
Pass DefaultGPUSchedule ()
 The pass sets default thread bindings for PrimFuncs, including symbolic shape functions, allowing them to be built and executed on GPU devices. It examines all the blocks within the PrimFunc and conducts loop fusion, splitting, and reordering based on the loop extent and target information, such as the maximum thread block number and maximum threads per block. More...
 
Pass UseAssumeToReduceBranches ()
 This pass analyzes the PrimFunc and eliminates branches introduced due to layout-specific padding. It leverages buffer assumptions and uses that information to eliminate the branches. More...
 

Function Documentation

◆ AnnotateDeviceRegions()

Pass tvm::tir::transform::AnnotateDeviceRegions ( )

Annotate locations that should be run on the device.

Insert AttrStmt nodes specifying a target on which regions within the PrimFunc should be executed. Only modifies functions that have a tvm::attr::kTarget attribute, and where that target defines a host.

Returns
The pass.

◆ AnnotateEntryFunc()

Pass tvm::tir::transform::AnnotateEntryFunc ( )

Set a PrimFunc as the entry point if it is the only function in the IRModule.

Returns
The pass.

◆ BF16ComputeLegalize()

Pass tvm::tir::transform::BF16ComputeLegalize ( )

Legalize bf16 compute Ops. Add a cast to fp32 before Ops, then add a cast back to bf16.

Returns
The pass.
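
The cast-up / compute / cast-down pattern that compute legalization inserts can be sketched with numpy, using float16 as a stand-in for bf16 (numpy has no bf16 dtype); `legalized_add` is a made-up name for illustration, not a TVM API:

```python
import numpy as np

# Sketch of the compute-legalization pattern: promote narrow operands to
# fp32, run the op at the wider precision, then cast the result back to
# the narrow type. float16 stands in for bf16 here.
def legalized_add(a, b, narrow=np.float16, wide=np.float32):
    return (a.astype(wide) + b.astype(wide)).astype(narrow)
```

The same shape of rewrite applies to FP8ComputeLegalize below, with fp8 as the narrow type and fp16/fp32 as the promoted type.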

◆ BF16StorageLegalize()

Pass tvm::tir::transform::BF16StorageLegalize ( )

Legalize bf16 storage types to u16.

Returns
The pass.

◆ BindParams()

Pass tvm::tir::transform::BindParams ( const ffi::Array< runtime::Tensor > &  constants)

◆ BindTarget()

Pass tvm::tir::transform::BindTarget ( Target  target)

Annotate a PrimFunc with a given target.

Returns
The pass.

◆ CombineContextCall()

Pass tvm::tir::transform::CombineContextCall ( )

Combine context calls in the host function.

Returns
The pass.

◆ CommonSubexprElimTIR()

Pass tvm::tir::transform::CommonSubexprElimTIR ( bool  enable_cse_tir = true,
bool  identify_equiv_terms = false 
)

Implements a Common Subexpression Elimination (CSE) for TIR which introduces let-in bindings for duplicated sub-expressions.

Parameters
enable_cse_tir: Whether common subexpression elimination is enabled.
identify_equiv_terms: Whether equivalent terms should be identified.
Returns
The pass.
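
What CSE with let-in bindings does can be sketched on a toy expression tree (an illustration only, not TVM's implementation; `cse` and the nested-tuple encoding are invented for the example):

```python
from collections import Counter

def subexprs(e):
    """Yield every compound subexpression of a nested-tuple expression."""
    if isinstance(e, tuple):
        yield e
        for child in e[1:]:
            yield from subexprs(child)

def cse(expr):
    """Bind each duplicated subexpression to a fresh let-variable."""
    counts = Counter(subexprs(expr))
    dups = {s: f"cse_var_{i}"
            for i, s in enumerate(s for s, n in counts.items() if n > 1)}

    def rewrite(e):
        if isinstance(e, tuple):
            if e in dups:
                return dups[e]
            return (e[0],) + tuple(rewrite(c) for c in e[1:])
        return e

    lets = [(name, s) for s, name in dups.items()]
    return lets, rewrite(expr)
```

Running `cse(("add", ("mul", "x", "y"), ("mul", "x", "y")))` binds the duplicated `("mul", "x", "y")` once and reuses the binding in both operand positions, mirroring the let-in bindings this pass introduces.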

◆ ConvertForLoopsToSerial()

Pass tvm::tir::transform::ConvertForLoopsToSerial ( )

A post-scheduling pass that converts all parallel For loops to serial ones. This is run to reduce memory use, or when the executor/backend does not support parallel launch of For loops.

Returns
The pass.

◆ ConvertSSA()

Pass tvm::tir::transform::ConvertSSA ( )

Convert an IRModule to SSA form.

This pass handles cases where the same tir::Var appears in multiple functions within the same module. For example, after extracting a fragment from one function into another, the same tir::Var may be defined both within the body of the original function and as a parameter of the hoisted function.

Returns
The pass.
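
The core renaming idea can be illustrated on a toy straight-line program (not TVM's implementation; the `(target, args)` statement encoding is invented for the example): each re-definition of a variable gets a fresh versioned name, so every variable is defined exactly once, and later uses refer to the latest version:

```python
def convert_ssa(stmts):
    """stmts: list of (target, args) assignments, e.g. x = f(args...)."""
    version = {}  # variable -> current SSA version
    out = []
    for target, args in stmts:
        # Uses refer to the latest version of each variable.
        args = [f"{a}_{version[a]}" if a in version else a for a in args]
        # Each definition gets a fresh version.
        version[target] = version.get(target, -1) + 1
        out.append((f"{target}_{version[target]}", args))
    return out
```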

◆ CreatePrimFuncPass()

Pass tvm::tir::transform::CreatePrimFuncPass ( std::function< PrimFunc(PrimFunc, IRModule, PassContext)>  pass_func,
int  opt_level,
ffi::String  name,
tvm::ffi::Array< ffi::String >  required,
bool  traceable = false 
)

◆ DecorateDeviceScope()

Pass tvm::tir::transform::DecorateDeviceScope ( )

Decorate the function's body as a device function.

Returns
The pass.

◆ DefaultGPUSchedule()

Pass tvm::tir::transform::DefaultGPUSchedule ( )

The pass sets default thread bindings for PrimFuncs, including symbolic shape functions, allowing them to be built and executed on GPU devices. It examines all the blocks within the PrimFunc and conducts loop fusion, splitting, and reordering based on the loop extent and target information, such as the maximum thread block number and maximum threads per block.

Note
The primary objective of this pass is not to optimize performance, but rather to generate a valid GPU kernel for unscheduled or symbolic shape PrimFuncs. The pass currently only works for CUDA targets.
Returns
The Pass.

◆ ExtractPrimFuncConstants()

Pass tvm::tir::transform::ExtractPrimFuncConstants ( )

Pass to collect tir non-scalar constants into module's 'Constants' attribute.

Returns
The pass.

◆ Filter()

Pass tvm::tir::transform::Filter ( ffi::TypedFunction< bool(PrimFunc)>  fcond)

Filter PrimFuncs with a given condition.

Returns
The pass.

◆ FlattenBuffer()

Pass tvm::tir::transform::FlattenBuffer ( )

Flatten multi-dimensional BufferLoad and BufferStore nodes to single-dimensional BufferLoad/BufferStore, for TIR that does not contain opaque blocks.

Returns
The pass.

◆ ForceNarrowIndexToInt32()

Pass tvm::tir::transform::ForceNarrowIndexToInt32 ( )

Force narrowing of indexing expressions and integer buffers to the int32 dtype.

Returns
The pass.
Note
This pass should not be used in default cases.

◆ FP8ComputeLegalize()

Pass tvm::tir::transform::FP8ComputeLegalize ( ffi::String  promote_dtype = "float16")

Legalize fp8 compute Ops. Add a cast to fp16/fp32 before Ops, then add a cast back to fp8.

Parameters
promote_dtype: The data type used for type promotion; defaults to float16.
Note
Must be run after BindTarget, as it relies on target attributes for PrimFuncs
Returns
The pass.

◆ FP8StorageLegalize()

Pass tvm::tir::transform::FP8StorageLegalize ( )

Legalize fp8 storage types to u8.

Note
Must be run after BindTarget, as it relies on target attributes for PrimFuncs
Returns
The pass.

◆ HoistExpression()

Pass tvm::tir::transform::HoistExpression ( )

Hoist loop-invariant expression nodes outside the eligible loops.

Can hoist conditionals used in IfThenElse statements and expressions, bindings of variables in Let statements and expressions, or boolean expressions, configurable to enable/disable each hoistable type.

Returns
The pass.
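
The effect of hoisting a loop-invariant conditional can be pictured as a before/after (a conceptual Python illustration, not TVM code): the hoisted form tests the condition once instead of on every iteration, and both forms compute the same result:

```python
def before(n, flag):
    total = 0
    for i in range(n):
        if flag:  # loop-invariant: does not depend on i
            total += i
    return total

def after(n, flag):
    total = 0
    if flag:      # hoisted outside the loop, evaluated once
        for i in range(n):
            total += i
    return total
```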

◆ HoistIfThenElse()

Pass tvm::tir::transform::HoistIfThenElse ( )

Hoist loop-invariant IfThenElse nodes outside the eligible loops.

Returns
The pass.

◆ InferFragment()

Pass tvm::tir::transform::InferFragment ( )

Infer the TensorCore fragment information using tensor intrinsics.

Returns
The pass.

◆ InjectPTXAsyncCopy()

Pass tvm::tir::transform::InjectPTXAsyncCopy ( )

Pass to rewrite global-to-shared memory copies on CUDA with asynchronous copy.

Returns
The pass.

◆ InjectPTXLDG32()

Pass tvm::tir::transform::InjectPTXLDG32 ( bool  enable_ptx_ldg32 = true)

Pass to rewrite global-to-local memory copies on CUDA with the ldg32 instruction.

Returns
The pass.

◆ InlinePrivateFunctions()

Pass tvm::tir::transform::InlinePrivateFunctions ( )

Inline calls to private functions.

Returns
The pass.

◆ InstrumentBoundCheckers()

Pass tvm::tir::transform::InstrumentBoundCheckers ( )

Instruments bound checkers.

Returns
The pass.

◆ InstrumentProfileIntrinsics()

Pass tvm::tir::transform::InstrumentProfileIntrinsics ( )

Insert intrinsic calls to instrument function and loop level profiling.

Returns
The pass.

◆ LowerAsyncDMA()

Pass tvm::tir::transform::LowerAsyncDMA ( )

Lower Async TIR primitives to DMA copy and wait builtins.

◆ LowerCustomDatatypes()

Pass tvm::tir::transform::LowerCustomDatatypes ( )

Lower custom datatypes.

See tvm::datatypes::Registry for more information on adding custom datatypes.

Returns
The pass.

◆ LowerDeviceKernelLaunch()

Pass tvm::tir::transform::LowerDeviceKernelLaunch ( )

Lower cross-device function calls.

Prior to this pass, host to device calls are represented as subroutine calls, with environment parameters (e.g. env_thread) specified internally. The device function is an internal function, without a tvm::attr::kGlobalSymbol attribute.

After this pass, host to device calls are represented as tvm_call_packed built-in. The device function is an externally-exposed function, with a non-empty tvm::attr::kGlobalSymbol attribute.

Returns
The pass.

◆ LowerDeviceStorageAccessInfo()

Pass tvm::tir::transform::LowerDeviceStorageAccessInfo ( )

Lower attached storage access information on device.

Note
Run this pass after all storage access analysis finish.
Returns
The pass.

◆ LowerIntrin()

Pass tvm::tir::transform::LowerIntrin ( )

Lower the target-specific function intrinsics in each function.

Returns
The pass.

◆ LowerThreadAllreduce()

Pass tvm::tir::transform::LowerThreadAllreduce ( )

Lower cross-thread allreduce.

Returns
The pass.

◆ LowerTVMBuiltin()

Pass tvm::tir::transform::LowerTVMBuiltin ( )

Lower builtin intrinsics.

Returns
The pass.

◆ LowerVtcmAlloc()

Pass tvm::tir::transform::LowerVtcmAlloc ( )

◆ LowerWarpMemory()

Pass tvm::tir::transform::LowerWarpMemory ( )

Lower warp memory access to low-level device related function calls.

Returns
The pass.

◆ MakePackedAPI()

Pass tvm::tir::transform::MakePackedAPI ( )

Transform the high-level PrimFunc to a low-level version that can be used as an API function.

The main task of this function is to create code to:

  • Map the values in api_args to the Vars required by the body.
  • Insert assertions to check the type/value of the passed arguments.
Note
The function signature has two cases

let num_packed_args = len(api_args);

if num_packed_args is zero: f()

if num_packed_args is not zero: f(void *, TVMFFIAny* packed_args, int num_packed_args, api_arg_k, api_arg_k+1, ... api_arg_n, TVMFFIAny* out_ret_val)

where n == len(api_args), k == num_packed_args

Returns
The pass.

◆ MakeUnpackedAPI()

Pass tvm::tir::transform::MakeUnpackedAPI ( )

Transform the high-level PrimFunc to a C signature that can be used to call the operator directly.

The main task of this function is to create code that maps the values in api_args to the Vars required by the body.

Returns
The pass.

◆ MergeSharedMemoryAllocations()

Pass tvm::tir::transform::MergeSharedMemoryAllocations ( )

A pass to merge multiple TIR-level shared memory allocations into one.

◆ NarrowDataType()

Pass tvm::tir::transform::NarrowDataType ( int  target_bits)

Narrow down PrimExpr datatype in stmt to target_bits.

Parameters
target_bits: The target bit width.
Note
Run this pass after storage flatten.
Returns
The pass.
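
The legality condition behind narrowing can be sketched as a simple bound check (a simplification of the pass's analysis; `can_narrow` is a made-up helper, not a TVM API): narrowing is safe when the largest index value provably fits in the target width:

```python
def can_narrow(extents, target_bits=32):
    """True if a flat index over the given loop extents fits in a signed
    integer of target_bits, so wider indices can be narrowed safely."""
    bound = 1
    for e in extents:
        bound *= e
    return bound <= 2 ** (target_bits - 1) - 1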

◆ OOBChecker()

Pass tvm::tir::transform::OOBChecker ( )

Statically check TIR code for out of bounds array access.

This analysis is conservative: it will only raise errors if it can prove that out of bounds access occurs. Cases that are uncertain do not raise errors.

Returns
The pass.
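
The conservative flavor of the check can be sketched as follows (an illustration only; `check_accesses` and its input encoding are invented for the example): only accesses whose index is a known constant outside the buffer extent are reported, while symbolic indices pass without error:

```python
def check_accesses(accesses):
    """accesses: list of (index, extent); index is an int constant or a
    symbolic name. Only provable out-of-bounds accesses are reported."""
    errors = []
    for idx, extent in accesses:
        if isinstance(idx, int) and not 0 <= idx < extent:
            errors.append((idx, extent))
    return errors
```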

◆ PointerValueTypeRewrite()

Pass tvm::tir::transform::PointerValueTypeRewrite ( )

Rewrite the pointer content type of arguments, as well as Allocs internal to the function, to use the most frequently accessed type for loads/stores, avoiding pointer casting in the backend when possible.

Returns
The pass.

◆ RemapThreadAxis()

Pass tvm::tir::transform::RemapThreadAxis ( ffi::Map< ffi::String, IterVar axis_map)

Remap the thread axis.

This can be used to get an equivalent program that uses threadIdx.y in place of threadIdx.x, by passing {"threadIdx.x": thread_axis("threadIdx.y")}.

Returns
The pass.

◆ RemoveNoOp()

Pass tvm::tir::transform::RemoveNoOp ( )

Remove no-ops from the Stmt.

Returns
The pass.

◆ RemoveWeightLayoutRewriteBlock()

Pass tvm::tir::transform::RemoveWeightLayoutRewriteBlock ( bool  skip_tensor_rewrite = false)

Remove the weight layout rewrite block.

Parameters
skip_tensor_rewrite: If true, the exact rewrite of the Tensor, according to the given index map, is skipped. Only the shape of the Tensor is transformed correctly; the content of the destination array is filled with random values.

When this pass is called many times during MetaSchedule tuning, the raw data of the Tensor before and after the rewrite does not matter. Since Tensor layout rewrite using IndexMap's MapTensor is currently slow, skipping the exact rewrite is sometimes necessary.

Returns
The pass.

◆ RenormalizeSplitPattern()

Pass tvm::tir::transform::RenormalizeSplitPattern ( )

Renormalize the split pattern from floordiv(floormod()) to floormod(floordiv()).

Returns
The pass.
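
The rewrite is justified by an arithmetic identity on split indices: for non-negative i and positive factors c1, c2, floordiv(floormod(i, c1*c2), c2) equals floormod(floordiv(i, c2), c1). A quick numeric check of the identity (illustrative only):

```python
def split_floordiv_floormod(i, c1, c2):
    return (i % (c1 * c2)) // c2   # floordiv(floormod(i, c1*c2), c2)

def split_floormod_floordiv(i, c1, c2):
    return (i // c2) % c1          # floormod(floordiv(i, c2), c1)
```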

◆ RewriteUnsafeSelect()

Pass tvm::tir::transform::RewriteUnsafeSelect ( )

Detect and rewrite unsafe select that contains memory access.

Returns
The pass.
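
Why a select containing memory access is unsafe can be seen in a small sketch (illustrative Python, not TVM code): an eager select evaluates both arms, so a guarded load still executes even when the condition protects against it; the rewritten branch evaluates only the taken arm:

```python
def unsafe_select(cond, buf, i):
    loaded = buf[i]      # evaluated eagerly, even when cond is False
    return loaded if cond else 0

def rewritten(cond, buf, i):
    if cond:             # only the taken arm is evaluated
        return buf[i]
    return 0
```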

◆ Simplify()

Pass tvm::tir::transform::Simplify ( )

Run arithmetic simplifications on the statements and expressions.

Returns
The pass.
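
The kind of rewrites an arithmetic simplifier applies can be sketched on a toy expression tree (x + 0 -> x, x * 1 -> x, x * 0 -> 0, constant folding); TVM's Simplify is far more general, and this nested-tuple encoding is invented for the example:

```python
def simplify(e):
    """Recursively simplify a ('add'|'mul', lhs, rhs) expression tree."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, int) and isinstance(b, int):
        return a + b if op == "add" else a * b   # constant folding
    if op == "add" and b == 0:
        return a                                 # x + 0 -> x
    if op == "mul" and b == 1:
        return a                                 # x * 1 -> x
    if op == "mul" and b == 0:
        return 0                                 # x * 0 -> 0
    return (op, a, b)
```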

◆ SkipAssert()

Pass tvm::tir::transform::SkipAssert ( )

Skip assert statements.

Returns
The pass.

◆ SplitHostDevice()

Pass tvm::tir::transform::SplitHostDevice ( )

Split the function into a host function and device functions.

The resulting host-side function will keep the same tvm::attr::kTarget attribute (e.g. T.target("cuda", host=T.target("llvm"))). This ensures that MakePackedAPI knows which device type should be used for the input buffers.

The resulting device-side function will have the host stripped from its target attribute (e.g. T.target("cuda")).

Returns
The pass.

◆ StorageRewrite()

Pass tvm::tir::transform::StorageRewrite ( )

Rewrite the storage allocation pattern. Moves allocations to the outermost possible scope and tries to share space between allocations to produce a static allocation plan when possible.

Returns
The pass.

◆ ThreadSync()

Pass tvm::tir::transform::ThreadSync ( ffi::String  storage_scope)

Insert sync between parallel read/write of shared buffers.

Parameters
storage_scope: The storage scope considered.
Returns
The pass.

◆ UnifiedStaticMemoryPlanner()

Pass tvm::tir::transform::UnifiedStaticMemoryPlanner ( )

This is the unified static memory planner pass that plans memory intra- and inter-PrimFunc together. The pass requires all functions to be PrimFuncs, including the main function.

Returns
The pass.

◆ UnrollLoop()

Pass tvm::tir::transform::UnrollLoop ( )

Unroll constant loops marked by unroll. This pass also automatically attaches a pragma unroll tag to loops that meet the standard.

Returns
The pass.
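
The transformation can be pictured as a before/after (conceptual only, not TVM code): a loop with a small constant extent is replaced by its repeated body with the loop variable substituted by constants, removing loop overhead:

```python
def rolled(xs):
    acc = 0
    for i in range(4):   # constant extent, eligible for unrolling
        acc += xs[i]
    return acc

def unrolled(xs):
    # body repeated once per iteration, i replaced by constants
    return xs[0] + xs[1] + xs[2] + xs[3]
```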

◆ UseAssumeToReduceBranches()

Pass tvm::tir::transform::UseAssumeToReduceBranches ( )

This pass analyzes the PrimFunc and eliminates branches introduced due to layout-specific padding. It leverages buffer assumptions and uses that information to eliminate the branches.

Note
This creates more opportunity to vectorize the code.
Returns
The Pass.

◆ VectorizeLoop()

Pass tvm::tir::transform::VectorizeLoop ( bool  enable_vectorize = true)

Lower vectorization loops.

Parameters
enable_vectorize: Whether vectorization is enabled.
Returns
The pass.

◆ VerifyGPUCode()

Pass tvm::tir::transform::VerifyGPUCode ( ffi::Map< ffi::String, PrimExpr constraints)

Pass variant of VerifyGPUCode.

Parameters
constraints: The dict specifying the constraints to check.
Returns
The pass.
See also
tvm::tir::VerifyGPUCode

◆ VerifyMemory()

Pass tvm::tir::transform::VerifyMemory ( )

Pass variant of VerifyMemory.

Returns
The pass.
See also
tvm::tir::VerifyMemory

◆ VerifySSA()

Pass tvm::tir::transform::VerifySSA ( )

Pass variant of VerifySSA.

Returns
The pass.
See also
tvm::tir::VerifySSA

◆ VerifyVTCMLimit()

Pass tvm::tir::transform::VerifyVTCMLimit ( ffi::Optional< Target target = std::nullopt)

Pass to check whether the size of the allocated VTCM memory satisfies the limit.

Parameters
target: The target whose VTCM limit should be used for any functions not already annotated with tvm::attr::kTarget.
Returns
The pass.
See also
tvm::tir::CalculateAllocatedBytes