Functions
Pass	VerifySSA ()
	Pass variant of VerifySSA.

Pass	VerifyMemory ()
	Pass variant of VerifyMemory.

Pass	VerifyGPUCode (Map< String, PrimExpr > constraints)
	Pass variant of VerifyGPUCode.

Pass	CreatePrimFuncPass (const runtime::TypedPackedFunc< PrimFunc(PrimFunc, IRModule, PassContext)> &pass_func, int opt_level, String name, tvm::Array< String > required)

Pass	InjectPrefetch ()
	Inject prefetch instructions into stmt.

Pass	StorageFlatten (int cache_line_size, bool create_bound_attribute=false)
	Flatten the multi-dimensional read/write to single-dimensional Load/Store.

Pass	InjectCopyIntrin (String pragma_key, runtime::PackedFunc fintrin)
	Inject copy intrinsics with optional pad.

Pass	CoProcSync ()
	Detect and insert sync points to co-processor.

Pass	LiftAttrScope (String attr_key)
	Lift common attrs with attr_key to outer scope.

Pass	LoopPartition ()
	Partition loops in the stmt.

Pass	VectorizeLoop (bool enable_vectorize=true)
	Lower vectorization loops.

Pass	InjectVirtualThread ()
	Inject virtual thread loops.

Pass	InjectDoubleBuffer ()
	Inject double buffer statements.

Pass	StorageRewrite ()
	Rewrite storage allocation pattern. Moves each allocation to the outermost possible scope and tries to share space between allocations to make a static allocation plan when possible.

Pass	UnrollLoop ()
	Unroll the constant loops marked by unroll. This pass also automatically attaches the pragma unroll tag to loops that meet the standard.

Pass	RemoveNoOp ()
	Remove no-ops from the Stmt.

Pass	RewriteUnsafeSelect ()
	Detect and rewrite unsafe select that contains memory access.

Pass	Simplify ()
	Run arithmetic simplifications on the statements and expressions.

Pass	InstrumentBoundCheckers ()
	Instruments bound checkers.

Pass	MakePackedAPI (int num_unpacked_args)
	Transform the high-level PrimFunc to a low-level version that can be used as an API function.

Pass	MakeUnpackedAPI ()
	Transform the high-level PrimFunc to a C signature that can be used to call the operator directly.

Pass	RemapThreadAxis (Map< String, IterVar > axis_map)
	Remap the thread axis.

Pass	LowerCustomDatatypes ()
	Lower custom datatypes.

Pass	DecorateDeviceScope ()
	Decorate the entire function body as a device function.

Pass	SplitHostDevice ()
	Split the function into a host function and device functions.

Pass	SkipAssert ()
	Skip assert stmt.

Pass	ThreadSync (String storage_scope)
	Insert sync between parallel read/write of shared buffers.

Pass	LowerThreadAllreduce ()
	Lower cross-thread allreduce.

Pass	InferFragment ()
	Infer the TensorCore fragment information using tensor intrinsics.

Pass	LowerTVMBuiltin ()
	Lower builtin intrinsics.

Pass	LowerIntrin ()
	Lower the target-specific function intrinsics in each function.

Pass	LowerWarpMemory ()
	Lower warp memory access to low-level device-related function calls.

Pass	LowerDeviceStorageAccessInfo ()
	Lower attached storage access information on device.

Pass	CombineContextCall ()
	Combine context calls in the host function.

Pass	NarrowDataType (int target_bits)
	Narrow down PrimExpr datatype in stmt to target_bits.

Pass	BF16Legalize ()
	Legalize bf16 typed Ops. Add a cast to fp32 before Ops, then add a cast back to bf16.

Pass	PointerValueTypeRewrite ()
	Rewrite the pointer content type of arguments, as well as Alloc internal to the function to use the most frequently accessed type for load/store to avoid pointer casting in backend when possible.

Pass	HoistIfThenElse ()
	Hoist loop-invariant IfThenElse nodes to outside the eligible loops.

Pass	LowerInitBlock ()
	Lower block init stmt into IfThenElse stmts.

Pass	PlanAndUpdateBufferAllocationLocation ()
	Move each buffer allocation to its exact position (usually the lowest common ancestor (LCA) of the buffer accesses). This pass injects an opaque block with alloc_buffers at the allocation site.

Pass	ConvertBlocksToOpaque ()
	Substitute all the block vars with the PrimExprs they are bound to, as indicated by the corresponding iter_values in BlockRealize, and convert the blocks to opaque blocks by removing all the iter_values in BlockRealize and iter_vars in Block.

Pass	CompactBufferAllocation ()
	Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.

Pass	LegalizePackedCalls ()

Pass	LowerMatchBuffer ()
	Remove match buffers inside the block and validate the binding.

Pass	FlattenBuffer ()
	Flatten the multi-dimensional BufferLoad and BufferStore to single-dimensional Load/Store. Also remove Block to ensure that the flattened TIR cannot be scheduled again.

Pass	TextureFlatten ()

Pass	UnifyThreadBinding ()
	Unify all the thread bindings for "blockIdx.x/y/z", "threadIdx.x/y/z", and "vthread.x/y/z". Before the unification, two vars that are bound to a thread axis (e.g., "threadIdx.x") use different IterVars and variables in their AttrStmts. After the unification, we use a consolidated IterVar and a variable for them.

Pass	MergeDynamicSharedMemoryAllocations ()

Pass	ConvertForLoopsToSerial ()
	This is a post-scheduling pass that converts all Parallel For loops to Serial ones. It is run to reduce memory use and/or when the executor/backend does not support parallel launch of For loops.

Function Documentation

◆ BF16Legalize()

Pass tvm::tir::transform::BF16Legalize ( )

Legalize bf16 typed Ops. Add a cast to fp32 before Ops, then add a cast back to bf16.

Returns: The pass.
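
The cast-compute-cast pattern can be imitated outside TVM. The following is a minimal pure-Python sketch, not TVM's implementation: the helper names (`fp32_to_bf16`, `bf16_to_fp32`, `bf16_add`) are hypothetical, bf16 values are represented as 16-bit integer bit patterns, and round-to-nearest-even on the cast back to bf16 is an assumption (NaN handling is omitted).

```python
import struct

def fp32_to_bf16(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (assumed round-to-nearest-even; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def bf16_to_fp32(b: int) -> float:
    """Widen a bf16 bit pattern to fp32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def bf16_add(a: int, b: int) -> int:
    """Legalized add: cast both operands to fp32, add, then cast the result back to bf16."""
    return fp32_to_bf16(bf16_to_fp32(a) + bf16_to_fp32(b))

one, two = fp32_to_bf16(1.0), fp32_to_bf16(2.0)
print(hex(bf16_add(one, two)))
```

The arithmetic itself always happens in fp32; only the operands and the result are stored as bf16.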

◆ CombineContextCall()

Pass tvm::tir::transform::CombineContextCall ( )

Combine context calls in the host function.

Returns: The pass.

◆ CompactBufferAllocation()

Pass tvm::tir::transform::CompactBufferAllocation ( )

Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.

Before narrowing, B is a [16, 16] buffer, but only a skinny vector B[i, 0:16] is accessed.

for i in range(0, 16):
    with T.block():
        B = T.alloc_buffer(16, 16)
        for j in range(0, 16):
            B[i, j] = A[i, j] + 1
        for j in range(0, 16):
            C[i, j] = B[i, j] + 1

This pass narrows the buffer shape and adjusts its accessed region accordingly. In this particular case, because only a 1 * 16 vector of B is accessed, the pass narrows B to shape [1, 16] and rewrites each access B[i, j] as B[0, j].

for i in range(0, 16):
    with T.block():
        B = T.alloc_buffer(1, 16)
        for j in range(0, 16):
            B[0, j] = A[i, j] + 1
        for j in range(0, 16):
            C[i, j] = B[0, j] + 1

Returns: The pass.
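
The equivalence of the two loop nests above can be checked without TVM. The sketch below re-expresses both versions with plain Python lists; the function names and the sample input `A` are hypothetical, and the code only illustrates why the narrowing is safe rather than how the pass is implemented.

```python
def before_narrowing(A):
    """B is allocated as [16, 16], but iteration i only touches row i."""
    C = [[0] * 16 for _ in range(16)]
    for i in range(16):
        B = [[0] * 16 for _ in range(16)]
        for j in range(16):
            B[i][j] = A[i][j] + 1
        for j in range(16):
            C[i][j] = B[i][j] + 1
    return C

def after_narrowing(A):
    """B is narrowed to [1, 16]; every access B[i, j] becomes B[0, j]."""
    C = [[0] * 16 for _ in range(16)]
    for i in range(16):
        B = [[0] * 16]
        for j in range(16):
            B[0][j] = A[i][j] + 1
        for j in range(16):
            C[i][j] = B[0][j] + 1
    return C

# Hypothetical input: any 16 x 16 matrix works.
A = [[i * 16 + j for j in range(16)] for i in range(16)]
assert before_narrowing(A) == after_narrowing(A)
```

Because B is allocated inside the block for each i, no value written to B in one iteration is read in another, which is what makes the [1, 16] shape sufficient.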

◆ ConvertBlocksToOpaque()

Pass tvm::tir::transform::ConvertBlocksToOpaque ( )

Substitute all the block vars with the PrimExprs they are bound to, as indicated by the corresponding iter_values in BlockRealize, and convert the blocks to opaque blocks by removing all the iter_values in BlockRealize and iter_vars in Block.

Returns: The pass.

◆ ConvertForLoopsToSerial()

Pass tvm::tir::transform::ConvertForLoopsToSerial ( )

This is a post-scheduling pass that converts all Parallel For loops to Serial ones. It is run to reduce memory use and/or when the executor/backend does not support parallel launch of For loops.

Returns: The pass.

◆ CoProcSync()

Pass tvm::tir::transform::CoProcSync ( )

Detect and insert sync points to co-processor.

Returns: The pass.

◆ CreatePrimFuncPass()

Pass tvm::tir::transform::CreatePrimFuncPass	(	const runtime::TypedPackedFunc< PrimFunc(PrimFunc, IRModule, PassContext)> &	pass_func,
		int	opt_level,
		String	name,
		tvm::Array< String >	required
	)
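
The shape of the `pass_func` callback, PrimFunc(PrimFunc, IRModule, PassContext), suggests a pass that visits functions one at a time. The following toy model in plain Python is only an analogy under that assumption: `create_func_pass`, the string-valued "functions", and the dict-based "module" and "context" are all hypothetical stand-ins, not TVM types or APIs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "module" maps function names to function bodies
# (plain strings here), and the pass context is a plain dict.
Func = str
Module = Dict[str, Func]
Context = Dict[str, object]

def create_func_pass(pass_func: Callable[[Func, Module, Context], Func],
                     opt_level: int, name: str, required: List[str]):
    """Toy analogue: wrap pass_func into a module-to-module callable."""
    def run(mod: Module, ctx: Context) -> Module:
        # Apply pass_func to every function; the input module is not modified.
        return {fname: pass_func(f, mod, ctx) for fname, f in mod.items()}
    # Keep the metadata alongside the callable.
    run.info = {"opt_level": opt_level, "name": name, "required": list(required)}
    return run

upper = create_func_pass(lambda f, mod, ctx: f.upper(), 0, "toy.Upper", [])
print(upper({"main": "a + b"}, {}))
```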

◆ DecorateDeviceScope()

Pass tvm::tir::transform::DecorateDeviceScope ( )

Decorate the entire function body as a device function.

Returns: The pass.

◆ FlattenBuffer()

Pass tvm::tir::transform::FlattenBuffer ( )

Flatten the multi-dimensional BufferLoad and BufferStore to single-dimensional Load/Store. Also remove Block to ensure that the flattened TIR cannot be scheduled again.

Returns: The pass.
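
Flattening replaces a multi-dimensional index with a single offset. As a minimal sketch of that index arithmetic, assuming a compact row-major layout with no strides or element offset (the helper `flat_index` is hypothetical and is not part of TVM):

```python
from typing import Sequence

def flat_index(indices: Sequence[int], shape: Sequence[int]) -> int:
    """Row-major offset of a multi-dimensional index in a compact buffer."""
    if len(indices) != len(shape):
        raise ValueError("indices and shape must have the same rank")
    offset = 0
    for idx, extent in zip(indices, shape):
        if not 0 <= idx < extent:
            raise IndexError("index out of bounds")
        # Horner-style accumulation: scale by this extent, then add the index.
        offset = offset * extent + idx
    return offset

# An access B[1, 2] in a [16, 16] buffer becomes a load/store at offset 1 * 16 + 2.
print(flat_index((1, 2), (16, 16)))
```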

◆ HoistIfThenElse()

Pass tvm::tir::transform::HoistIfThenElse ( )

Hoist loop-invariant IfThenElse nodes to outside the eligible loops.

Returns: The pass.
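
Hoisting is valid because a loop-invariant condition evaluates the same way on every iteration. The sketch below shows the before and after forms in plain Python; the function names and the sample data are hypothetical, and this illustrates the transformation rather than the pass's implementation.

```python
def before_hoist(data, flag):
    """The condition is tested on every iteration although it never changes."""
    out = []
    for x in data:
        if flag:  # loop-invariant: does not depend on x
            out.append(x * 2)
        else:
            out.append(x + 1)
    return out

def after_hoist(data, flag):
    """The condition is tested once; each branch gets its own copy of the loop."""
    out = []
    if flag:
        for x in data:
            out.append(x * 2)
    else:
        for x in data:
            out.append(x + 1)
    return out

sample = [1, 2, 3]
assert before_hoist(sample, True) == after_hoist(sample, True)
assert before_hoist(sample, False) == after_hoist(sample, False)
```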

◆ InferFragment()

Pass tvm::tir::transform::InferFragment ( )

Infer the TensorCore fragment information using tensor intrinsics.

Returns: The pass.

◆ InjectCopyIntrin()

Pass tvm::tir::transform::InjectCopyIntrin	(	String	pragma_key,
		runtime::PackedFunc	fintrin
	)

Inject copy intrinsics with optional pad.

Parameters

pragma_key	The pragma key for hint of copy.
fintrin	The function with signature

Stmt fintrin(Buffer src, Buffer dst, Array<Expr> pad_before, Array<Expr> pad_after, Expr pad_value)

Returns: The pass.

◆ InjectDoubleBuffer()

Pass tvm::tir::transform::InjectDoubleBuffer ( )

Inject double buffer statements.

Returns: The pass.

◆ InjectPrefetch()

Pass tvm::tir::transform::InjectPrefetch ( )

Inject prefetch instructions into stmt.

Returns: The pass.

◆ InjectVirtualThread()

Pass tvm::tir::transform::InjectVirtualThread ( )

Inject virtual thread loops.

Returns: The pass.

◆ InstrumentBoundCheckers()

Pass tvm::tir::transform::InstrumentBoundCheckers ( )

Instruments bound checkers.

Returns: The pass.

◆ LegalizePackedCalls()

Pass tvm::tir::transform::LegalizePackedCalls ( )

This pass legalizes packed calls by wrapping their arguments into TVMValues.

◆ LiftAttrScope()

Pass tvm::tir::transform::LiftAttrScope ( String attr_key )

Lift common attrs with attr_key to outer scope.

Parameters

attr_key The attribute key to be checked.

Returns: The pass.

◆ LoopPartition()

Pass tvm::tir::transform::LoopPartition ( )

Partition loops in the stmt.

Returns: The pass.

◆ LowerCustomDatatypes()

Pass tvm::tir::transform::LowerCustomDatatypes ( )

Lower custom datatypes.

See tvm::datatypes::Registry for more information on adding custom datatypes.

Returns: The pass.

◆ LowerDeviceStorageAccessInfo()

Pass tvm::tir::transform::LowerDeviceStorageAccessInfo ( )

Lower attached storage access information on device.

Note: Run this pass after all storage access analysis finish.

Returns: The pass.

◆ LowerInitBlock()

Pass tvm::tir::transform::LowerInitBlock ( )

Lower block init stmt into IfThenElse stmts.

Returns: The pass.
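
One way to picture this lowering is a reduction whose initialization is guarded by a test on the reduction loop variable. The sketch below assumes the guard is "the reduction variable is at its first iteration"; that condition, the function names, and the sample input are assumptions made for illustration, not a statement of what the pass emits.

```python
def with_init_block(A):
    """Row sums with a separate init step, mirroring a block's init stmt."""
    C = [None] * len(A)
    for i in range(len(A)):
        C[i] = 0  # init: runs once, before the reduction loop
        for k in range(len(A[i])):
            C[i] = C[i] + A[i][k]
    return C

def with_if_then_else(A):
    """Same reduction with the init folded into the loop behind a condition."""
    C = [None] * len(A)
    for i in range(len(A)):
        for k in range(len(A[i])):
            if k == 0:  # assumed guard: first iteration of the reduction loop
                C[i] = 0
            C[i] = C[i] + A[i][k]
    return C

# The two forms agree whenever every row has at least one element.
rows = [[1, 2, 3], [4, 5, 6]]
assert with_init_block(rows) == with_if_then_else(rows)
```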

◆ LowerIntrin()

Pass tvm::tir::transform::LowerIntrin ( )

Lower the target-specific function intrinsics in each function.

Returns: The pass.

◆ LowerMatchBuffer()

Pass tvm::tir::transform::LowerMatchBuffer ( )

Remove match buffers inside the block. Also, it will validate the binding.

Returns: The pass.

◆ LowerThreadAllreduce()

Pass tvm::tir::transform::LowerThreadAllreduce ( )

Lower cross-thread allreduce.

Returns: The pass.

◆ LowerTVMBuiltin()

Pass tvm::tir::transform::LowerTVMBuiltin ( )

Lower builtin intrinsics.

Returns: The pass.

◆ LowerWarpMemory()

Pass tvm::tir::transform::LowerWarpMemory ( )

Lower warp memory access to low-level device-related function calls.

Returns: The pass.

◆ MakePackedAPI()

Pass tvm::tir::transform::MakePackedAPI ( int num_unpacked_args )

Transform the high-level PrimFunc to a low-level version that can be used as an API function.

The main task of this function is to create code to:

Map the values in the api_args to Var that is required by body.
Insert assertions to check type/value of the passed arguments.

Parameters

num_unpacked_args Number of arguments that are processed in plain form instead of packed form.

Note: The function signature has two cases.

let num_packed_args = len(api_args) - num_unpacked_args;

if num_packed_args is zero: f(api_arg_0, api_arg_1, .., api_arg_n) where n == len(api_args)

if num_packed_args is not zero: f(TVMArg* packed_args, int* packed_arg_type_ids, int num_packed_args, api_arg_k, api_arg_k+1, ... api_arg_n, TVMValue* out_ret_val, int* out_ret_tcode)

where n == len(api_args), k == num_packed_args

Returns: The pass.
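
The two signature cases in the note can be spelled out as a small helper that builds the parameter list. This is a sketch under stated assumptions: `packed_api_params` is a hypothetical helper, not a TVM function, and it numbers the arguments from zero, so the last unpacked argument is named api_arg_{len(api_args) - 1}.

```python
from typing import List

def packed_api_params(num_api_args: int, num_unpacked_args: int) -> List[str]:
    """Build the parameter list for the two cases described in the note."""
    if not 0 <= num_unpacked_args <= num_api_args:
        raise ValueError("num_unpacked_args must be between 0 and len(api_args)")
    num_packed_args = num_api_args - num_unpacked_args
    if num_packed_args == 0:
        # Case 1: every argument is passed in plain form.
        return [f"api_arg_{i}" for i in range(num_api_args)]
    # Case 2: the first num_packed_args arguments travel through the packed
    # arrays; the remaining ones follow in plain form (zero-based numbering
    # is an assumption of this sketch).
    unpacked = [f"api_arg_{i}" for i in range(num_packed_args, num_api_args)]
    return (["TVMArg* packed_args", "int* packed_arg_type_ids", "int num_packed_args"]
            + unpacked
            + ["TVMValue* out_ret_val", "int* out_ret_tcode"])

print(packed_api_params(3, 3))
print(packed_api_params(3, 1))
```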

◆ MakeUnpackedAPI()

Pass tvm::tir::transform::MakeUnpackedAPI ( )

Transform the high-level PrimFunc to a C signature that can be used to call the operator directly.

The main task of this function is to create code that maps the values in the api_args to the Vars required by the body.

Returns: The pass.

◆ MergeDynamicSharedMemoryAllocations()

Pass tvm::tir::transform::MergeDynamicSharedMemoryAllocations ( )

A pass to merge multiple TIR-level dynamic shared memory allocations into one.

◆ NarrowDataType()

Pass tvm::tir::transform::NarrowDataType ( int target_bits )

Narrow down PrimExpr datatype in stmt to target_bits.

Parameters

target_bits The target bits

Note: Run this pass after storage flatten.

Returns: The pass.

◆ PlanAndUpdateBufferAllocationLocation()

Pass tvm::tir::transform::PlanAndUpdateBufferAllocationLocation ( )

Move each buffer allocation to its exact position (usually the lowest common ancestor (LCA) of the buffer accesses). This pass injects an opaque block with alloc_buffers at the allocation site.

Returns: The pass.

◆ PointerValueTypeRewrite()

Pass tvm::tir::transform::PointerValueTypeRewrite ( )

Rewrite the pointer content type of arguments, as well as Alloc internal to the function to use the most frequently accessed type for load/store to avoid pointer casting in backend when possible.

Returns: The pass.

◆ RemapThreadAxis()

Pass tvm::tir::transform::RemapThreadAxis ( Map< String, IterVar > axis_map )

Remap the thread axis.

This can be used to get equivalent program which uses threadIdx.y in place of threadIdx.x by passing {"threadIdx.x": thread_axis("threadIdx.y")}

Returns: The pass.
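
The effect of the axis map can be pictured as a lookup applied to every thread tag. The sketch below is a simplification that uses plain strings where the real map holds IterVar values; `remap_thread_tags` is a hypothetical helper, not a TVM API.

```python
from typing import Dict, List

def remap_thread_tags(thread_tags: List[str], axis_map: Dict[str, str]) -> List[str]:
    """Replace each thread tag that appears in axis_map; leave the rest unchanged."""
    return [axis_map.get(tag, tag) for tag in thread_tags]

# Use threadIdx.y wherever the program was bound to threadIdx.x.
print(remap_thread_tags(["blockIdx.x", "threadIdx.x"], {"threadIdx.x": "threadIdx.y"}))
```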

◆ RemoveNoOp()

Pass tvm::tir::transform::RemoveNoOp ( )

Remove no-ops from the Stmt.

Returns: The pass.

◆ RewriteUnsafeSelect()

Pass tvm::tir::transform::RewriteUnsafeSelect ( )

Detect and rewrite unsafe select that contains memory access.

Returns: The pass.

◆ Simplify()

Pass tvm::tir::transform::Simplify ( )

Run arithmetic simplifications on the statements and expressions.

Returns: The pass.

◆ SkipAssert()

Pass tvm::tir::transform::SkipAssert ( )

Skip assert stmt.

Returns: The pass.

◆ SplitHostDevice()

Pass tvm::tir::transform::SplitHostDevice ( )

Split the function into a host function and device functions.

Returns: The pass.

◆ StorageFlatten()

Pass tvm::tir::transform::StorageFlatten	(	int	cache_line_size,
		bool	create_bound_attribute = false
	)

Flatten the multi-dimensional read/write to single-dimensional Load/Store.

Parameters

cache_line_size	The size of CPU cache line.
create_bound_attribute	Whether to create bound attributes.

Returns: The pass.

◆ StorageRewrite()

Pass tvm::tir::transform::StorageRewrite ( )

Rewrite storage allocation pattern. Moves each allocation to the outermost possible scope and tries to share space between allocations to make a static allocation plan when possible.

Returns: The pass.

◆ TextureFlatten()

Pass tvm::tir::transform::TextureFlatten ( )

◆ ThreadSync()

Pass tvm::tir::transform::ThreadSync ( String storage_scope )

Insert sync between parallel read/write of shared buffers.

Parameters

storage_scope The storage scope considered.

Returns: The pass.

◆ UnifyThreadBinding()

Pass tvm::tir::transform::UnifyThreadBinding ( )

Unify all the thread bindings for "blockIdx.x/y/z", "threadIdx.x/y/z", and "vthread.x/y/z". Before the unification, two vars that are bound to a thread axis (e.g., "threadIdx.x") use different IterVars and variables in their AttrStmts. After the unification, we use a consolidated IterVar and a variable for them.

Returns: The pass.

Note: The bare "vthread" tag is a legacy behavior that will be deprecated, though its thread bindings are still unified in this pass. Please use vthread.x, vthread.y and vthread.z instead.

◆ UnrollLoop()

Pass tvm::tir::transform::UnrollLoop ( )

Unroll the constant loops marked by unroll. This pass also automatically attaches the pragma unroll tag to loops that meet the standard.

Returns: The pass.

◆ VectorizeLoop()

Pass tvm::tir::transform::VectorizeLoop ( bool enable_vectorize = true )

Lower vectorization loops.

Parameters

enable_vectorize Whether vectorization is enabled.

Returns: The pass.

◆ VerifyGPUCode()

Pass tvm::tir::transform::VerifyGPUCode ( Map< String, PrimExpr > constraints )

Pass variant of VerifyGPUCode.

Parameters

constraints The dict to specify constraints to check.

Returns: The pass.

See also: tvm::tir::VerifyGPUCode

◆ VerifyMemory()

Pass tvm::tir::transform::VerifyMemory ( )

Pass variant of VerifyMemory.

Returns: The pass.

See also: tvm::tir::VerifyMemory

◆ VerifySSA()

Pass tvm::tir::transform::VerifySSA ( )

Pass variant of VerifySSA.

Returns: The pass.

See also: tvm::tir::VerifySSA
