|
Pass | VerifySSA () |
| Pass variant of VerifySSA.
|
|
Pass | VerifyMemory () |
| Pass variant of VerifyMemory.
|
|
Pass | VerifyGPUCode (Map< String, PrimExpr > constraints) |
| Pass variant of VerifyGPUCode.
|
|
Pass | CreatePrimFuncPass (const runtime::TypedPackedFunc< PrimFunc(PrimFunc, IRModule, PassContext)> &pass_func, int opt_level, String name, tvm::Array< String > required) |
|
Pass | InjectPrefetch () |
| Inject prefetch instructions into stmt.
|
|
Pass | StorageFlatten (int cache_line_size, bool create_bound_attribute=false) |
| Flatten the multi-dimensional read/write to single-dimensional Load/Store.
|
|
Pass | InjectCopyIntrin (String pragma_key, runtime::PackedFunc fintrin) |
| Inject copy intrinsics with optional padding.
|
|
Pass | CoProcSync () |
| Detect and insert sync points for the co-processor.
|
|
Pass | LiftAttrScope (String attr_key) |
| Lift common attrs with attr_key to the outer scope.
|
|
Pass | LoopPartition () |
| Partition loops in the stmt.
|
|
Pass | VectorizeLoop (bool enable_vectorize=true) |
| Lower vectorization loops.
|
|
Pass | InjectVirtualThread () |
| Inject virtual thread loops.
|
|
Pass | InjectDoubleBuffer () |
| Inject double buffer statements.
|
|
Pass | StorageRewrite () |
| Rewrite the storage allocation pattern. Moves allocations to the outermost possible scope and tries to share space between allocations to make a static allocation plan when possible.
|
|
Pass | UnrollLoop () |
| Unroll the constant loops marked by unroll. This pass also automatically attaches the pragma unroll tag to loops that meet the criteria.
|
|
Pass | RemoveNoOp () |
| Remove no-ops from the Stmt.
|
|
Pass | RewriteUnsafeSelect () |
| Detect and rewrite unsafe selects that contain memory accesses.
|
|
Pass | Simplify () |
| Run arithmetic simplifications on the statements and expressions.
|
|
Pass | InstrumentBoundCheckers () |
| Instrument bound checkers.
|
|
Pass | MakePackedAPI (int num_unpacked_args) |
| Transform the high-level PrimFunc to a low-level version that can be used as an API function.
|
|
Pass | MakeUnpackedAPI () |
| Transform the high-level PrimFunc to a C signature that can be used to call the operator directly.
|
|
Pass | RemapThreadAxis (Map< String, IterVar > axis_map) |
| Remap the thread axis.
|
|
Pass | LowerCustomDatatypes () |
| Lower custom datatypes.
|
|
Pass | DecorateDeviceScope () |
| Decorate the body of every function as a device function.
|
|
Pass | SplitHostDevice () |
| Split the function into a host function and device functions.
|
|
Pass | SkipAssert () |
| Skip assert stmts.
|
|
Pass | ThreadSync (String storage_scope) |
| Insert sync between parallel read/write of shared buffers.
|
|
Pass | LowerThreadAllreduce () |
| Lower cross-thread allreduce.
|
|
Pass | InferFragment () |
| Infer the TensorCore fragment information using tensor intrinsics.
|
|
Pass | LowerTVMBuiltin () |
| Lower builtin intrinsics.
|
|
Pass | LowerIntrin () |
| Lower the target-specific function intrinsics in each function.
|
|
Pass | LowerWarpMemory () |
| Lower warp memory access to low-level device related function calls.
|
|
Pass | LowerDeviceStorageAccessInfo () |
| Lower attached storage access information on device.
|
|
Pass | CombineContextCall () |
| Combine context calls in the host function.
|
|
Pass | NarrowDataType (int target_bits) |
| Narrow down PrimExpr datatype in stmt to target_bits.
|
|
Pass | BF16Legalize () |
| Legalize bf16-typed ops: add a cast to fp32 before each op, then cast the result back to bf16.
|
|
Pass | PointerValueTypeRewrite () |
| Rewrite the pointer content type of arguments, as well as of Allocs internal to the function, to use the most frequently accessed type for load/store, avoiding pointer casts in the backend when possible.
|
|
Pass | HoistIfThenElse () |
| Hoist loop-invariant IfThenElse nodes outside the eligible loops.
|
|
Pass | LowerInitBlock () |
| Lower block init stmt into IfThenElse stmts.
|
|
Pass | PlanAndUpdateBufferAllocationLocation () |
| Move each buffer allocation to its exact position (usually the LCA of the buffer accesses). This pass injects an opaque block with alloc_buffers at the allocation site.
|
|
Pass | ConvertBlocksToOpaque () |
| Substitute all the block vars with the PrimExprs they are bound to, as indicated by the corresponding iter_values in BlockRealize, and convert the blocks into opaque blocks by removing all the iter_values in BlockRealize and the iter_vars in Block.
|
|
Pass | CompactBufferAllocation () |
| Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.
|
|
Pass | LegalizePackedCalls () |
|
Pass | LowerMatchBuffer () |
| Remove match buffers inside the block and validate their bindings.
|
|
Pass | FlattenBuffer () |
| Flatten the multi-dimensional BufferLoad and BufferStore to single-dimensional Load/Store. Also remove Block to ensure that the flattened TIR cannot be scheduled again.
|
|
Pass | TextureFlatten () |
|
Pass | UnifyThreadBinding () |
| Unify all the thread bindings for "blockIdx.x/y/z", "threadIdx.x/y/z", and "vthread.x/y/z". Before the unification, two vars that are bound to a thread axis (e.g., "threadIdx.x") use different IterVars and variables in their AttrStmts. After the unification, we use a consolidated IterVar and a variable for them.
|
|
Pass | MergeDynamicSharedMemoryAllocations () |
|
Pass | ConvertForLoopsToSerial () |
| A post-scheduling pass that converts all parallel For loops to serial ones. It is run to reduce memory usage, or when the executor/backend does not support parallel launch of For loops.
|
|
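CreatePrimFuncPass is the only entry above without a brief description. The sketch below is a toy Python model of the idea, not TVM's API: it assumes a module is a dict mapping names to functions, that a per-function pass applies its pass_func(func, module, context) to every function in the module, and that a pipeline skips a pass whose opt_level is higher than the context's opt level. The names `ToyPassContext`, `ToyPrimFuncPass`, and `run_pipeline` are hypothetical, and the `required` argument is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins, not TVM classes: a "function" is a list of
# statement strings, and a "module" maps function names to functions.
Func = List[str]
Module = Dict[str, Func]


@dataclass
class ToyPassContext:
    opt_level: int = 2


@dataclass
class ToyPrimFuncPass:
    pass_func: Callable[[Func, Module, ToyPassContext], Func]
    opt_level: int
    name: str

    def __call__(self, mod: Module, ctx: ToyPassContext) -> Module:
        # Assumption: the pipeline skips passes whose opt_level is too high.
        if self.opt_level > ctx.opt_level:
            return dict(mod)
        # A per-function pass rewrites every function in the module independently.
        return {name: self.pass_func(func, mod, ctx) for name, func in mod.items()}


def run_pipeline(mod: Module, passes: List[ToyPrimFuncPass], ctx: ToyPassContext) -> Module:
    # Apply the passes in order, feeding each one the previous result.
    for p in passes:
        mod = p(mod, ctx)
    return mod


# Toy pass that drops "nop" statements, loosely mirroring RemoveNoOp.
remove_noop = ToyPrimFuncPass(
    pass_func=lambda f, m, c: [s for s in f if s != "nop"],
    opt_level=0,
    name="ToyRemoveNoOp",
)
# Toy pass that drops asserts; its opt_level of 3 is chosen only for illustration.
skip_assert = ToyPrimFuncPass(
    pass_func=lambda f, m, c: [s for s in f if not s.startswith("assert")],
    opt_level=3,
    name="ToySkipAssert",
)

mod = {"main": ["assert n > 0", "nop", "store A[i]"]}
print(run_pipeline(mod, [remove_noop, skip_assert], ToyPassContext(opt_level=2)))
# {'main': ['assert n > 0', 'store A[i]']}
```

Wrapping a plain function together with its metadata (opt_level, name) keeps the rewrite logic independent of the scheduling decision about whether it runs at all.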
Pass tvm::tir::transform::CompactBufferAllocation ()
Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.
Before narrowing, B is a [16, 16] buffer, but only a skinny vector B[i, 0:16] is accessed.
    for i in range(0, 16):
        with T.block():
            B = T.alloc_buffer((16, 16))  # shape is passed as a tuple
            for j in range(0, 16):
                B[i, j] = A[i, j] + 1
            for j in range(0, 16):
                C[i, j] = B[i, j] + 1
This pass narrows the buffer shape and adjusts its accessed region accordingly. In this particular case, because only a 1 * 16 vector of B is accessed, the pass narrows B to shape [1, 16] and changes the access to B[i, j] to B[0, j].
    for i in range(0, 16):
        with T.block():
            B = T.alloc_buffer((1, 16))  # narrowed from (16, 16)
            for j in range(0, 16):
                B[0, j] = A[i, j] + 1
            for j in range(0, 16):
                C[i, j] = B[0, j] + 1
- Returns
- The pass.
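A minimal sketch of the arithmetic behind the narrowing, assuming the accessed indices are known as concrete integers for a single block instance (the real pass reasons about symbolic regions, so this is only an illustration). `compact_region` and `remap` are hypothetical helpers: the first computes the bounding box of the accessed indices and uses its extent as the new shape; the second shifts each index by the box's minimum, which is what turns B[i, j] into B[0, j] in the example above.

```python
from typing import List, Tuple

Index = Tuple[int, ...]


def compact_region(accesses: List[Index]) -> Tuple[Index, Index]:
    """Return (region_min, new_shape): the bounding box covering every accessed index."""
    ndim = len(accesses[0])
    mins = tuple(min(a[d] for a in accesses) for d in range(ndim))
    maxs = tuple(max(a[d] for a in accesses) for d in range(ndim))
    # Extent of the box along each dimension becomes the narrowed shape.
    shape = tuple(hi - lo + 1 for lo, hi in zip(mins, maxs))
    return mins, shape


def remap(index: Index, region_min: Index) -> Index:
    # Shift each coordinate so the accessed region starts at the origin.
    return tuple(i - lo for i, lo in zip(index, region_min))


# Accesses to B made inside the block for one fixed outer iteration i.
i = 5
accesses = [(i, j) for j in range(16)]
region_min, new_shape = compact_region(accesses)
print(new_shape)                  # (1, 16)
print(remap((i, 7), region_min))  # (0, 7)
```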
Pass tvm::tir::transform::MakePackedAPI (int num_unpacked_args)
Transform the high-level PrimFunc to a low-level version that can be used as an API function.
The main task of this function is to create code to:
- Map the values in the api_args to Var that is required by body.
- Insert assertions to check type/value of the passed arguments.
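The two tasks listed above can be illustrated with a small Python model. This is an assumption made for exposition, not the code MakePackedAPI generates: packed arguments are taken to arrive as parallel sequences of values and type codes, and each one is checked and then bound to the parameter name the body uses. The type-code constants and `unpack_args` are hypothetical and do not match TVM's actual type-code values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type codes for illustration only; not the real TVM/DLPack values.
INT, FLOAT, HANDLE = 0, 1, 2


def unpack_args(
    values: Sequence[object],
    type_codes: Sequence[int],
    params: List[Tuple[str, int]],
) -> Dict[str, object]:
    """Bind packed values to parameter names, checking the count and type codes."""
    # Check corresponding to an inserted assertion on the argument count.
    if len(values) != len(params):
        raise ValueError(f"expected {len(params)} args, got {len(values)}")
    if len(type_codes) != len(values):
        raise ValueError("values and type_codes must have the same length")
    env: Dict[str, object] = {}
    for idx, ((name, expected), value, code) in enumerate(zip(params, values, type_codes)):
        # Check corresponding to an inserted assertion on each argument's type code.
        if code != expected:
            raise TypeError(f"arg{idx} ({name}): expected type code {expected}, got {code}")
        # Map the packed value to the variable that the body refers to.
        env[name] = value
    return env


params = [("n", INT), ("alpha", FLOAT)]
print(unpack_args([4, 0.5], [INT, FLOAT], params))  # {'n': 4, 'alpha': 0.5}
```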
- Parameters
- num_unpacked_args: Number of arguments that are processed in plain form instead of packed form.
- Note
- The function signature has two cases:
let num_packed_args = len(api_args) - num_unpacked_args;
if num_packed_args is zero: f(api_arg_0, api_arg_1, ..., api_arg_n) where n == len(api_args) - 1
if num_packed_args is not zero: f(TVMArg* packed_args, int* packed_arg_type_ids, int num_packed_args, api_arg_k, api_arg_k+1, ..., api_arg_n, TVMValue* out_ret_val, int* out_ret_tcode)
where n == len(api_args) - 1, k == num_packed_args
- Returns
- The pass.
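A minimal sketch that spells out the two cases of the note as string formatting, assuming the first num_packed_args entries of api_args are the packed ones (which is what k == num_packed_args implies). `lowered_signature` is a hypothetical helper: it only describes the resulting signature and does not generate any code.

```python
from typing import List


def lowered_signature(api_args: List[str], num_unpacked_args: int) -> str:
    """Describe the lowered signature following the two cases in the note."""
    if not 0 <= num_unpacked_args <= len(api_args):
        raise ValueError("num_unpacked_args must be between 0 and len(api_args)")
    num_packed_args = len(api_args) - num_unpacked_args
    if num_packed_args == 0:
        # Every argument is passed in plain form.
        return f"f({', '.join(api_args)})"
    # api_arg_0 .. api_arg_{k-1} travel in the packed arrays;
    # api_arg_k .. api_arg_n (k == num_packed_args) are passed in plain form.
    plain = api_args[num_packed_args:]
    parts = ["TVMArg* packed_args", "int* packed_arg_type_ids", "int num_packed_args"]
    parts += plain
    parts += ["TVMValue* out_ret_val", "int* out_ret_tcode"]
    return f"f({', '.join(parts)})"


print(lowered_signature(["A", "B", "n"], 3))  # f(A, B, n)
print(lowered_signature(["A", "B", "n"], 1))
# f(TVMArg* packed_args, int* packed_arg_type_ids, int num_packed_args, n, TVMValue* out_ret_val, int* out_ret_tcode)
```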