tvm
transform.h File Reference

TIR-specific transformation passes.

#include <tvm/ir/transform.h>
#include <tvm/target/target.h>
#include <tvm/tir/expr.h>
#include <tvm/tir/function.h>
#include <string>
#include <vector>

Namespaces

 tvm
 Top-level namespace of TVM.
 
 tvm::tir
 
 tvm::tir::transform
 

Functions

Pass tvm::tir::transform::CreatePrimFuncPass (const runtime::TypedPackedFunc< PrimFunc(PrimFunc, IRModule, PassContext)> &pass_func, int opt_level, String name, tvm::Array< String > required, bool traceable=false)
 
Pass tvm::tir::transform::InjectPrefetch ()
 Inject prefetch instructions into stmt.
 
Pass tvm::tir::transform::StorageFlatten (int cache_line_size, bool create_bound_attribute=false)
 Flatten multi-dimensional reads/writes into single-dimensional Load/Store.
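At its core, flattening turns an N-dimensional index into a single offset. The sketch below shows only that arithmetic and assumes a dense row-major buffer with no custom strides, alignment, or padding. `flatten_index` is a hypothetical helper, not part of the TVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: maps a multi-dimensional index to a flat offset,
// assuming a dense row-major layout (the last axis is contiguous).
int64_t flatten_index(const std::vector<int64_t>& shape,
                      const std::vector<int64_t>& indices) {
  if (shape.size() != indices.size()) {
    throw std::invalid_argument("rank mismatch");
  }
  int64_t offset = 0;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    // Horner-style accumulation: ((i0 * s1 + i1) * s2 + i2) * ...
    offset = offset * shape[d] + indices[d];
  }
  return offset;
}
```

For a buffer of shape (2, 3, 4), element [1][2][3] maps to offset (1 * 3 + 2) * 4 + 3 = 23.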
 
Pass tvm::tir::transform::InjectCopyIntrin (String pragma_key, runtime::PackedFunc fintrin)
 Inject copy intrinsics with optional padding.
 
Pass tvm::tir::transform::CoProcSync ()
 Detect and insert sync points for the co-processor.
 
Pass tvm::tir::transform::LiftAttrScope (String attr_key)
 Lift common attrs with attr_key to the outer scope.
 
Pass tvm::tir::transform::LoopPartition ()
 Partition loops in the stmt.
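Loop partitioning helps when a loop body branches on the loop variable. Splitting the iteration space at the branch boundary produces loops whose bodies no longer need the check. Below is a hand-written before/after sketch of that idea in plain C++. It assumes a single condition of the form `i < split`, and both functions are hypothetical illustrations, not output of the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Before partitioning: every iteration evaluates the branch on i.
std::vector<int> guarded_loop(int n, int split) {
  std::vector<int> out(n);
  for (int i = 0; i < n; ++i) {
    if (i < split) {
      out[i] = 2 * i;  // main region
    } else {
      out[i] = -1;     // tail region, e.g. padding
    }
  }
  return out;
}

// After partitioning: the iteration space is cut where the condition flips,
// so neither loop body contains the branch.
std::vector<int> partitioned_loop(int n, int split) {
  std::vector<int> out(n);
  const int cut = std::min(std::max(split, 0), n);
  for (int i = 0; i < cut; ++i) out[i] = 2 * i;
  for (int i = cut; i < n; ++i) out[i] = -1;
  return out;
}
```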
 
Pass tvm::tir::transform::VectorizeLoop (bool enable_vectorize=true)
 Lower vectorization loops.
 
Pass tvm::tir::transform::InjectVirtualThread ()
 Inject virtual thread loops.
 
Pass tvm::tir::transform::InjectDoubleBuffer ()
 Inject double buffer statements.
 
Pass tvm::tir::transform::StorageRewrite ()
 Rewrite the storage allocation pattern: move each allocation to the outermost possible scope and try to share space between allocations, producing a static allocation plan when possible.
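The space-sharing idea can be illustrated with a small lifetime-based plan. This is a minimal sketch, not the pass's algorithm. It assumes each allocation is described by a hypothetical `Alloc` triple (size, first use, last use) given in program order. It ignores storage scope, dtype, and alignment, which a real planner has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one allocation: its size and the first and
// last statement indices at which it is live.
struct Alloc {
  std::size_t bytes;
  int first_use;
  int last_use;
};

// Greedy static plan: an allocation may reuse a slot whose previous occupant
// is no longer live. Returns the total bytes reserved across all slots.
// Assumes `allocs` is sorted by first_use.
std::size_t plan_shared_storage(const std::vector<Alloc>& allocs) {
  struct Slot {
    std::size_t bytes;
    int busy_until;  // last_use of the current occupant
  };
  std::vector<Slot> slots;
  for (const Alloc& a : allocs) {
    // Pick the smallest free slot that is already large enough.
    Slot* best = nullptr;
    for (Slot& s : slots) {
      const bool free_now = s.busy_until < a.first_use;
      if (free_now && s.bytes >= a.bytes && (!best || s.bytes < best->bytes)) {
        best = &s;
      }
    }
    if (best) {
      best->busy_until = a.last_use;  // reuse the existing space
    } else {
      slots.push_back({a.bytes, a.last_use});  // reserve new space
    }
  }
  std::size_t total = 0;
  for (const Slot& s : slots) total += s.bytes;
  return total;
}
```

Three buffers of 100, 100, and 50 bytes need 250 bytes without sharing. If the first two have disjoint lifetimes (statements 0 to 2 and 3 to 5), this plan reserves only 150.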
 
Pass tvm::tir::transform::UnrollLoop ()
 Unroll constant loops marked by unroll. This pass also automatically attaches the unroll pragma tag to loops that meet the unrolling criteria.
 
Pass tvm::tir::transform::RemoveNoOp ()
 Remove no-ops from the Stmt.
 
Pass tvm::tir::transform::RewriteUnsafeSelect ()
 Detect and rewrite unsafe selects that contain memory accesses.
 
Pass tvm::tir::transform::Simplify ()
 Run arithmetic simplifications on the statements and expressions.
 
Pass tvm::tir::transform::ConvertSSA ()
 Convert an IRModule to SSA form.
 
Pass tvm::tir::transform::InstrumentBoundCheckers ()
 Instrument bound checkers.
 
Pass tvm::tir::transform::MakePackedAPI ()
 Transform the high-level PrimFunc to a low-level version that can be used as an API function.
 
Pass tvm::tir::transform::MakeUnpackedAPI ()
 Transform the high-level PrimFunc to a C signature that can be used to call the operator directly.
 
Pass tvm::tir::transform::RemapThreadAxis (Map< String, IterVar > axis_map)
 Remap the thread axis.
 
Pass tvm::tir::transform::LowerCustomDatatypes ()
 Lower custom datatypes.
 
Pass tvm::tir::transform::DecorateDeviceScope ()
 Decorate the whole body of the function as a device function.
 
Pass tvm::tir::transform::AnnotateDeviceRegions ()
 Annotate locations that should be run on the device.
 
Pass tvm::tir::transform::SplitHostDevice ()
 Split the function into a host function and device functions.
 
Pass tvm::tir::transform::LowerDeviceKernelLaunch ()
 Lower cross-device function calls.
 
Pass tvm::tir::transform::SkipAssert ()
 Skip assert stmts.
 
Pass tvm::tir::transform::ThreadSync (String storage_scope)
 Insert sync between parallel read/write of shared buffers.
 
Pass tvm::tir::transform::LowerThreadAllreduce ()
 Lower cross-thread allreduce.
 
Pass tvm::tir::transform::InferFragment ()
 Infer the TensorCore fragment information using tensor intrinsics.
 
Pass tvm::tir::transform::LowerTVMBuiltin ()
 Lower builtin intrinsics.
 
Pass tvm::tir::transform::LowerIntrin ()
 Lower the target-specific function intrinsics in each function.
 
Pass tvm::tir::transform::LowerWarpMemory ()
 Lower warp memory access to low-level device related function calls.
 
Pass tvm::tir::transform::LowerDeviceStorageAccessInfo ()
 Lower attached storage access information on device.
 
Pass tvm::tir::transform::CombineContextCall ()
 Combine context calls in the host function.
 
Pass tvm::tir::transform::NarrowDataType (int target_bits)
 Narrow down the PrimExpr datatype in stmt to target_bits.
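Narrowing is only safe when every value an expression can take fits in the narrower type. A minimal sketch of that fit check follows. It assumes the expression's integer bounds [lo, hi] are already known, and `fits_in_signed_bits` is a hypothetical helper, not TVM's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: can every value in [lo, hi] be represented as a
// two's-complement signed integer of `bits` bits (1 <= bits <= 64)?
bool fits_in_signed_bits(int64_t lo, int64_t hi, int bits) {
  if (bits >= 64) return true;  // every int64_t value fits
  const int64_t max_v = (int64_t{1} << (bits - 1)) - 1;
  const int64_t min_v = -(int64_t{1} << (bits - 1));
  return lo >= min_v && hi <= max_v;
}
```

For example, take the flattened index `i * 64 + j` with `0 <= i < 1024` and `0 <= j < 64`. It spans [0, 65535], so it can be narrowed to 32 bits but not to 16.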
 
Pass tvm::tir::transform::ForceNarrowIndexToInt32 ()
 Force indexing expressions and integer buffers to be narrowed down to int32 dtype.
 
Pass tvm::tir::transform::BF16ComputeLegalize ()
 Legalize bf16 compute Ops: add a cast to fp32 before each Op, then a cast back to bf16.
 
Pass tvm::tir::transform::FP8ComputeLegalize (String promote_dtype_str="float16")
 Legalize fp8 compute Ops: add a cast to fp16/fp32 before each Op, then a cast back to fp8.
 
Pass tvm::tir::transform::BF16StorageLegalize ()
 Legalize bf16 storage types to u16.
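Together, BF16ComputeLegalize and BF16StorageLegalize mean bf16 values live in 16-bit unsigned storage while arithmetic happens in fp32. The sketch below shows the two conversions and a legalized add. It assumes round-to-nearest-even when narrowing and omits NaN handling. The helper names are hypothetical, and this is not TVM's generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widen: a bf16 value is the upper 16 bits of an IEEE-754 float32.
float bf16_bits_to_float(uint16_t bits) {
  const uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// Narrow with round-to-nearest-even (NaN handling omitted for brevity).
uint16_t float_to_bf16_bits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  const uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives truncation
  const uint32_t rounding_bias = 0x7FFFu + lsb;
  return static_cast<uint16_t>((u + rounding_bias) >> 16);
}

// A bf16 add in the legalized style: cast up, compute in fp32, cast back.
uint16_t bf16_add(uint16_t a, uint16_t b) {
  return float_to_bf16_bits(bf16_bits_to_float(a) + bf16_bits_to_float(b));
}
```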
 
Pass tvm::tir::transform::FP8StorageLegalize ()
 Legalize fp8 storage types to u8.
 
Pass tvm::tir::transform::InlinePrivateFunctions ()
 Inline calls to private functions.
 
Pass tvm::tir::transform::PointerValueTypeRewrite ()
 Rewrite the pointer content type of arguments, as well as of Allocs internal to the function, to use the most frequently accessed type for load/store, avoiding pointer casts in the backend when possible.
 
Pass tvm::tir::transform::HoistIfThenElse ()
 Hoist loop-invariant IfThenElse nodes to outside the eligible loops.
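Below is a hand-written before/after sketch of hoisting a loop-invariant branch (a transformation also known as loop unswitching), in plain C++. It assumes the condition `flag` does not depend on the loop variable. The functions are hypothetical and illustrate the transformation, not the pass's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: `flag` does not depend on i, yet it is tested on every iteration.
std::vector<int> before_hoist(const std::vector<int>& x, bool flag) {
  std::vector<int> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (flag) {
      y[i] = x[i] + 1;
    } else {
      y[i] = x[i] - 1;
    }
  }
  return y;
}

// After: the invariant branch is evaluated once, outside the loop, and the
// loop is duplicated into each arm.
std::vector<int> after_hoist(const std::vector<int>& x, bool flag) {
  std::vector<int> y(x.size());
  if (flag) {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] + 1;
  } else {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] - 1;
  }
  return y;
}
```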
 
Pass tvm::tir::transform::HoistExpression ()
 Hoist loop-invariant expression nodes to outside the eligible loops.
 
Pass tvm::tir::transform::LowerCrossThreadReduction ()
 Lower cross-thread reduction from thread bindings to intrinsic function calls.
 
Pass tvm::tir::transform::LowerInitBlock ()
 Lower block init stmt into IfThenElse stmts.
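A reduction block's init statement runs once, before the first reduction step. The sketch below hand-writes the lowered form for a row-sum reduction in plain C++. It assumes a row-major `n x m` input, and `row_sum_lowered_init` is a hypothetical name. The init is guarded on the reduction variable being at its first value.

```cpp
#include <cassert>
#include <vector>

// Row sums of an n x m row-major matrix, written the way a reduction block
// looks after its init statement (C[i] = 0) has been lowered into an
// IfThenElse that fires on the first reduction iteration.
std::vector<int> row_sum_lowered_init(const std::vector<int>& a, int n, int m) {
  std::vector<int> c(n, 12345);  // stale contents; the init overwrites them
  for (int i = 0; i < n; ++i) {
    for (int k = 0; k < m; ++k) {
      if (k == 0) {
        c[i] = 0;  // lowered init: runs only when k is at its first value
      }
      c[i] += a[i * m + k];
    }
  }
  return c;
}
```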
 
Pass tvm::tir::transform::PlanAndUpdateBufferAllocationLocation ()
 Place each buffer allocation at its exact position (usually the LCA of the buffer accesses). This pass injects an opaque block with alloc_buffers at the allocation site.
 
Pass tvm::tir::transform::ConvertBlocksToOpaque ()
 Substitute all the block vars with the PrimExprs they are bound to, as indicated by the corresponding iter_values in BlockRealize, and then convert the blocks into opaque ones by removing all the iter_values in BlockRealize and iter_vars in Block.
 
Pass tvm::tir::transform::LiftThreadBinding ()
 Lift the same thread bindings to their LCA loops.
 
Pass tvm::tir::transform::CompactBufferAllocation (bool is_strict=true)
 Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjusting the access region if necessary.
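Compaction can be pictured in two steps. First, each dimension shrinks to the interval actually accessed. Second, every access is rebased by that interval's minimum. This minimal sketch assumes the accessed region has already been computed as one inclusive interval per dimension. `Interval`, `compact_shape`, and `rebase_index` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-dimension access interval [min, max] (inclusive).
struct Interval {
  int64_t min;
  int64_t max;
};

// Compacted shape: each dimension shrinks to the extent actually accessed.
std::vector<int64_t> compact_shape(const std::vector<Interval>& region) {
  std::vector<int64_t> shape;
  for (const Interval& r : region) shape.push_back(r.max - r.min + 1);
  return shape;
}

// Accesses into the compacted buffer subtract each dimension's minimum.
std::vector<int64_t> rebase_index(const std::vector<Interval>& region,
                                  const std::vector<int64_t>& index) {
  std::vector<int64_t> out;
  for (std::size_t d = 0; d < region.size(); ++d) {
    out.push_back(index[d] - region[d].min);
  }
  return out;
}
```

Suppose a tile touches only rows 32 to 47 and columns 0 to 15 of a 1024 x 1024 buffer. Under this sketch, a 16 x 16 allocation suffices, and element (40, 5) moves to (8, 5).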
 
Pass tvm::tir::transform::LegalizePackedCalls ()
 
Pass tvm::tir::transform::LowerMatchBuffer ()
 Remove match buffers inside the block and validate the binding.
 
Pass tvm::tir::transform::InjectPermutedLayout ()
 Inject permuted layout for shared memory.
 
Pass tvm::tir::transform::TransformMmaBufferLayout ()
 Transform Mma scope (m16n8k8.matrixA/B/C) to local scope with layout transformation.
 
Pass tvm::tir::transform::LowerOpaqueBlock ()
 Remove the block to ensure that the TIR cannot be scheduled again.
 
Pass tvm::tir::transform::FlattenBuffer ()
 Flatten multi-dimensional BufferLoad and BufferStore to single-dimensional BufferLoad/BufferStore for TIR that does not contain opaque blocks.
 
Pass tvm::tir::transform::TextureFlatten ()
 
Pass tvm::tir::transform::LowerVtcmAlloc ()
 
Pass tvm::tir::transform::LowerAsyncDMA ()
 Lower Async TIR primitives to DMA copy and wait builtins.
 
Pass tvm::tir::transform::CommonSubexprElimTIR (bool enable_cse_tir=true, bool identify_equiv_terms=false)
 Implement Common Subexpression Elimination (CSE) for TIR, which introduces let-in bindings for duplicated sub-expressions.
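To see how let-in bindings arise, consider a toy, string-keyed CSE over a hypothetical binary expression tree. A first pass counts structurally equal subexpressions. A second, bottom-up pass binds each duplicate once to a fresh variable and reuses that variable afterwards. The types, function names, and the `cse_v` variable prefix are all assumptions made for illustration. This is not TVM's implementation, which operates on PrimExpr/Stmt.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal expression tree: a leaf variable or a binary op.
struct Expr {
  std::string op;  // "+" or "*" for interior nodes, the name for leaves
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string& name) {
  return std::make_shared<Expr>(Expr{name, nullptr, nullptr});
}
ExprPtr bin(const std::string& op, ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{op, std::move(a), std::move(b)});
}

// Structural key: equal subtrees produce equal strings.
std::string key(const ExprPtr& e) {
  if (!e->lhs) return e->op;
  return "(" + key(e->lhs) + " " + e->op + " " + key(e->rhs) + ")";
}

// First pass: count occurrences of every interior subexpression.
void count_subexprs(const ExprPtr& e, std::map<std::string, int>& counts) {
  if (!e->lhs) return;
  ++counts[key(e)];
  count_subexprs(e->lhs, counts);
  count_subexprs(e->rhs, counts);
}

// Second pass: bottom-up rewrite; a subexpression seen at least twice is
// bound once to a fresh variable and every later occurrence reuses it.
struct CseRewriter {
  std::map<std::string, int> counts;
  std::map<std::string, std::string> bound;               // key -> variable
  std::vector<std::pair<std::string, std::string>> lets;  // (variable, value)

  std::string rewrite(const ExprPtr& e) {
    if (!e->lhs) return e->op;
    const std::string k = key(e);
    auto it = bound.find(k);
    if (it != bound.end()) return it->second;
    const std::string l = rewrite(e->lhs);  // sequenced: lhs before rhs
    const std::string r = rewrite(e->rhs);
    const std::string text = "(" + l + " " + e->op + " " + r + ")";
    if (counts[k] >= 2) {
      const std::string var = "cse_v" + std::to_string(lets.size() + 1);
      lets.emplace_back(var, text);
      bound[k] = var;
      return var;
    }
    return text;
  }
};

// Returns the expression as "let v = value in ... body".
std::string eliminate_common_subexprs(const ExprPtr& e) {
  CseRewriter cse;
  count_subexprs(e, cse.counts);
  const std::string body = cse.rewrite(e);
  std::string out;
  for (const auto& binding : cse.lets) {
    out += "let " + binding.first + " = " + binding.second + " in ";
  }
  return out + body;
}
```

Applied to `(x + y) * (x + y)`, the sketch yields `let cse_v1 = (x + y) in (cse_v1 * cse_v1)`.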
 
Pass tvm::tir::transform::InstallDebugSpans ()
 Add TIR-printer output as debug information to all ops in the module.
 
Pass tvm::tir::transform::UnifyThreadBinding ()
 Unify all the thread bindings for "blockIdx.x/y/z", "threadIdx.x/y/z", and "vthread.x/y/z". Before the unification, two vars that are bound to the same thread axis (e.g., "threadIdx.x") use different IterVars and variables in their AttrStmts. After the unification, they share a single consolidated IterVar and variable.
 
Pass tvm::tir::transform::MergeSharedMemoryAllocations ()
 
Pass tvm::tir::transform::ConvertForLoopsToSerial ()
 A post-scheduling pass that converts all Parallel For loops to Serial ones. It is run to reduce memory usage and/or when the executor/backend does not support parallel launch of For loops.
 
Pass tvm::tir::transform::UnifiedStaticMemoryPlanner ()
 The unified static memory planner pass, which plans memory within and across PrimFuncs together. The pass requires all functions, including main, to be PrimFuncs.
 
Pass tvm::tir::transform::InjectSoftwarePipeline ()
 Transform annotated loops into pipelined loops in which producers and consumers overlap, based on the information provided in loop annotations. This enables optimization techniques such as prefetching and pipeline parallelism.
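Serial C++ cannot show real overlap, but it can show how a pipelined loop is restructured. A prologue issues the first load. Each steady-state iteration loads tile t+1 while consuming tile t. An epilogue consumes the last tile. This hand-written sketch assumes a two-stage producer/consumer pipeline over a double buffer. The function names are hypothetical, and the code is not what the pass emits.

```cpp
#include <cassert>
#include <vector>

// Producer: copy tile `t` (of width `w`) from `src` into one buffer slot.
static void load_tile(const std::vector<int>& src, int t, int w,
                      std::vector<int>& slot) {
  for (int j = 0; j < w; ++j) slot[j] = src[t * w + j];
}

// Consumer: reduce one tile.
static long compute_tile(const std::vector<int>& slot) {
  long s = 0;
  for (int v : slot) s += v;
  return s;
}

// Two-stage software pipeline over `num_tiles` tiles using a double buffer.
long pipelined_sum(const std::vector<int>& src, int num_tiles, int w) {
  if (num_tiles <= 0) return 0;
  std::vector<std::vector<int>> buf(2, std::vector<int>(w));
  long total = 0;
  load_tile(src, 0, w, buf[0]);                    // prologue: load tile 0
  for (int t = 0; t + 1 < num_tiles; ++t) {        // steady state
    load_tile(src, t + 1, w, buf[(t + 1) % 2]);    // producer: tile t+1
    total += compute_tile(buf[t % 2]);             // consumer: tile t
  }
  total += compute_tile(buf[(num_tiles - 1) % 2]);  // epilogue: last tile
  return total;
}
```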
 
Pass tvm::tir::transform::BindParams (const Array< runtime::NDArray > &constants)
 
Pass tvm::tir::transform::ExtractPrimFuncConstants ()
 Collect TIR non-scalar constants into the module's 'Constants' attribute.
 
Pass tvm::tir::transform::LowerAutoCopy ()
 Automatically apply memory optimizations to auto copy blocks.
 
Pass tvm::tir::transform::RenormalizeSplitPattern ()
 Renormalize the split pattern from floordiv(floormod()) to floormod(floordiv()).
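The rewrite relies on the identity floordiv(floormod(x, c1 * c2), c2) == floormod(floordiv(x, c2), c1) for positive constants c1 and c2. This sketch checks the identity numerically with floor-rounding division, which is the semantics of TIR's floordiv/floormod. The exact pattern the pass matches may be phrased differently, and all helper names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Floor division and modulo (round toward negative infinity); C++'s / and %
// truncate toward zero instead.
int64_t floordiv(int64_t a, int64_t b) {
  int64_t q = a / b;
  if ((a % b != 0) && ((a < 0) != (b < 0))) --q;
  return q;
}
int64_t floormod(int64_t a, int64_t b) { return a - floordiv(a, b) * b; }

// Before: floordiv(floormod(x, c1 * c2), c2)
int64_t split_before(int64_t x, int64_t c1, int64_t c2) {
  return floordiv(floormod(x, c1 * c2), c2);
}

// After renormalization: floormod(floordiv(x, c2), c1)
int64_t split_after(int64_t x, int64_t c1, int64_t c2) {
  return floormod(floordiv(x, c2), c1);
}

// Checks the rewrite on every x in [lo, hi] for positive divisors c1, c2.
bool identity_holds(int64_t lo, int64_t hi, int64_t c1, int64_t c2) {
  for (int64_t x = lo; x <= hi; ++x) {
    if (split_before(x, c1, c2) != split_after(x, c1, c2)) return false;
  }
  return true;
}
```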
 
Pass tvm::tir::transform::BindTarget (Target target)
 Annotate a PrimFunc with a given target.
 
Pass tvm::tir::transform::AnnotateEntryFunc ()
 Set a PrimFunc as the entry point if it is the only function in the IRModule.
 
Pass tvm::tir::transform::Filter (runtime::TypedPackedFunc< bool(PrimFunc)> fcond)
 Filter PrimFuncs with a given condition.
 
Pass tvm::tir::transform::InjectPTXAsyncCopy ()
 Rewrite global-to-shared memory copies on CUDA to use asynchronous copy.
 
Pass tvm::tir::transform::InjectPTXLDG32 (bool enable_ptx_ldg32=true)
 Rewrite global-to-local memory copies on CUDA to use the ldg32 instruction.
 
Pass tvm::tir::transform::RemoveWeightLayoutRewriteBlock (bool skip_ndarray_rewrite=false)
 Remove the weight layout rewrite block.
 
Pass tvm::tir::transform::ManifestSharedMemoryLocalStage ()
 Add an explicit local stage for shared memory accesses on GPU.
 
Pass tvm::tir::transform::InstrumentProfileIntrinsics ()
 Insert intrinsic calls to instrument function and loop level profiling.
 
Pass tvm::tir::transform::DefaultGPUSchedule ()
 Set default thread bindings for PrimFuncs, including symbolic-shape functions, so that they can be built and executed on GPU devices. The pass examines all the blocks within the PrimFunc and performs loop fusion, splitting, and reordering based on the loop extents and target information, such as the maximum number of thread blocks and the maximum number of threads per block.
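The fuse-then-split idea can be shown in a heavily simplified form. This sketch assumes constant loop extents (no symbolic shapes) and a single hypothetical `max_threads_per_block` limit. It ignores loop reordering and other target information, and its heuristic is not the pass's exact one. `LaunchConfig`, `default_launch`, and `covered_iterations` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct LaunchConfig {
  int64_t num_blocks;         // extent bound to blockIdx.x
  int64_t threads_per_block;  // extent bound to threadIdx.x
};

// Fuse all loops into one loop of extent prod(extents), then split it so that
// no block exceeds max_threads_per_block threads.
LaunchConfig default_launch(const std::vector<int64_t>& extents,
                            int64_t max_threads_per_block) {
  int64_t fused = 1;
  for (int64_t e : extents) fused *= e;
  const int64_t threads =
      std::max<int64_t>(1, std::min(fused, max_threads_per_block));
  const int64_t blocks = (fused + threads - 1) / threads;  // ceiling division
  return {blocks, threads};
}

// Simulates the launch: each (block, thread) pair recovers its fused index and
// runs only if it is in bounds, since the last block may be partially filled.
int64_t covered_iterations(const std::vector<int64_t>& extents,
                           int64_t max_threads_per_block) {
  int64_t fused = 1;
  for (int64_t e : extents) fused *= e;
  const LaunchConfig cfg = default_launch(extents, max_threads_per_block);
  int64_t count = 0;
  for (int64_t bx = 0; bx < cfg.num_blocks; ++bx) {
    for (int64_t tx = 0; tx < cfg.threads_per_block; ++tx) {
      const int64_t i = bx * cfg.threads_per_block + tx;
      if (i < fused) ++count;  // boundary guard
    }
  }
  return count;
}
```

With loop extents (1000, 3) and a 1024-thread limit, the fused extent is 3000. That yields 3 blocks of 1024 threads, and the guard masks off the 72 surplus threads.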
 
Pass tvm::tir::transform::UseAssumeToReduceBranches ()
 Analyze the PrimFunc and eliminate branches introduced by layout-specific padding, using the buffer assumptions to remove those branches.
 