TIR-specific transformation passes.
#include <tvm/ir/transform.h>
#include <tvm/target/target.h>
#include <tvm/tir/expr.h>
#include <tvm/tir/function.h>
#include <string>
#include <vector>
Namespaces | |
| tvm | |
| Performance counters for profiling via the PAPI library. | |
| tvm::tir | |
| tvm::tir::transform | |
Functions | |
| Pass | tvm::tir::transform::CreatePrimFuncPass (std::function< PrimFunc(PrimFunc, IRModule, PassContext)> pass_func, int opt_level, ffi::String name, tvm::ffi::Array< ffi::String > required, bool traceable=false) |
| Pass | tvm::tir::transform::VectorizeLoop (bool enable_vectorize=true) |
| Lower vectorized loops. | |
| Pass | tvm::tir::transform::StorageRewrite () |
| Rewrite the storage allocation pattern. Moves allocations to the outermost possible scope and tries to share space between allocations to build a static allocation plan when possible. | |
| Pass | tvm::tir::transform::UnrollLoop () |
| Unroll constant loops marked with unroll. This pass also automatically attaches the pragma unroll tag to loops that meet the criteria. | |
| Pass | tvm::tir::transform::RemoveNoOp () |
| Remove no-op statements from the Stmt. | |
| Pass | tvm::tir::transform::RewriteUnsafeSelect () |
| Detect and rewrite unsafe selects that contain memory accesses. | |
| Pass | tvm::tir::transform::Simplify () |
| Run arithmetic simplifications on the statements and expressions. | |
| Pass | tvm::tir::transform::ConvertSSA () |
| Convert an IRModule to SSA form. | |
| Pass | tvm::tir::transform::InstrumentBoundCheckers () |
| Instrument bound checkers. | |
| Pass | tvm::tir::transform::MakePackedAPI () |
| Transform the high-level PrimFunc to a low-level version that can be used as an API function. | |
| Pass | tvm::tir::transform::MakeUnpackedAPI () |
| Transform the high-level PrimFunc to a C signature that can be used to call the operator directly. | |
| Pass | tvm::tir::transform::RemapThreadAxis (ffi::Map< ffi::String, IterVar > axis_map) |
| Remap the thread axis. | |
| Pass | tvm::tir::transform::LowerCustomDatatypes () |
| Lower custom datatypes. | |
| Pass | tvm::tir::transform::DecorateDeviceScope () |
| Decorate the function's body as a device function. | |
| Pass | tvm::tir::transform::AnnotateDeviceRegions () |
| Annotate locations that should be run on the device. | |
| Pass | tvm::tir::transform::SplitHostDevice () |
| Split the function into a host function and device functions. | |
| Pass | tvm::tir::transform::LowerDeviceKernelLaunch () |
| Lower cross-device function calls. | |
| Pass | tvm::tir::transform::SkipAssert () |
| Skip assert statements. | |
| Pass | tvm::tir::transform::ThreadSync (ffi::String storage_scope) |
| Insert synchronization between parallel reads and writes of shared buffers. | |
| Pass | tvm::tir::transform::LowerThreadAllreduce () |
| Lower cross-thread allreduce. | |
| Pass | tvm::tir::transform::InferFragment () |
| Infer the TensorCore fragment information using tensor intrinsics. | |
| Pass | tvm::tir::transform::LowerTVMBuiltin () |
| Lower builtin intrinsics. | |
| Pass | tvm::tir::transform::LowerIntrin () |
| Lower the target-specific function intrinsics in each function. | |
| Pass | tvm::tir::transform::LowerWarpMemory () |
| Lower warp memory accesses to low-level device-related function calls. | |
| Pass | tvm::tir::transform::LowerDeviceStorageAccessInfo () |
| Lower attached storage access information on device. | |
| Pass | tvm::tir::transform::CombineContextCall () |
| Combine context calls in the host function. | |
| Pass | tvm::tir::transform::NarrowDataType (int target_bits) |
| Narrow down the PrimExpr data types in a statement to target_bits. | |
| Pass | tvm::tir::transform::ForceNarrowIndexToInt32 () |
| Force indexing expressions and integer buffers to be narrowed down to the int32 dtype. | |
| Pass | tvm::tir::transform::BF16ComputeLegalize () |
| Legalize bf16 compute ops: add a cast to fp32 before each op, then cast the result back to bf16. | |
| Pass | tvm::tir::transform::FP8ComputeLegalize (ffi::String promote_dtype="float16") |
| Legalize fp8 compute ops: add a cast to fp16/fp32 before each op, then cast the result back to fp8. | |
| Pass | tvm::tir::transform::BF16StorageLegalize () |
| Legalize bf16 storage types to u16. | |
| Pass | tvm::tir::transform::FP8StorageLegalize () |
| Legalize fp8 storage types to u8. | |
| Pass | tvm::tir::transform::InlinePrivateFunctions () |
| Inline calls to private functions. | |
| Pass | tvm::tir::transform::PointerValueTypeRewrite () |
| Rewrite the pointer content type of arguments, as well as of Allocs internal to the function, to the type most frequently used by loads and stores, avoiding pointer casts in the backend when possible. | |
| Pass | tvm::tir::transform::HoistIfThenElse () |
| Hoist loop-invariant IfThenElse nodes outside the eligible loops. | |
| Pass | tvm::tir::transform::HoistExpression () |
| Hoist loop-invariant expression nodes outside the eligible loops. | |
| Pass | tvm::tir::transform::FlattenBuffer () |
| Flatten multi-dimensional BufferLoad and BufferStore nodes into single-dimensional BufferLoad/BufferStore for TIR that contains no opaque blocks. | |
| Pass | tvm::tir::transform::LowerVtcmAlloc () |
| Pass | tvm::tir::transform::LowerAsyncDMA () |
| Lower async TIR primitives to DMA copy and wait builtins. | |
| Pass | tvm::tir::transform::CommonSubexprElimTIR (bool enable_cse_tir=true, bool identify_equiv_terms=false) |
| Implement common subexpression elimination (CSE) for TIR, introducing let-in bindings for duplicated subexpressions. | |
| Pass | tvm::tir::transform::MergeSharedMemoryAllocations () |
| Pass | tvm::tir::transform::ConvertForLoopsToSerial () |
| A post-scheduling pass that converts all parallel For loops to serial ones. It is run to reduce memory usage and/or when the executor or backend does not support parallel launch of For loops. | |
| Pass | tvm::tir::transform::UnifiedStaticMemoryPlanner () |
| The unified static memory planner pass, which plans memory within and across PrimFuncs together. The pass requires all functions, including main, to be PrimFuncs. | |
| Pass | tvm::tir::transform::BindParams (const ffi::Array< runtime::Tensor > &constants) |
| Pass | tvm::tir::transform::ExtractPrimFuncConstants () |
| Pass to collect non-scalar TIR constants into the module's 'Constants' attribute. | |
| Pass | tvm::tir::transform::RenormalizeSplitPattern () |
| Renormalize the split pattern from floordiv(floormod()) to floormod(floordiv()). | |
| Pass | tvm::tir::transform::BindTarget (Target target) |
| Annotate a PrimFunc with a given target. | |
| Pass | tvm::tir::transform::AnnotateEntryFunc () |
| Set a PrimFunc as the entry point if it is the only function in the IRModule. | |
| Pass | tvm::tir::transform::Filter (ffi::TypedFunction< bool(PrimFunc)> fcond) |
| Filter PrimFuncs with a given condition. | |
| Pass | tvm::tir::transform::InjectPTXAsyncCopy () |
| Pass to rewrite global-to-shared memory copies on CUDA with asynchronous copies. | |
| Pass | tvm::tir::transform::InjectPTXLDG32 (bool enable_ptx_ldg32=true) |
| Pass to rewrite global-to-local memory copies on CUDA with the ldg32 instruction. | |
| Pass | tvm::tir::transform::RemoveWeightLayoutRewriteBlock (bool skip_tensor_rewrite=false) |
| Remove the weight layout rewrite block. | |
| Pass | tvm::tir::transform::InstrumentProfileIntrinsics () |
| Insert intrinsic calls to instrument function- and loop-level profiling. | |
| Pass | tvm::tir::transform::DefaultGPUSchedule () |
| The pass sets default thread bindings for PrimFuncs, including symbolic shape functions, so that they can be built and executed on GPU devices. It examines all the blocks within the PrimFunc and performs loop fusion, splitting, and reordering based on the loop extent and target information, such as the maximum number of thread blocks and the maximum number of threads per block. | |
| Pass | tvm::tir::transform::UseAssumeToReduceBranches () |
| This pass analyzes a PrimFunc and eliminates branches introduced by layout-specific padding. It uses buffer assumptions to determine which of these branches can be removed. | |
TIR-specific transformation passes.
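CreatePrimFuncPass lifts a per-function callback with the shape `PrimFunc(PrimFunc, IRModule, PassContext)` into a module-level Pass. Filter keeps only the PrimFuncs that satisfy a predicate. The sketch below models both ideas with plain C++.

All of the following are hypothetical stand-ins, not TVM API: `ToyFunc`, `ToyModule`, `ToyPass`, `MakeFuncPass` and `MakeFilterPass`. The sketch also omits PassContext, opt_level, the pass name and the required-pass list.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for PrimFunc and IRModule (not TVM types).
struct ToyFunc {
  std::string body;
  bool is_kernel;
};
using ToyModule = std::map<std::string, ToyFunc>;
using ToyPass = std::function<ToyModule(const ToyModule&)>;

// Lifts a per-function rewrite into a module-level pass, mirroring the shape
// of CreatePrimFuncPass: the callback receives each function plus the module.
ToyPass MakeFuncPass(std::function<ToyFunc(ToyFunc, const ToyModule&)> rewrite) {
  return [rewrite](const ToyModule& mod) {
    ToyModule out;
    for (const auto& entry : mod) out.emplace(entry.first, rewrite(entry.second, mod));
    return out;
  };
}

// Mirrors the idea of Filter: keep only the functions that satisfy `pred`.
ToyPass MakeFilterPass(std::function<bool(const ToyFunc&)> pred) {
  return [pred](const ToyModule& mod) {
    ToyModule out;
    for (const auto& entry : mod)
      if (pred(entry.second)) out.emplace(entry.first, entry.second);
    return out;
  };
}
```

Composing the two toy passes first rewrites every function and then drops the non-kernel ones. Ordered passes are run in TVM in a comparable way through a Sequential.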
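StorageRewrite tries to share space between allocations to build a static allocation plan. The core idea is that two buffers whose lifetimes do not overlap can occupy the same storage.

The sketch below is a minimal greedy illustration of that idea, not TVM's algorithm. Its assumptions:

- The `Alloc` struct and `PlanSlots` function are hypothetical.
- Lifetimes are given as inclusive statement-index ranges and are sorted by `first_use`.
- A first-fit policy is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical allocation record: byte size plus the inclusive range of
// statement indices during which the buffer is live.
struct Alloc {
  std::size_t size;
  int first_use;
  int last_use;
};

// Greedy storage-sharing sketch. Allocations must be sorted by first_use.
// Each one reuses the first slot whose current occupant died strictly before
// it becomes live; a reused slot grows to the larger of the two sizes.
// Returns the slot index of each allocation and fills `slot_sizes`.
std::vector<int> PlanSlots(const std::vector<Alloc>& allocs,
                           std::vector<std::size_t>& slot_sizes) {
  std::vector<int> slot_of;
  std::vector<int> slot_last_use;  // last_use of each slot's current occupant
  slot_sizes.clear();
  for (const Alloc& a : allocs) {
    int chosen = -1;
    for (std::size_t s = 0; s < slot_last_use.size(); ++s) {
      if (slot_last_use[s] < a.first_use) {
        chosen = static_cast<int>(s);
        break;
      }
    }
    if (chosen < 0) {  // no dead slot available: open a new one
      chosen = static_cast<int>(slot_sizes.size());
      slot_sizes.push_back(0);
      slot_last_use.push_back(0);
    }
    slot_sizes[chosen] = std::max(slot_sizes[chosen], a.size);
    slot_last_use[chosen] = a.last_use;
    slot_of.push_back(chosen);
  }
  return slot_of;
}
```

Consider three buffers of 64, 16 and 32 bytes, live over statements [0, 1], [1, 2] and [2, 3]. The first and third buffers end up sharing one 64-byte slot, so 80 bytes are needed instead of 112.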
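RewriteUnsafeSelect exists because a TIR Select evaluates both of its value operands. A guarded memory access placed inside a Select can therefore still be executed out of bounds. Rewriting it into a lazily evaluated if_then_else runs only the taken branch.

The C++ analogy below is an illustration, not TVM code. A function call evaluates all of its arguments eagerly, like Select, while `?:` evaluates only one branch. The functions `EagerSelect`, `UnsafeLoad` and `SafeLoad` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Analogue of TIR Select: both value arguments are evaluated before the call.
int EagerSelect(bool cond, int true_value, int false_value) {
  return cond ? true_value : false_value;
}

// Unsafe pattern: a.at(i) is evaluated even when i is out of range, so this
// throws std::out_of_range instead of returning the fallback value 0.
int UnsafeLoad(const std::vector<int>& a, std::size_t i) {
  return EagerSelect(i < a.size(), a.at(i), 0);
}

// Rewritten pattern, analogous to if_then_else: only the taken branch runs.
int SafeLoad(const std::vector<int>& a, std::size_t i) {
  return i < a.size() ? a.at(i) : 0;
}
```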
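HoistIfThenElse moves a loop-invariant IfThenElse outside its loop, so the condition is tested once instead of on every iteration. The sketch below shows the before/after shape of that rewrite on ordinary C++ loops; `ScaleBefore` and `ScaleAfter` are hypothetical names. The pass itself operates on TIR statements and decides eligibility through its own analysis.

```cpp
#include <cstddef>
#include <vector>

// Before hoisting: `scale` does not depend on `i`, yet the branch is
// evaluated on every iteration.
std::vector<int> ScaleBefore(const std::vector<int>& x, bool scale, int k) {
  std::vector<int> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (scale) {
      y[i] = x[i] * k;
    } else {
      y[i] = x[i];
    }
  }
  return y;
}

// After hoisting: the loop-invariant condition is tested once, and each
// branch receives its own copy of the loop.
std::vector<int> ScaleAfter(const std::vector<int>& x, bool scale, int k) {
  std::vector<int> y(x.size());
  if (scale) {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * k;
  } else {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i];
  }
  return y;
}
```

The trade-off is code size: the loop body is duplicated once per branch in exchange for removing the per-iteration test.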
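LowerThreadAllreduce lowers cross-thread reductions into target-level code. On CUDA, one common strategy for a warp-sized reduction is a butterfly of shuffle-xor exchanges. Whether the pass emits exactly this pattern depends on the target and the reduction size, so treat the following as an assumption.

The sketch simulates such a 32-lane butterfly sum on the host. `ButterflyAllReduce` and `kWarpSize` are hypothetical names.

```cpp
#include <array>

constexpr int kWarpSize = 32;

// Host-side simulation of a warp-level butterfly all-reduce (sum).
// In each round, lane l adds the value held by lane (l XOR offset), as a
// shuffle-xor instruction would. After log2(32) = 5 rounds every lane holds
// the total of all 32 inputs.
std::array<int, kWarpSize> ButterflyAllReduce(std::array<int, kWarpSize> lanes) {
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    std::array<int, kWarpSize> next{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
      next[lane] = lanes[lane] + lanes[lane ^ offset];
    }
    lanes = next;  // all lanes exchange simultaneously
  }
  return lanes;
}
```

With lane values 0 through 31, every lane ends up holding 496.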
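NarrowDataType and ForceNarrowIndexToInt32 both come down to one question: does every value an index expression can take fit in a narrower integer type? TVM answers this with its own arithmetic bound analysis.

The sketch below shows only the final range check, assuming the bounds are already known. `FitsInSignedBits` and `MaxFlatIndex` are hypothetical helpers.

```cpp
#include <cstdint>

// Returns true if every integer in [lo, hi] fits in a signed integer of
// `bits` bits (assumes 1 <= bits <= 64).
bool FitsInSignedBits(std::int64_t lo, std::int64_t hi, int bits) {
  if (bits >= 64) return true;
  const std::int64_t max_value = (std::int64_t{1} << (bits - 1)) - 1;
  const std::int64_t min_value = -(std::int64_t{1} << (bits - 1));
  return lo >= min_value && hi <= max_value;
}

// Largest value of the flattened 2-D index i * cols + j,
// for 0 <= i < rows and 0 <= j < cols.
std::int64_t MaxFlatIndex(std::int64_t rows, std::int64_t cols) {
  return (rows - 1) * cols + (cols - 1);
}
```

Two examples of the check:

- A 1024 x 1024 buffer has a maximum flat index of 1048575, so int32 indexing is safe.
- A 65536 x 65536 buffer reaches 4294967295, which requires 64-bit indices.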
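BF16StorageLegalize keeps bf16 data in 16-bit unsigned storage, and BF16ComputeLegalize performs arithmetic in fp32 before casting back to bf16. The sketch below shows both views for a single addition.

It assumes:

- bf16 is the upper half of an IEEE-754 binary32.
- The narrowing cast uses round-to-nearest-even.
- NaN inputs can be ignored; the code does not handle them.

The helpers `BF16ToFloat`, `FloatToBF16` and `AddBF16` are hypothetical and not TVM's lowering code.

```cpp
#include <cstdint>
#include <cstring>

// bf16 value held in 16-bit unsigned storage (the storage-legalized view).
using bf16_storage = std::uint16_t;

// Widen bf16 -> fp32: place the 16 stored bits in the upper half of a binary32.
float BF16ToFloat(bf16_storage v) {
  std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Narrow fp32 -> bf16 with round-to-nearest-even (NaN handling omitted).
bf16_storage FloatToBF16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<bf16_storage>((bits + rounding_bias) >> 16);
}

// Compute-legalized bf16 addition: cast to fp32, add, cast back to bf16.
bf16_storage AddBF16(bf16_storage a, bf16_storage b) {
  return FloatToBF16(BF16ToFloat(a) + BF16ToFloat(b));
}
```

For example, 0x3F80 (1.0) plus 0x4000 (2.0) yields 0x4040 (3.0). A value exactly halfway between two bf16 numbers, such as 1.00390625, rounds to the even neighbour 0x3F80.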
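RenormalizeSplitPattern rewrites floordiv(floormod(x, c1), c2) into floormod(floordiv(x, c2), c1 / c2). This identity holds for all integers x when c1 and c2 are positive and c1 is divisible by c2. Whether the pass checks exactly these conditions is an assumption here.

The hypothetical helper `SplitPatternsAgree` below checks the identity by brute force. It uses explicit floor-division helpers, because C++ `/` and `%` truncate toward zero.

```cpp
#include <cstdint>

// Floor division: rounds toward negative infinity (C++ '/' truncates).
std::int64_t FloorDiv(std::int64_t a, std::int64_t b) {
  std::int64_t q = a / b;
  if ((a % b != 0) && ((a < 0) != (b < 0))) --q;
  return q;
}

// Floor modulo: the result has the sign of the divisor.
std::int64_t FloorMod(std::int64_t a, std::int64_t b) {
  return a - FloorDiv(a, b) * b;
}

// Checks floordiv(floormod(x, c1), c2) == floormod(floordiv(x, c2), c1 / c2)
// for every x in [lo, hi]. Assumes c1 > 0 and c2 > 0.
bool SplitPatternsAgree(std::int64_t c1, std::int64_t c2, std::int64_t lo, std::int64_t hi) {
  for (std::int64_t x = lo; x <= hi; ++x) {
    if (FloorDiv(FloorMod(x, c1), c2) != FloorMod(FloorDiv(x, c2), c1 / c2)) return false;
  }
  return true;
}
```

The divisibility condition matters. With c1 = 10 and c2 = 4, x = 9 gives 2 on the left-hand side but 0 on the right.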
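DefaultGPUSchedule derives thread bindings from the loop extent and target limits such as the maximum number of threads per block. The simplified sketch below covers a single fused 1-D loop with extent > 0, split into blockIdx.x and threadIdx.x, and a guard that skips the padded iterations.

This is an assumption-laden illustration, not the pass's algorithm. It ignores the block-count limit and multi-dimensional loop nests. `LaunchConfig`, `SplitForGPU` and `VisitCounts` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical launch configuration for a fused 1-D loop.
struct LaunchConfig {
  std::int64_t blocks;
  std::int64_t threads_per_block;
};

// Splits a fused loop of `extent` (> 0) iterations into blocks x threads.
LaunchConfig SplitForGPU(std::int64_t extent, std::int64_t max_threads_per_block) {
  std::int64_t threads = std::min(extent, max_threads_per_block);
  std::int64_t blocks = (extent + threads - 1) / threads;  // ceiling division
  return {blocks, threads};
}

// Simulates the launch on the host. Each (block, thread) pair maps to
// i = block * threads + thread, and the guard i < extent skips the padding.
std::vector<int> VisitCounts(std::int64_t extent, LaunchConfig cfg) {
  std::vector<int> visits(static_cast<std::size_t>(extent), 0);
  for (std::int64_t b = 0; b < cfg.blocks; ++b)
    for (std::int64_t t = 0; t < cfg.threads_per_block; ++t) {
      std::int64_t i = b * cfg.threads_per_block + t;
      if (i < extent) ++visits[static_cast<std::size_t>(i)];
    }
  return visits;
}
```

With an extent of 1000 and at most 256 threads per block, the sketch launches 4 blocks of 256 threads. The guard ensures each of the 1000 iterations runs exactly once.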