tvm.s_tir.transform

Namespace of all S-TIR transformations

class tvm.s_tir.transform.HoistedConditionals(value)

Flags for use in HoistExpressionConfig.conditional_types

Each bitflag represents a type of expression that should be hoisted to the outermost loop possible.

Never = 0: No hoisting of conditionals

IfElseStmt = 1: If set, look for hoist candidates in IfElseStmt

IfElseExpr = 2: If set, look for hoist candidates in tirx.if_then_else

BooleanExpression = 4: If set, look for hoist candidates in all boolean expressions

UsingBlockVar = 8: If set, allow hoisting of conditionals that use a block variable (e.g. threadIdx.x)

All = 15: Enable all hoisting of conditionals

class tvm.s_tir.transform.HoistedLetBindings(value)

Flags for use in HoistExpressionConfig.let_binding_types

Each bitflag represents a type of let binding expression that should be hoisted to the outermost loop possible.

Never = 0: No hoisting of let bindings

RequiredByConditional = 1: Bindings that are used by a hoisted conditional

Bind = 2: Bindings occurring in Bind nodes

LetExpr = 4: Bindings occurring in Let expressions

All = 7: Enable all hoisting of let bindings

tvm.s_tir.transform.AnnotateIrregularLoop()

Annotate irregular loop mark.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.CanonicalizeLoop()

Canonicalize the loop to start from zero and use trivial step

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.CompactBufferAllocation(is_strict: bool = True)

Compact the buffer access region by removing the buffer regions that are not accessed, i.e. narrowing the buffer shape and adjust the access region if necessary.

Parameters:: is_strict (bool) – Ensure the compacted shape to be always smaller than the original shape. Otherwise it allows to grow the shape to match actual accessed buffer regions.
Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.ConvertBlocksToOpaque()

Substitute all the block vars with the PrimExprs they are bound to, indicated by the corresponding iter_values in BlockRealize, and then convert the blocks into opaque ones by removing all the iter_values in BlockRealize and iter_vars in Block.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.DecorateDeviceScope()

Decorate all the function’s body as device function.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.DefaultGPUSchedule()

Set default thread bindings for GPU PrimFuncs.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.HoistExpression()

Hoist loop-invariant expressions to outside the eligible loops.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.HoistIfThenElse(variant=None)

Hoist loop-invariant IfThenElse nodes to outside the eligible loops.

Parameters:: variant (Optional[String]) – The variant of the pass. variant can have any one of following values [“basic”, None(Default)].
Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InferFragment()

Infer the TensorCore fragment information using tensor intrinsics.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InjectDoubleBuffer()

Inject double buffer statements.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

class tvm.s_tir.transform.InjectDoubleBufferConfig(split_loop=<MISSING>)

Config for inject double buffer pass

property split_loop: Split loop factors

tvm.s_tir.transform.InjectPTXAsyncCopy()

Rewrite global to shared memory copy on CUDA with asynchronous copy.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InjectPTXLDG32(enable_inject_ptx_intrin=True)

Inject ptx.ldg.32 intrinsics.

Parameters:: enable_inject_ptx_intrin (bool) – If True, inject ptx.ldg.32 intrinsics.

tvm.s_tir.transform.InjectPermutedLayout()

Inject permuted layout in mma

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InjectSoftwarePipeline()

Transform annotated loops into pipelined one that parallelize producers and consumers

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InjectVirtualThread()

Inject virtual thread loops.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InstrumentBoundCheckers()

Instruments bound checkers.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.InstrumentProfileIntrinsics()

Insert intrinsic calls to instrument function and loop level profiling.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LiftThreadBinding()

Lift the same thread bindings to their LCA loops.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LoopPartition()

Partition loops in the stmt.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

class tvm.s_tir.transform.LoopPartitionConfig(partition_const_loop=<MISSING>, no_unroll_loop_with_extent_one=<MISSING>, unroll_loop_with_partition_hint_no_interval=<MISSING>)

Config for loop partition pass

property no_unroll_loop_with_extent_one: Don’t unroll loops with extent 1

property partition_const_loop: Split constant loop

property unroll_loop_with_partition_hint_no_interval: Unroll loops with pragma_loop_partition_hint and no interval

tvm.s_tir.transform.LowerAsyncDMA()

Lower async DMA to DMA.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerAutoCopy()

Automatically do memory optimizations for auto copy blocks

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerCrossThreadReduction()

Lower cross-thread reduction from thread bindings to intrinsic function calls.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerInitBlock()

Lower block init stmt into IfThenElse statements.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerMatchBuffer()

Remove match buffers inside the block. Also, it will validate the binding.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerOpaqueBlock()

Remove the block to ensure that the TIR can not be scheduled again.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerThreadAllreduce()

Lower cross thread allreduce.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.LowerVtcmAlloc()

Lower vtcm allocation.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.ManifestSharedMemoryLocalStage()

Add the explicit local stage for the shared memory access on GPU.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.MergeSharedMemoryAllocations()

This pass merges multiple TIR-level shared memory allocations into one allocation.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.PlanAndUpdateBufferAllocationLocation()

Locate the buffer allocation to the exact position (usually is the lca of buffer access). This pass will inject opaque block with alloc_buffers at the allocation site.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.RemoveStoreUndef()

Remove stores of undefined values from the Stmt.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.RemoveWeightLayoutRewriteBlock(skip_tensor_rewrite=False)

Remove weight layout rewrite block before benchmarking during tuning stage.

Parameters:: skip_tensor_rewrite (bool) – If True, exact rewrite of Tensor, according to the given index map, will be skipped.
Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.RenormalizeSplitPattern()

Renormalize the split pattern from floordiv(floormod()) to floormod(floordiv())

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.RewriteUnsafeSelect()

Detect and rewrite unsafe select that contains memory access.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.ThreadSync(storage_scope)

Insert sync between parallel read/write of shared buffers.

Parameters:: storage_scope (str) – The target storage scope.
Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.TransformMmaBufferLayout()

Transform mma buffer layout

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.UnifyThreadBinding()

Unify all the thread bindings for “blockIdx.x/y/z”, “threadIdx.x/y/z”, and “vthread.x/y/z”.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.UseAssumeToReduceBranches()

Eliminate layout specific pad branch by overcomputing values for padded region.

Returns:: fpass – The result pass
Return type:: tvm.transform.Pass

tvm.s_tir.transform.VerifyVTCMLimit(default_target=None)

Verify if the size of the allocated vtcm memory satisfies the limit.

The limit is determined from the “vtcm-capacity” attribute of the target.

Parameters:: default_target (Optional[tvm.target.Target]) – The default target to use if a PrimFunc does not have a target attribute.
Returns:: fpass – The result pass
Return type:: tvm.transform.Pass