tvm::meta_schedule::ScheduleRule Class Reference

Managed reference to ScheduleRuleNode.

#include <schedule_rule.h>
Public Types

using FInitializeWithTuneContext = runtime::TypedPackedFunc<void(const TuneContext &)>
    The function type of the InitializeWithTuneContext method.
using FApply = runtime::TypedPackedFunc<Array<tir::Schedule>(const tir::Schedule &, const tir::BlockRV &)>
    The function type of the Apply method.
using FAsString = runtime::TypedPackedFunc<String()>
    The function type of the AsString method; gets the schedule rule as a string with its name.
using FClone = runtime::TypedPackedFunc<ScheduleRule()>
    The function type of the Clone method.

Public Types inherited from tvm::runtime::ObjectRef

using ContainerType = Object
    Type indicating the container type.
Public Member Functions

TVM_DEFINE_MUTABLE_OBJECT_REF_METHODS(ScheduleRule, ObjectRef, ScheduleRuleNode)

Public Member Functions inherited from tvm::runtime::ObjectRef

ObjectRef() = default
    Default constructor.
ObjectRef(ObjectPtr<Object> data)
    Constructor from an existing object pointer.
bool same_as(const ObjectRef &other) const
    Comparator.
bool operator==(const ObjectRef &other) const
    Comparator.
bool operator!=(const ObjectRef &other) const
    Comparator.
bool operator<(const ObjectRef &other) const
    Comparator.
bool defined() const
const Object *get() const
const Object *operator->() const
bool unique() const
int use_count() const
template<typename ObjectType, typename = std::enable_if_t<std::is_base_of_v<Object, ObjectType>>>
const ObjectType *as() const
    Try to downcast the internal Object to a raw pointer of the corresponding type.
template<typename ObjectRefType, typename = std::enable_if_t<std::is_base_of_v<ObjectRef, ObjectRefType>>>
Optional<ObjectRefType> as() const
    Try to downcast the ObjectRef to an Optional<T> of the requested type.
Static Public Member Functions

static ScheduleRule ApplyCustomRule()
    Create a rule that applies customized rules registered via the block attribute schedule_rule. The rule is dispatched according to target keys.
static bool IsApplyCustomRule(const ScheduleRule &rule)
    Check whether the rule is ApplyCustomRule.
static ScheduleRule AutoInline(bool into_producer, bool into_consumer, bool inline_const_tensor, bool disallow_if_then_else, bool require_injective, bool require_ordered, Optional<Array<String>> disallow_op)
    Create an auto-inline rule that inlines spatial blocks when they satisfy certain conditions.
static ScheduleRule InlineConstantScalars()
    Inline blocks that produce a constant scalar. Such blocks get in the way of ReverseComputeInline during AutoInline, since they are also counted as producer blocks unless they are inlined first, so it is recommended to run InlineConstantScalars before AutoInline.
static ScheduleRule MultiLevelTiling(String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write, Optional<runtime::PackedFunc> filter_fn = NullOpt)
    Create a mega rule: multi-level tiling with data reuse.
static ScheduleRule MultiLevelTilingWithIntrin(String intrin_name, String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)
    Extension of MultiLevelTiling for auto-tensorization with a single intrinsic.
static ScheduleRule MultiLevelTilingTensorCore(Array<Map<String, String>> intrin_groups, String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write, bool use_software_pipeline)
    Extension of MultiLevelTiling for auto-tensorization with multiple groups of candidate tensor core intrinsics.
static ScheduleRule MultiLevelTilingWideVector(String structure, Integer vector_length_in_bits, Optional<Integer> max_innermost_factor, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)
    Extension of MultiLevelTiling for backends with wide vectors. The loop over the innermost spatial axis of the output buffer is always vectorized with the maximum vector length.
static ScheduleRule AddRFactor(int max_jobs_per_core, Optional<Integer> max_innermost_factor)
    Create a rule that adds rfactor to some blocks if needed.
static ScheduleRule CrossThreadReduction(Array<runtime::Int> thread_extents)
    Create a schedule rule that applies cross-thread reduction to suitable reduction blocks when needed.
static ScheduleRule RandomComputeLocation()
    A rule that randomly selects a compute-at location for a free block.
static ScheduleRule ParallelizeVectorizeUnroll(int max_jobs_per_core, int max_vectorize_extent, Array<runtime::Int> unroll_max_steps, bool unroll_explicit)
    Mark parallelize, vectorize, and unroll on the root block. The marks are applied to each block by a follow-up postprocessor.
static ScheduleRule AutoBind(int max_threadblocks, Array<Integer> thread_extents, int max_threads_per_block = -1)
    Auto-bind loops around the block to blockIdx and threadIdx.
static ScheduleRule PyScheduleRule(FInitializeWithTuneContext f_initialize_with_tune_context, FApply f_apply, FClone f_clone, FAsString f_as_string)
    Create a schedule rule with customized methods on the Python side.
static Array<ScheduleRule, void> DefaultLLVM()
    Create the default schedule rules for LLVM.
static Array<ScheduleRule, void> DefaultX86(const String &type)
    Create the default schedule rules for x86 (AVX512 and VNNI).
static Array<ScheduleRule, void> DefaultCUDA()
    Create the default schedule rules for CUDA.
static Array<ScheduleRule, void> DefaultCUDATensorCore()
    Create the default schedule rules for CUDA with Tensor Core.
static Array<ScheduleRule, void> DefaultHexagon()
    Create the default schedule rules for Hexagon.
static Array<ScheduleRule, void> DefaultARM(const String &type)
    Create the default schedule rules for Arm CPU (NEON and DOTPROD).
Additional Inherited Members

Static Public Attributes inherited from tvm::runtime::ObjectRef

static constexpr bool _type_is_nullable = true

Protected Member Functions inherited from tvm::runtime::ObjectRef

Object *get_mutable() const

Static Protected Member Functions inherited from tvm::runtime::ObjectRef

template<typename T>
static T DowncastNoCheck(ObjectRef ref)
    Internal helper function to downcast a ref without checking.
static void FFIClearAfterMove(ObjectRef *ref)
    Clear the object ref data field without DecRef after the field has been successfully moved.
template<typename ObjectType>
static ObjectPtr<ObjectType> GetDataPtr(const ObjectRef &ref)
    Internal helper function to get data_ as an ObjectPtr of ObjectType.

Protected Attributes inherited from tvm::runtime::ObjectRef

ObjectPtr<Object> data_
    Internal pointer that backs the reference.
Detailed Description

Managed reference to ScheduleRuleNode.

Member Typedef Documentation
using tvm::meta_schedule::ScheduleRule::FApply = runtime::TypedPackedFunc<Array<tir::Schedule>(const tir::Schedule &, const tir::BlockRV &)>

The function type of the Apply method.

Parameters:
    sch    The schedule to be modified.
    block  The specific block to apply the schedule rule to.
using tvm::meta_schedule::ScheduleRule::FAsString = runtime::TypedPackedFunc<String()>

The function type of the AsString method; gets the schedule rule as a string with its name.

using tvm::meta_schedule::ScheduleRule::FClone = runtime::TypedPackedFunc<ScheduleRule()>

The function type of the Clone method.

using tvm::meta_schedule::ScheduleRule::FInitializeWithTuneContext = runtime::TypedPackedFunc<void(const TuneContext &)>

The function type of the InitializeWithTuneContext method.

Parameters:
    context  The tuning context for initialization.
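The shapes of the four callback types above can be sketched in plain Python. This is an illustrative sketch only: `MockSchedule` and `MockBlock` are hypothetical stand-ins for TVM's `tir::Schedule` and `tir::BlockRV`, not TVM API.

```python
class MockSchedule:
    """Hypothetical stand-in for tir::Schedule."""
    def __init__(self, trace=()):
        self.trace = list(trace)  # transformation steps recorded so far

class MockBlock:
    """Hypothetical stand-in for tir::BlockRV."""
    def __init__(self, name):
        self.name = name

def f_initialize_with_tune_context(context) -> None:
    # FInitializeWithTuneContext: void(const TuneContext &)
    pass

def f_apply(sch, block):
    # FApply: Array<tir::Schedule>(const tir::Schedule &, const tir::BlockRV &)
    # Returns candidate schedules; here, one unchanged copy of the input.
    return [MockSchedule(sch.trace)]

def f_as_string() -> str:
    # FAsString: String() - the rule rendered as a string with its name.
    return "MockScheduleRule"

def f_clone():
    # FClone: ScheduleRule() - a fresh, independent copy of the rule.
    return (f_initialize_with_tune_context, f_apply, f_as_string)
```

These four callables are exactly what `PyScheduleRule` (documented below) bundles into a rule object.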
Member Function Documentation

static ScheduleRule tvm::meta_schedule::ScheduleRule::AddRFactor(int max_jobs_per_core, Optional<Integer> max_innermost_factor)

Create a rule that adds rfactor to some blocks if needed.

Parameters:
    max_jobs_per_core     The maximum number of jobs to be launched per CPU core. It sets the upper limit of CPU parallelism, i.e. num_cores * max_jobs_per_core. Use -1 to disable parallelism.
    max_innermost_factor  The maximum size of the innermost factor. NullOpt means no limit.
static ScheduleRule tvm::meta_schedule::ScheduleRule::ApplyCustomRule()

Create a rule that applies customized rules registered via the block attribute schedule_rule. The rule is dispatched according to target keys.
static ScheduleRule tvm::meta_schedule::ScheduleRule::AutoBind(int max_threadblocks, Array<Integer> thread_extents, int max_threads_per_block = -1)

Auto-bind loops around the block to blockIdx and threadIdx.

Parameters:
    max_threadblocks       The maximum number of thread blocks on the GPU.
    thread_extents         Candidates for the thread axis extent.
    max_threads_per_block  The maximum number of threads per block, if known when this schedule rule is created.
static ScheduleRule tvm::meta_schedule::ScheduleRule::AutoInline(bool into_producer, bool into_consumer, bool inline_const_tensor, bool disallow_if_then_else, bool require_injective, bool require_ordered, Optional<Array<String>> disallow_op)

Create an auto-inline rule that inlines spatial blocks when they satisfy certain conditions.

Parameters:
    into_producer          Whether to allow inlining a block into its producer.
    into_consumer          Whether to allow inlining a block into its consumer.
    inline_const_tensor    Always inline constant tensors.
    disallow_if_then_else  Always disallow if-then-else-like constructs.
    require_ordered        Always require the read-to-write mapping to be ordered.
    require_injective      Always require the read-to-write mapping to be injective.
    disallow_op            The operators that are disallowed in auto-inline.
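To see how these flags interact, here is a hedged plain-Python sketch of the gating logic. The `block` dict and its keys are purely illustrative stand-ins for properties MetaSchedule would query from the real schedule; this is not TVM's actual implementation.

```python
def can_auto_inline(block, *, into_producer, into_consumer,
                    inline_const_tensor, disallow_if_then_else,
                    require_injective, require_ordered, disallow_op=None):
    """Illustrative decision sketch for AutoInline's conditions."""
    disallow_op = disallow_op or []
    if block["op"] in disallow_op:
        return False  # operator explicitly excluded from auto-inline
    if inline_const_tensor and block.get("is_const_tensor"):
        return True   # constant tensors are always inlined
    if disallow_if_then_else and block.get("has_if_then_else"):
        return False  # if-then-else-like constructs are rejected
    if require_injective and not block.get("injective", True):
        return False  # read-to-write mapping must be injective
    if require_ordered and not block.get("ordered", True):
        return False  # read-to-write mapping must be ordered
    # Finally, the direction of inlining must be permitted.
    return (into_consumer and block.get("has_consumer")) or \
           (into_producer and block.get("has_producer"))
```

The real rule operates on tir::Schedule state rather than dicts, but the flag semantics follow the parameter table above.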
static ScheduleRule tvm::meta_schedule::ScheduleRule::CrossThreadReduction(Array<runtime::Int> thread_extents)

Create a schedule rule that applies cross-thread reduction to suitable reduction blocks when needed.

Parameters:
    thread_extents  Candidates for the thread axis extent (values are required to be positive).
static Array<ScheduleRule, void> tvm::meta_schedule::ScheduleRule::DefaultARM(const String &type)

Create the default schedule rules for Arm CPU (NEON and DOTPROD).

static Array<ScheduleRule, void> tvm::meta_schedule::ScheduleRule::DefaultCUDA()

Create the default schedule rules for CUDA.

static Array<ScheduleRule, void> tvm::meta_schedule::ScheduleRule::DefaultCUDATensorCore()

Create the default schedule rules for CUDA with Tensor Core.

static Array<ScheduleRule, void> tvm::meta_schedule::ScheduleRule::DefaultHexagon()

Create the default schedule rules for Hexagon.

static Array<ScheduleRule, void> tvm::meta_schedule::ScheduleRule::DefaultLLVM()

Create the default schedule rules for LLVM.

static Array<ScheduleRule, void> tvm::meta_schedule::ScheduleRule::DefaultX86(const String &type)

Create the default schedule rules for x86 (AVX512 and VNNI).
static ScheduleRule tvm::meta_schedule::ScheduleRule::InlineConstantScalars()

Inline blocks that produce a constant scalar. Such blocks get in the way of ReverseComputeInline during AutoInline, since they are also counted as producer blocks unless they are inlined first, so it is recommended to run InlineConstantScalars before AutoInline.
static bool tvm::meta_schedule::ScheduleRule::IsApplyCustomRule(const ScheduleRule &rule)

Check whether the rule is ApplyCustomRule.
static ScheduleRule tvm::meta_schedule::ScheduleRule::MultiLevelTiling(String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write, Optional<runtime::PackedFunc> filter_fn = NullOpt)

Create a mega rule: multi-level tiling with data reuse.

Parameters:
    structure             The tiling structure. Recommended: 'SSRSRS' on CPU, 'SSSRRSRS' on GPU.
    tile_binds            For each level of tiles, which thread axis it is bound to. Recommended: NullOpt on CPU, [blockIdx.x, vthread.x, threadIdx.x] on GPU.
    max_innermost_factor  The maximum size of the innermost factor. NullOpt means no limit.
    vector_load_lens      The lengths of vector lanes in vectorized cooperative fetching. NullOpt disables vectorization.
    reuse_read            Data reuse configuration for reading. NullOpt means no reuse.
    reuse_write           Data reuse configuration for writing. NullOpt means no reuse.
    filter_fn             A function that can be passed to override the default condition for applying MultiLevelTiling to a block. Its signature must be (Schedule, BlockRV) -> bool. This is useful when MultiLevelTiling should be applied to an operation/block that is ignored by default. The function should return True for a block that should be tiled.
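The structure string is a compact notation: each character is one tile level, 'S' for spatial loops and 'R' for reduction loops, so 'SSRSRS' describes six levels interleaving spatial and reduction tiles. The decoder below is not TVM API, just a small sketch of what the notation encodes.

```python
def decode_structure(structure: str):
    """Expand a tiling-structure string like 'SSRSRS' into named levels."""
    levels = []
    for i, ch in enumerate(structure):
        if ch not in ("S", "R"):
            raise ValueError(f"invalid level {ch!r} at position {i}")
        levels.append("spatial" if ch == "S" else "reduction")
    return levels
```

For example, the CPU-recommended 'SSRSRS' decodes to four spatial and two reduction tile levels.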
static ScheduleRule tvm::meta_schedule::ScheduleRule::MultiLevelTilingTensorCore(Array<Map<String, String>> intrin_groups, String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write, bool use_software_pipeline)

Extension of MultiLevelTiling for auto-tensorization with multiple groups of candidate tensor core intrinsics.

Parameters:
    intrin_groups          A list of groups of tensor core intrinsics. Each map should contain the keys "init", "load_a", "load_b", "compute", and "store", which represent the tensor intrinsics for initialization, loading operand A, loading operand B, tensor core computation, and storing the result. The values of the map should be names of tensor intrinsics that have been registered via TensorIntrin.register(...) beforehand.
    structure              The tiling structure. Recommended: 'SSRSRS' on CPU, 'SSSRRSRS' on GPU.
    tile_binds             For each level of tiles, which thread axis it is bound to. Recommended: NullOpt on CPU, [blockIdx.x, vthread.x, threadIdx.x] on GPU.
    max_innermost_factor   The maximum size of the innermost factor. NullOpt means no limit.
    vector_load_lens       The lengths of vector lanes in vectorized cooperative fetching. NullOpt disables vectorization.
    reuse_read             Data reuse configuration for reading. NullOpt means no reuse.
    reuse_write            Data reuse configuration for writing. NullOpt means no reuse.
    use_software_pipeline  Whether to use the software pipeline.
static ScheduleRule tvm::meta_schedule::ScheduleRule::MultiLevelTilingWideVector(String structure, Integer vector_length_in_bits, Optional<Integer> max_innermost_factor, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)

Extension of MultiLevelTiling for backends with wide vectors. The loop over the innermost spatial axis of the output buffer is always vectorized with the maximum vector length.

Parameters:
    structure              The tiling structure. 'SSRSRS' is recommended.
    vector_length_in_bits  The length of a vector register in bits.
    max_innermost_factor   The maximum size of the innermost factor. NullOpt means no limit.
    reuse_read             Data reuse configuration for reading. NullOpt means no reuse.
    reuse_write            Data reuse configuration for writing. NullOpt means no reuse.
static ScheduleRule tvm::meta_schedule::ScheduleRule::MultiLevelTilingWithIntrin(String intrin_name, String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)

Extension of MultiLevelTiling for auto-tensorization with a single intrinsic.

Parameters:
    intrin_name           The name of a tensor intrinsic, which must have been registered via TensorIntrin.register(...) beforehand.
    structure             The tiling structure. Recommended: 'SSRSRS' on CPU, 'SSSRRSRS' on GPU.
    tile_binds            For each level of tiles, which thread axis it is bound to. Recommended: NullOpt on CPU, [blockIdx.x, vthread.x, threadIdx.x] on GPU.
    max_innermost_factor  The maximum size of the innermost factor. NullOpt means no limit.
    vector_load_lens      The lengths of vector lanes in vectorized cooperative fetching. NullOpt disables vectorization.
    reuse_read            Data reuse configuration for reading. NullOpt means no reuse.
    reuse_write           Data reuse configuration for writing. NullOpt means no reuse.
static ScheduleRule tvm::meta_schedule::ScheduleRule::ParallelizeVectorizeUnroll(int max_jobs_per_core, int max_vectorize_extent, Array<runtime::Int> unroll_max_steps, bool unroll_explicit)

Mark parallelize, vectorize, and unroll on the root block. The marks are applied to each block by a follow-up postprocessor.

Parameters:
    max_jobs_per_core     The maximum number of jobs to be launched per CPU core. It sets the upper limit of CPU parallelism, i.e. num_cores * max_jobs_per_core. Use -1 to disable parallelism.
    max_vectorize_extent  The maximum extent to be vectorized. It sets the upper limit of hardware-target vectorization. Use -1 to disable vectorization.
    unroll_max_steps      The candidate maximum numbers of unroll steps. Use an empty array to disable unrolling.
    unroll_explicit       Whether to explicitly unroll the loop, or just add an "unroll" pragma.
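The parallelism cap described by max_jobs_per_core is a simple product. This tiny sketch (the function name is illustrative, not TVM internals) makes the -1 sentinel explicit:

```python
def max_parallelism(max_jobs_per_core: int, num_cores: int) -> int:
    """Upper limit of CPU parallelism: num_cores * max_jobs_per_core.

    Illustrative helper; -1 is the sentinel meaning parallelism is disabled,
    in which case everything runs on a single job.
    """
    if max_jobs_per_core == -1:
        return 1
    return num_cores * max_jobs_per_core
```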
static ScheduleRule tvm::meta_schedule::ScheduleRule::PyScheduleRule(FInitializeWithTuneContext f_initialize_with_tune_context, FApply f_apply, FClone f_clone, FAsString f_as_string)

Create a schedule rule with customized methods on the Python side.

Parameters:
    f_initialize_with_tune_context  The packed function of InitializeWithTuneContext.
    f_apply                         The packed function of Apply.
    f_clone                         The packed function of Clone.
    f_as_string                     The packed function of AsString.
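The contract these four packed functions must satisfy can be sketched as one plain-Python class. Note the hedge: in real TVM the Python side goes through its PyScheduleRule machinery and packed functions; the `CustomRule` class below is a hypothetical illustration of the same contract, not TVM code.

```python
class CustomRule:
    """Illustrative bundle of the four callbacks PyScheduleRule expects."""

    def __init__(self, name="custom-rule"):
        self.name = name
        self.context = None  # set by initialize_with_tune_context

    def initialize_with_tune_context(self, context):
        # FInitializeWithTuneContext: receives the tuning context once.
        self.context = context

    def apply(self, sch, block):
        # FApply: returns candidate schedules for the given block.
        return [sch]

    def as_string(self):
        # FAsString: the rule rendered as a string with its name.
        return f"meta_schedule.{self.name}"

    def clone(self):
        # FClone: an independent copy that shares no mutable state.
        return CustomRule(self.name)
```

A freshly cloned rule deliberately starts with no tuning context, so each tuning task can re-initialize its own copy.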
static ScheduleRule tvm::meta_schedule::ScheduleRule::RandomComputeLocation()

A rule that randomly selects a compute-at location for a free block.
tvm::meta_schedule::ScheduleRule::TVM_DEFINE_MUTABLE_OBJECT_REF_METHODS(ScheduleRule, ObjectRef, ScheduleRuleNode)