Managed reference to ScheduleRuleNode.
#include <schedule_rule.h>

Static Member Functions

static ScheduleRule AutoInline (bool into_producer, bool into_consumer, bool inline_const_tensor, bool disallow_if_then_else, bool require_injective, bool require_ordered, Optional<Array<String>> disallow_op)
    Create an auto-inline rule that inlines spatial blocks when they satisfy certain conditions.

static ScheduleRule MultiLevelTiling (String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)
    Create a mega rule: multi-level tiling with data reuse.

static ScheduleRule MultiLevelTilingWithIntrin (String intrin_name, String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)
    Extension of MultiLevelTiling for auto-tensorizing with a single intrinsic.

static ScheduleRule AddRFactor (int max_jobs_per_core, Optional<Integer> max_innermost_factor)
    Create a rule that applies add-rfactor to suitable blocks when needed.

static ScheduleRule CrossThreadReduction (Array<Integer> thread_extents)
    Create a schedule rule that applies cross-thread reduction to suitable reduction blocks when needed.

static ScheduleRule RandomComputeLocation ()
    Create a rule that randomly selects a compute-at location for a free block.

static ScheduleRule ParallelizeVectorizeUnroll (int max_jobs_per_core, int max_vectorize_extent, Array<Integer> unroll_max_steps, bool unroll_explicit)
    Mark parallelize, vectorize, and unroll on the root block. The marks are applied to each block by a follow-up postprocessor.

static ScheduleRule AutoBind (int max_threadblocks, Array<Integer> thread_extents)
    Automatically bind loops around the block to BlockIdx and ThreadIdx.

static ScheduleRule PyScheduleRule (PyScheduleRuleNode::FInitializeWithTuneContext f_initialize_with_tune_context, PyScheduleRuleNode::FApply f_apply, PyScheduleRuleNode::FAsString f_as_string)
    Create a schedule rule with customized methods on the Python side.
◆ AddRFactor()

static ScheduleRule tvm::meta_schedule::ScheduleRule::AddRFactor (int max_jobs_per_core, Optional<Integer> max_innermost_factor)

Create a rule that applies add-rfactor to suitable blocks when needed.
- Parameters
max_jobs_per_core | The maximum number of jobs launched per CPU core. It sets the upper limit of CPU parallelism, i.e. num_cores * max_jobs_per_core. Use -1 to disable parallelism. |
max_innermost_factor | The maximum size of the innermost factor. NullOpt means no limit. |
- Returns
- The schedule rule created
◆ AutoBind()

static ScheduleRule tvm::meta_schedule::ScheduleRule::AutoBind (int max_threadblocks, Array<Integer> thread_extents)

Automatically bind loops around the block to BlockIdx and ThreadIdx.
- Parameters
max_threadblocks | The maximum number of thread blocks on the GPU |
thread_extents | Candidates of thread axis extent. |
- Returns
- The schedule rule created
◆ AutoInline()

static ScheduleRule tvm::meta_schedule::ScheduleRule::AutoInline (bool into_producer, bool into_consumer, bool inline_const_tensor, bool disallow_if_then_else, bool require_injective, bool require_ordered, Optional<Array<String>> disallow_op)

Create an auto-inline rule that inlines spatial blocks when they satisfy certain conditions.
- Parameters
into_producer | Whether inlining a block into its producer is allowed |
into_consumer | Whether inlining a block into its consumer is allowed |
inline_const_tensor | Always inline constant tensors |
disallow_if_then_else | Always disallow if-then-else-like constructs |
require_injective | Always require the read-to-write mapping to be injective |
require_ordered | Always require the read-to-write mapping to be ordered |
disallow_op | The operators that are disallowed in auto inline |
- Returns
- The schedule rule created
◆ CrossThreadReduction()

static ScheduleRule tvm::meta_schedule::ScheduleRule::CrossThreadReduction (Array<Integer> thread_extents)

Create a schedule rule that applies cross-thread reduction to suitable reduction blocks when needed.
- Parameters
thread_extents | Candidates of thread axis extent (values are required to be positive). |
- Returns
- The schedule rule created
◆ MultiLevelTiling()

static ScheduleRule tvm::meta_schedule::ScheduleRule::MultiLevelTiling (String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)

Create a mega rule: multi-level tiling with data reuse.
- Parameters
structure | The tiling structure. Recommended:
  - 'SSRSRS' on CPU
  - 'SSSRRSRS' on GPU
|
tile_binds | For each level of tiles, the thread axis it is bound to. Recommended:
  - NullOpt on CPU
  - [blockIdx.x, vthread.x, threadIdx.x] on GPU
|
max_innermost_factor | The maximum size of the innermost factor. NullOpt means no limit. |
vector_load_lens | The lengths of vector lanes in vectorized cooperative fetching. NullOpt disables vectorization. |
reuse_read | Data reuse configuration for reading. NullOpt means no reuse. |
reuse_write | Data reuse configuration for writing. NullOpt means no reuse. |
- Returns
- The schedule rule created
◆ MultiLevelTilingWithIntrin()

static ScheduleRule tvm::meta_schedule::ScheduleRule::MultiLevelTilingWithIntrin (String intrin_name, String structure, Optional<Array<String>> tile_binds, Optional<Integer> max_innermost_factor, Optional<Array<Integer>> vector_load_lens, Optional<Map<String, ObjectRef>> reuse_read, Optional<Map<String, ObjectRef>> reuse_write)

Extension of MultiLevelTiling for auto-tensorizing with a single intrinsic.
- Parameters
intrin_name | The name of a tensor intrinsic; it must be registered via TensorIntrin.register(...) beforehand |
structure | The tiling structure. Recommended:
  - 'SSRSRS' on CPU
  - 'SSSRRSRS' on GPU
|
tile_binds | For each level of tiles, the thread axis it is bound to. Recommended:
  - NullOpt on CPU
  - [blockIdx.x, vthread.x, threadIdx.x] on GPU
|
max_innermost_factor | The maximum size of the innermost factor. NullOpt means no limit. |
vector_load_lens | The lengths of vector lanes in vectorized cooperative fetching. NullOpt disables vectorization. |
reuse_read | Data reuse configuration for reading. NullOpt means no reuse. |
reuse_write | Data reuse configuration for writing. NullOpt means no reuse. |
- Returns
- The schedule rule created
◆ ParallelizeVectorizeUnroll()

static ScheduleRule tvm::meta_schedule::ScheduleRule::ParallelizeVectorizeUnroll (int max_jobs_per_core, int max_vectorize_extent, Array<Integer> unroll_max_steps, bool unroll_explicit)

Mark parallelize, vectorize, and unroll on the root block. The marks are applied to each block by a follow-up postprocessor.
- Parameters
max_jobs_per_core | The maximum number of jobs launched per CPU core. It sets the upper limit of CPU parallelism, i.e. num_cores * max_jobs_per_core. Use -1 to disable parallelism. |
max_vectorize_extent | The maximum extent to be vectorized. It sets the upper limit of vectorization for the hardware target. Use -1 to disable vectorization. |
unroll_max_steps | The candidate maximum numbers of unroll steps. Use an empty array to disable unrolling. |
unroll_explicit | Whether to unroll the loop explicitly, or just add an "unroll" pragma. |
- Returns
- The schedule rule created
◆ PyScheduleRule()

static ScheduleRule tvm::meta_schedule::ScheduleRule::PyScheduleRule (PyScheduleRuleNode::FInitializeWithTuneContext f_initialize_with_tune_context, PyScheduleRuleNode::FApply f_apply, PyScheduleRuleNode::FAsString f_as_string)

Create a schedule rule with customized methods on the Python side.
- Parameters
f_initialize_with_tune_context | The packed function of InitializeWithTuneContext. |
f_apply | The packed function of Apply. |
f_as_string | The packed function of AsString. |
- Returns
- The schedule rule created.
◆ RandomComputeLocation()

static ScheduleRule tvm::meta_schedule::ScheduleRule::RandomComputeLocation ()

Create a rule that randomly selects a compute-at location for a free block.
- Returns
- The schedule rule created
The documentation for this class was generated from the following file: schedule_rule.h