tvm::relay::transform

Typedefs
using Pass = tvm::transform::Pass
using PassNode = tvm::transform::PassNode
using PassInfo = tvm::transform::PassInfo
using PassInfoNode = tvm::transform::PassInfoNode
using PassContext = tvm::transform::PassContext
using PassContextNode = tvm::transform::PassContextNode
using Sequential = tvm::transform::Sequential
using FTVMRelayToTIR = tvm::transform::Pass
    RelayToTIR tvm::transform::Pass specific to a TargetKind.
using FTVMTIRToRuntime = tvm::runtime::TypedPackedFunc<runtime::Module(IRModule, Target)>
    TIRToRuntime conversion specific to a TargetKind.
Functions
Pass CreateFunctionPass(const runtime::TypedPackedFunc<Function(Function, IRModule, PassContext)>& pass_func, int opt_level, String name, tvm::Array<String> required, bool traceable = false)

Pass DeadCodeElimination(bool inline_once = false, bool ignore_purity = false)
    Remove let-bound expressions which do not affect the program result.

Pass LazyGradientInit()
    Convert all expressions of TensorType into GradCell, an algebraic data type defined in gradient.rly.

Pass FoldConstant(bool fold_qnn = false)
    Fold constant expressions.

Pass SplitArgs(uint64_t max_function_args)
    Split functions with a huge number of arguments into smaller pieces.

Pass FuseOps(int fuse_opt_level = -1)
    Fuse operations in an expr into separate functions.

Pass DefuseOps()
    The inverse operation of FuseOps. It transforms a fused program returned by FuseOps into the program before FuseOps was applied (i.e. x == DefuseOps(FuseOps(x))).
Pass RewriteAnnotatedOps(int fallback_device)
    Rewrite the annotated program.

Pass ToBasicBlockNormalForm()
    Turn an expression into Basic Block Normal Form.

Pass ToANormalForm()
    Turn a dataflow graph into Administrative Normal Form, or A-Normal Form (ANF).

Expr ToANormalForm(const Expr& expr)
    ToANormalForm, but on an incomplete graph.

Pass ToCPS()
    Turn an expression into continuation passing style (CPS).

Pass ToGraphNormalForm()
    Remove let bindings and directly share via pointers instead.

Pass PartialEval()
    Aggressive constant propagation/constant folding/inlining.

Pass SimplifyInference()
    Simplify certain operators during inference. For example, the result of a batch norm which is indexed at tuple index 0 will be unpacked into a number of simplified operators.

Pass FastMath()
    Replaces non-linear activation functions with their fast but approximate counterparts.

Pass DynamicToStatic()
    Find dynamic ops and make them static.

Pass InferType()
    Infer the type of an expression.

Type InferTypeLocal(const Expr& expr)
    Infer the type of an expression, reusing existing type information.

Pass EliminateCommonSubexpr(runtime::PackedFunc fskip = nullptr)
    Search for and eliminate common subexpressions. For example, if two expressions evaluate to an identical value, a single variable is created and both expressions are replaced by that variable.
Pass CombineParallelConv2D(uint64_t min_num_branches = 3)
    Combine parallel 2d convolutions into a single convolution if the number of branches of this conv2d operator is not less than min_num_branches.

Pass CombineParallelDense(uint64_t min_num_branches = 3, bool to_batch_matmul = true)
    Combine parallel dense ops into a single batch_matmul if the number of branches of this dense operator is not less than min_num_branches.

Pass CombineParallelBatchMatmul(uint64_t min_num_branches = 3)
    Combine parallel batch_matmul ops into a single batch_matmul if the number of branches of this batch_matmul operator is not less than min_num_branches.

Pass BackwardFoldScaleAxis()
    Backward fold axis scaling into weights of conv/dense operators.

Pass ForwardFoldScaleAxis()
    Forward fold axis scaling into weights of conv/dense operators.

Pass FoldScaleAxis()
    A sequential pass that executes the ForwardFoldScaleAxis and BackwardFoldScaleAxis passes.

Pass CanonicalizeOps()
    Canonicalize some operators to simplified operators. For example, bias_add can be canonicalized to expand_dims and broadcast_add.

Pass AlterOpLayout()
    Alternate the layouts of operators or replace primitive operators with other expressions.

Pass AutoSchedulerLayoutRewrite()
    Do layout rewrite according to the tile structure created by auto-scheduler.

Pass MetaScheduleLayoutRewrite()
    Do layout rewrite according to the tile structure created by meta-schedule.

Pass ConvertLayout(const Map<String, Array<String>>& desired_layouts)
    Given a destination layout, this pass transforms the expr such that most ops' input data layout is changed to that layout. In the ideal situation there are only two layout transforms, one at the start and one at the end.

Pass Legalize(const String& legalize_map_attr_name = "FTVMLegalize")
    Legalizes an expr with another expression.

Pass CanonicalizeCast()
    Canonicalize cast expressions to make operator fusion more efficient.
Pass EtaExpand(bool expand_constructor, bool expand_global_var)
    Add abstraction over a constructor or global variable bound to a function.

Pass PartitionGraph()
    Partition a Relay program into regions that can be executed on different backends.

Pass Inline()
    Inline the global functions marked as inline in a given Relay IRModule.

Pass RemoveUnusedFunctions(Array<runtime::String> entry_functions)
    Remove the unused functions in the Relay IRModule.

Pass SimplifyExpr()
    Simplify the Relay expression.

Pass SimplifyExprPostAlterOp()
    A stripped-down version of SimplifyExpr which is run after AlterOpLayout.

Pass RelayToTIRTargetHook(CompilationConfig config)
    Run any custom passes registered under "RelayToTIR" attributes on TargetKinds.

Pass ManifestAlloc(VirtualDevice cpu_virtual_device)
    A pass for manifesting explicit memory allocations and rewriting specific dialects.

Pass ManifestLifetimes()
    A pass for manifesting variable lifetimes by inserting kill operations when variables become dead. This pass should be run after ManifestAlloc, and should not be run more than once.

Pass PlanDevices(CompilationConfig config)
    Uses existing "on_device" and "device_copy" CallNodes to infer the VirtualDevice on which every Relay sub-expression should run and where its result should be stored. Captures the result of that analysis using new "on_device" and "device_copy" CallNodes.

Pass FlattenAtrousConv()
    This transform flattens atrous convolution, which corresponds to the operation sequence "space_to_batch_nd"->"conv2d"->"batch_to_space_nd", and converts it into a subgraph with a convolution that has modified "dilation" and recalculated "padding" parameters.

Pass AnnotateUsedMemory()
    Annotates the minimum required memory of each primitive function callsite by analyzing the liveness of the input/output tensors at each function callsite and calculating the total amount of memory these tensors require. This is added as a "used_memory" annotation to the function in question as a list of the number of bytes for each callsite. In addition, the containing function is annotated with an "io_used_memory" annotation which refers to the total memory required for the IO tensors.

Pass CapturePostDfsIndexInSpans()
    Captures the post-dfs index and dominator post-dfs index of (most) expression nodes in their span, in the form "index:<post-dfs index>:<dominator post-dfs index>". This is useful for debugging since a) it helps identify pretty-printed sub-expressions within the overall model and b) the indexes are heavily used by Collage for its compact representation of sub-graphs.

Pass AnnotateMemoryScope()
    Calls the device-dependent memory scope analysis pass, collects the mapping of desirable expr->memory_scope, and annotates expressions by VirtualDevice with the required memory_scope.

Pass RemoveStandaloneReshapes()
    Removes non-fused reshapes after lowering the graph. InferType() cannot be invoked after calling this pass, as it removes reshapes from the call graph. Many targets only need buffer addresses irrespective of their shapes, which makes reshapes symbolic once the graph has been lowered. Reshape removal results in smaller code size and reduced buffer allocations, and opens up opportunities for operator fusion in the target backend, consequently improving inference performance.
using tvm::relay::transform::FTVMRelayToTIR = typedef tvm::transform::Pass

RelayToTIR tvm::transform::Pass specific to a TargetKind.

Called before the default lowering passes.

Parameters:
    mod       The module that an optimization pass runs on.
    pass_ctx  The pass context that can provide information for the optimization.
using tvm::relay::transform::FTVMTIRToRuntime = typedef tvm::runtime::TypedPackedFunc<runtime::Module(IRModule, Target)>
TIRToRuntime conversion specific to a TargetKind.
This function is responsible for scanning an IRModule for appropriate Target-specific functions and generating a Runtime module representing the compiled output.
using tvm::relay::transform::Pass = typedef tvm::transform::Pass
using tvm::relay::transform::PassContext = typedef tvm::transform::PassContext
using tvm::relay::transform::PassInfo = typedef tvm::transform::PassInfo
using tvm::relay::transform::PassInfoNode = typedef tvm::transform::PassInfoNode
using tvm::relay::transform::PassNode = typedef tvm::transform::PassNode
using tvm::relay::transform::Sequential = typedef tvm::transform::Sequential
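The typedefs above re-export the core pass infrastructure into this namespace. As a minimal sketch of how they fit together (assuming the standard Pass call operator and With-scoped PassContext from tvm/ir/transform.h; the helper name OptimizeModule and the particular pass selection are illustrative, not prescribed):

    #include <tvm/ir/module.h>
    #include <tvm/relay/transform.h>
    #include <tvm/support/with.h>

    namespace rt = tvm::relay::transform;

    tvm::IRModule OptimizeModule(tvm::IRModule mod) {
      // Compose several passes from this namespace into one pipeline.
      rt::Sequential seq({rt::InferType(), rt::SimplifyInference(),
                          rt::FoldConstant(), rt::FuseOps()});
      // Run under an explicit PassContext; opt_level gates which passes
      // in the sequence are actually enabled.
      rt::PassContext ctx = rt::PassContext::Create();
      ctx->opt_level = 3;
      tvm::With<rt::PassContext> scope(ctx);
      return seq(mod);
    }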
Pass tvm::relay::transform::AlterOpLayout()
Alternate the layouts of operators or replace primitive operators with other expressions.
Pass tvm::relay::transform::AnnotateMemoryScope()
Calls the device-dependent memory scope analysis pass, collects the mapping of desirable expr->memory_scope, and annotates expressions by VirtualDevice with the required memory_scope.
Pass tvm::relay::transform::AnnotateUsedMemory()
Annotates the minimum required memory of each primitive function callsite by analyzing the liveness of the input/output tensors at each function callsite and calculating the total amount of memory these tensors require. This is added as a "used_memory" annotation to the function in question as a list of the number of bytes for each callsite. In addition, the containing function is annotated with an "io_used_memory" annotation which refers to the total memory required for the IO tensors.
Note: This pass does not support dynamic shapes; it is the user's responsibility to check that this pass isn't applied where dynamic shapes may be input.
Pass tvm::relay::transform::AutoSchedulerLayoutRewrite()
Do layout rewrite according to the tile structure created by auto-scheduler.
Pass tvm::relay::transform::BackwardFoldScaleAxis()
Backward fold axis scaling into weights of conv/dense operators.
Pass tvm::relay::transform::CanonicalizeCast()
Canonicalize cast expressions to make operator fusion more efficient.
Pass tvm::relay::transform::CanonicalizeOps()
Canonicalize some operators to simplified operators. For example, bias_add can be canonicalized to expand_dims and broadcast_add.
Pass tvm::relay::transform::CapturePostDfsIndexInSpans()
Captures the post-dfs index and dominator post-dfs index of (most) expression nodes in their span, in the form "index:<post-dfs index>:<dominator post-dfs index>". This is useful for debugging since a) it helps identify pretty-printed sub-expressions within the overall model and b) the indexes are heavily used by Collage for its compact representation of sub-graphs.
Note that Op and Constructor nodes are not changed even though they are assigned a post-dfs index.
Pass tvm::relay::transform::CombineParallelBatchMatmul(uint64_t min_num_branches = 3)

Combine parallel batch_matmul ops into a single batch_matmul if the number of branches of this batch_matmul operator is not less than min_num_branches.

Parameters:
    min_num_branches  The minimum number of branches.
Pass tvm::relay::transform::CombineParallelConv2D(uint64_t min_num_branches = 3)

Combine parallel 2d convolutions into a single convolution if the number of branches of this conv2d operator is not less than min_num_branches.

Parameters:
    min_num_branches  The minimum number of branches.
Pass tvm::relay::transform::CombineParallelDense(uint64_t min_num_branches = 3, bool to_batch_matmul = true)

Combine parallel dense ops into a single batch_matmul if the number of branches of this dense operator is not less than min_num_branches.

Parameters:
    min_num_branches  The minimum number of branches.
    to_batch_matmul   Whether to combine parallel dense ops into batch_matmul. If set to false, combine dense ops into a single dense op.
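As a sketch, the three parallel-combination passes are often run together; the min_num_branches values below are illustrative, not recommendations, and MakeCombinePipeline is a made-up helper name:

    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    // Combine parallel conv2d/dense/batch_matmul branches in one pipeline.
    rt::Pass MakeCombinePipeline() {
      return rt::Sequential({
          rt::CombineParallelConv2D(/*min_num_branches=*/3),
          rt::CombineParallelDense(/*min_num_branches=*/3,
                                   /*to_batch_matmul=*/true),
          rt::CombineParallelBatchMatmul(/*min_num_branches=*/3),
      });
    }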
Pass tvm::relay::transform::ConvertLayout(const Map<String, Array<String>>& desired_layouts)

Given a destination layout, this pass transforms the expr such that most ops' input data layout is changed to that layout. In the ideal situation there are only two layout transforms, one at the start and one at the end.

This pass is not a part of relay.build and is expected to be called between the framework-relay parser and the relay.build call. This is very helpful for hardware backends that support/prefer only one type of data layout.

RFC - https://discuss.tvm.ai/t/layout-conversion-pass/4009

This pass uses most of the AlterOpLayout and InferCorrectLayout infrastructure. We can define new layouts for conv2d ops for now. Most of the other operators try to adapt to their input layout using the InferCorrectLayout infrastructure.

Parameters:
    desired_layouts  Specify a mapping of op_name to an array of desired layouts for each input. For example, Map("nn.conv2d", Array("NHWC", "OHWI")) specifies the desired layout for the data, then the kernel, of nn.conv2d.
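A hedged sketch of building the desired_layouts map from the example above (MakeConvertLayoutPass is a made-up helper name):

    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    rt::Pass MakeConvertLayoutPass() {
      // For nn.conv2d, request NHWC for the data and OHWI for the kernel.
      tvm::Map<tvm::String, tvm::Array<tvm::String>> desired_layouts;
      desired_layouts.Set("nn.conv2d", {"NHWC", "OHWI"});
      return rt::ConvertLayout(desired_layouts);
    }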
Pass tvm::relay::transform::CreateFunctionPass(const runtime::TypedPackedFunc<Function(Function, IRModule, PassContext)>& pass_func, int opt_level, String name, tvm::Array<String> required, bool traceable = false)
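A minimal sketch of defining a custom function pass with CreateFunctionPass. The pass name "IdentityPass" and the no-op body are illustrative; a real pass would rewrite the function:

    #include <tvm/ir/module.h>
    #include <tvm/relay/function.h>
    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    rt::Pass MakeIdentityPass() {
      auto pass_func = [](tvm::relay::Function f, tvm::IRModule mod,
                          rt::PassContext ctx) -> tvm::relay::Function {
        return f;  // no-op: return each function unchanged
      };
      return rt::CreateFunctionPass(pass_func, /*opt_level=*/0,
                                    "IdentityPass", /*required=*/{},
                                    /*traceable=*/false);
    }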
Pass tvm::relay::transform::DeadCodeElimination(bool inline_once = false, bool ignore_purity = false)
Remove let-bound expressions which do not affect the program result.

This pass will remove let bindings which are not referenced. If inline_once is true, let bindings which are referenced only once will also be inlined.

For example, this pass should turn "let a = 1; 2" into "2", as the value of the expression does not depend on a.

As another example, "let a = 1; a" will be optimized into "1" if inline_once is true.

If ignore_purity is false, possibly side-effecting expressions (such as memory allocation, random number generation, reading/writing references, or calls to primitive or external functions) are never elided or inlined. This is sound, but ignore_purity can be set to true to suppress this check.

The analysis is fairly conservative: for example, it assumes all local functions may be called more than once, that any functions passed as arguments have side effects, and so on.

Parameters:
    inline_once    Whether or not to inline bindings used exactly once.
    ignore_purity  Whether to ignore whether expressions have side effects.
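A short sketch of configuring the pass; the values shown match the let-binding examples above, and the helper name is made up:

    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    rt::Pass MakeDeadCodeElimination() {
      // inline_once=true turns "let a = 1; a" into "1"; keeping
      // ignore_purity=false preserves possibly side-effecting bindings.
      return rt::DeadCodeElimination(/*inline_once=*/true,
                                     /*ignore_purity=*/false);
    }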
Pass tvm::relay::transform::DefuseOps()

The inverse operation of FuseOps. It transforms a fused program returned by FuseOps into the program before FuseOps was applied (i.e. x == DefuseOps(FuseOps(x))).
Pass tvm::relay::transform::DynamicToStatic()
Find dynamic ops and make them static.
Searches the graph for dynamic ops. If the dynamic inputs to those ops are constants, it replaces them with static ops and re-performs type inference and constant folding. The pass repeats itself until the graph stops changing or we run too many iterations.
Pass tvm::relay::transform::EliminateCommonSubexpr(runtime::PackedFunc fskip = nullptr)

Search for and eliminate common subexpressions. For example, if two expressions evaluate to an identical value, a single variable is created and both expressions are replaced by that variable.

Parameters:
    fskip  A callback that allows certain expressions to be skipped.
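A sketch of supplying the fskip callback. The Expr -> bool signature (returning true to skip an expression) is an assumption based on the parameter description, not something this page specifies:

    #include <tvm/relay/expr.h>
    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    rt::Pass MakeCsePass() {
      // Assumed contract: called per candidate expression; returning true
      // excludes that expression from common-subexpression elimination.
      tvm::runtime::TypedPackedFunc<bool(tvm::relay::Expr)> fskip(
          [](tvm::relay::Expr e) { return false; });
      return rt::EliminateCommonSubexpr(fskip);
    }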
Pass tvm::relay::transform::EtaExpand(bool expand_constructor, bool expand_global_var)

Add abstraction over a constructor or global variable bound to a function.

For example, square is transformed to fn (x: int32) -> int32 { square(x) }.
See https://en.wikipedia.org/wiki/Lambda_calculus#%CE%B7-conversion for more details.
Parameters:
    expand_constructor  Whether to expand constructors.
    expand_global_var   Whether to expand global variables.
Pass tvm::relay::transform::FastMath()
Replaces non-linear activation functions with their fast but approximate counterparts.
Pass tvm::relay::transform::FlattenAtrousConv()
This transform flattens atrous convolution, which corresponds to the operation sequence "space_to_batch_nd"->"conv2d"->"batch_to_space_nd", and converts it into a subgraph with a convolution that has modified "dilation" and recalculated "padding" parameters.
Pass tvm::relay::transform::FoldConstant(bool fold_qnn = false)
Fold constant expressions.
For backward compatibility reasons, it skips QNN primitives from folding by default. Some transformation passes, like FakeQuantizationToInteger, require QNN primitives to be kept for constant subgraphs, and uncontrolled constant folding of QNN primitives may break the applicability of FakeQuantizationToInteger. We suggest using the FoldConstant pass with the non-default fold_qnn=true value only after all other QNN-sensitive passes have already been applied.

Parameters:
    fold_qnn  Whether to fold constants for QNN operations.
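Following the note above, a sketch that enables QNN folding only at a late pipeline stage (the helper name is made up):

    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    rt::Pass MakeLateFoldConstant() {
      // Only safe once QNN-sensitive passes (e.g. FakeQuantizationToInteger)
      // have already been applied.
      return rt::FoldConstant(/*fold_qnn=*/true);
    }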
Pass tvm::relay::transform::FoldScaleAxis()
A sequential pass that executes ForwardFoldScaleAxis and BackwardFoldScaleAxis passes.
Pass tvm::relay::transform::ForwardFoldScaleAxis()
Forward fold axis scaling into weights of conv/dense operators.
Pass tvm::relay::transform::FuseOps(int fuse_opt_level = -1)
Fuse operations in an expr into separate functions.

Parameters:
    fuse_opt_level  Optimization level. If it is -1, it will be inferred from the pass context.
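The x == DefuseOps(FuseOps(x)) property stated for DefuseOps can be sketched as a structural-equality check; using tvm::StructuralEqual for the comparison is an assumption, not a library-provided test:

    #include <tvm/ir/module.h>
    #include <tvm/node/structural_equal.h>
    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    bool RoundTripsThroughFusion(tvm::IRModule mod) {
      tvm::IRModule fused = rt::FuseOps(/*fuse_opt_level=*/-1)(mod);
      tvm::IRModule defused = rt::DefuseOps()(fused);
      return tvm::StructuralEqual()(mod, defused);
    }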
Pass tvm::relay::transform::InferType()
Infer the type of an expression.
The result of type checking is a new expression with unambiguous type information filled in, as well as its checked_type field populated with the result type.
Type tvm::relay::transform::InferTypeLocal(const Expr& expr)

Infer the type of an expression, reusing existing type information.

The result of type checking is a new expression with unambiguous type information filled in for the given node only. The local version can use existing type information populated throughout the expression, and assumes this information is correct. It also avoids examining large parts of the graph, on the assumption that type information is already filled in properly, which makes it much faster when type inference is invoked iteratively.
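A sketch contrasting whole-module inference with the local variant (function names are illustrative):

    #include <tvm/ir/module.h>
    #include <tvm/relay/expr.h>
    #include <tvm/relay/transform.h>

    namespace rt = tvm::relay::transform;

    // Whole-module type checking via the InferType pass.
    tvm::IRModule TypeCheck(tvm::IRModule mod) { return rt::InferType()(mod); }

    // Fast, local query that reuses type information already populated
    // throughout the expression.
    tvm::Type TypeOf(const tvm::relay::Expr& e) {
      return rt::InferTypeLocal(e);
    }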
Pass tvm::relay::transform::Inline()

Inline the global functions marked as inline in a given Relay IRModule.
Pass tvm::relay::transform::LazyGradientInit()
Convert all expressions of TensorType into GradCell, an algebraic data type defined in gradient.rly.
This will delay or decrease memory usage. Calls to ones, ones_like, zeros, and zeros_like will not immediately instantiate a tensor in memory; a tensor is instantiated only if needed. It also defines + and * operations between GradCell types, which can increase performance when using zero-filled or one-filled tensors, as is the case in reverse-mode AD.
Pass tvm::relay::transform::Legalize(const String& legalize_map_attr_name = "FTVMLegalize")

Legalizes an expr with another expression.

Parameters:
    legalize_map_attr_name  The Op's attr name which corresponds to the legalize rule function. One can collect and isolate similar types of legalize transformations using this param. For example, transformations that only apply to dialects can be isolated into an FTVMDialectLegalize string. This pass calls only those transformations that have been registered using the supplied legalize_map_attr_name.
Pass tvm::relay::transform::ManifestAlloc(VirtualDevice cpu_virtual_device)
A pass for manifesting explicit memory allocations and rewriting specific dialects.
Parameters:
    cpu_virtual_device  VirtualDevice for computations and data which must reside on a CPU, such as shapes and shape functions.
Pass tvm::relay::transform::ManifestLifetimes()
A pass for manifesting variable lifetimes by inserting kill operations when variables become dead. This pass should be run after ManifestAlloc, and should not be run more than once.
Pass tvm::relay::transform::MetaScheduleLayoutRewrite()
Do layout rewrite according to the tile structure created by meta-schedule.
Pass tvm::relay::transform::PartialEval()
Aggressive constant propagation/constant folding/inlining.
It will do as much computation at compile time as possible. It has two benefits: removing runtime overhead and allowing more optimization (typically fusion). As a side effect, code size will explode.
Pass tvm::relay::transform::PartitionGraph()
Partition a Relay program into regions that can be executed on different backends.
Pass tvm::relay::transform::PlanDevices(CompilationConfig config)

Uses existing "on_device" and "device_copy" CallNodes to infer the VirtualDevice on which every Relay sub-expression should run and where its result should be stored. Captures the result of that analysis using new "on_device" and "device_copy" CallNodes.

See tvm::relay::transform::{LexicalOnDeviceMixin, DeviceAwareExprVisitor, DeviceAwareExprMutator} for help recovering the device for an arbitrary sub-expression in downstream transformations.

Parameters:
    config  Describes the targets and default VirtualDevice for all primitive operators and host sub-expressions.
Pass tvm::relay::transform::RelayToTIRTargetHook(CompilationConfig config)
Run any custom passes registered under "RelayToTIR" attributes on TargetKinds.
This pass looks for inline, let-bound or global functions which have a "Compiler" attribute. If the attribute value corresponds to a TargetKind with a "RelayToTIR" attribute, then the 'custom' pass bound to that attribute is run (at most once) on the IRModule as a whole.

If, in addition, the config has a Target with a matching TargetKind, that Target is set as the 'current' target before the custom pass is executed. In this way it is possible for custom passes to pick up target options which may guide how they transform the IRModule. (Those targets are referred to as 'extern codegen targets' elsewhere.)

A typical custom pass will:

It is also possible (despite the pass and attribute names!) for the custom pass to proceed directly to a runtime::Module, which can be attached to the output IRModule's "external_mods" attribute (taking care not to clobber any existing modules). In this case the flow is as above, except:

There are many existing runtime::Modules, ranging from source to object to dynamic libraries to entirely custom implementations. Some of those may require additional compilation using 'export_library' on the final build artifact.

The OutlineCompilerFunctionsWithExistingGlobalSymbols and MarkCompilerFunctionsAsExtern utility passes can be used by custom passes to take care of some of the boilerplate.

TODO(mbs): Rename PreLoweringTargetHooks?

Parameters:
    config  All available targets.
Pass tvm::relay::transform::RemoveStandaloneReshapes()
Removes non-fused reshapes after lowering the graph. InferType() cannot be invoked after calling this pass, as it removes reshapes from the call graph. Many targets only need buffer addresses irrespective of their shapes, which makes reshapes symbolic once the graph has been lowered. Reshape removal results in smaller code size and reduced buffer allocations, and opens up opportunities for operator fusion in the target backend, consequently improving inference performance.
Pass tvm::relay::transform::RemoveUnusedFunctions(Array<runtime::String> entry_functions)

Remove the unused functions in the Relay IRModule.

Parameters:
    entry_functions  The entry functions used to search for the functions that are being used.
Pass tvm::relay::transform::RewriteAnnotatedOps(int fallback_device)

Rewrite the annotated program.

Parameters:
    fallback_device  The fallback device, which is the default device for operators without annotation.
Pass tvm::relay::transform::SimplifyExpr()
Simplify the Relay expression.
Pass tvm::relay::transform::SimplifyExprPostAlterOp()

A stripped-down version of SimplifyExpr which is run after AlterOpLayout.
Pass tvm::relay::transform::SimplifyInference()
Simplify certain operators during inference. For example, the result of a batch norm which is indexed at tuple index 0 will be unpacked into a number of simplified operators.
Pass tvm::relay::transform::SplitArgs(uint64_t max_function_args)

Split functions with a huge number of arguments into smaller pieces.

Parameters:
    max_function_args  Maximum number of function arguments. If it equals 0, SplitArgs does not split the function.
Pass tvm::relay::transform::ToANormalForm()

Turn a dataflow graph into Administrative Normal Form, or A-Normal Form (ANF).

It will turn an expression that is in a graph form (with sharing implicit) into an expression with explicit sharing (A-Normal Form).

The scope of the root expression is the global scope.

The scope of any non-root expression is the least common ancestor of all the scopes in which it is used.

Values are ordered by post-DFS order in each scope.

Expr tvm::relay::transform::ToANormalForm(const Expr& expr)

ToANormalForm, but on an incomplete graph.

Parameters:
    expr  The graph.
Pass tvm::relay::transform::ToBasicBlockNormalForm()

Turn an expression into Basic Block Normal Form.

We define a block as a group of expressions implied by the scope structure.

Each graph node can only belong to a single block.

Any value that is used in multiple blocks has to be referred to by a Var which is defined in a block whose scope is the least common ancestor of the blocks in which the value is used.
Pass tvm::relay::transform::ToCPS()

Turn an expression into continuation passing style (CPS).

CPS means that every function, instead of returning its result directly, is passed an extra function (called the continuation) as an argument, and passes its result to that continuation instead.

Thus, every function call has to be passed an extra argument that represents the rest of the computation (hence the name continuation).

Similarly, all other computation will be wrapped and call the continuation as well.
Pass tvm::relay::transform::ToGraphNormalForm()

Remove let bindings and directly share via pointers instead.

It will remove all let bindings and turn every variable bound by let into a direct pointer reference.