tvm.autotvm
The auto-tuning module of tvm
This module includes:
Tuning space definition API
Efficient auto-tuners
Tuning result and database support
Distributed measurement to scale up tuning
- tvm.autotvm.apply_history_best(records: None | str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]] | Iterable[str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]]])
Apply the best config from the tuning history
- Parameters:
records (None, Records, or iterator of Records objects, where a Records object is a path-like object, a file-like object, or an iterator of (MeasureInput, MeasureResult)) –
Collection of tuning records. If multiple Records objects are passed, their contents will be merged.
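The selection rule behind apply_history_best can be sketched in plain Python. This is a minimal sketch, not TVM's implementation. It makes three assumptions:

- Each record reduces to a hypothetical (workload, config, costs, error_no) tuple.
- Failed measurements (error_no != 0) are skipped.
- After merging all sources, the config with the lowest mean cost is kept per workload.

TVM's own dispatcher also takes the target into account, which is not modelled here.

```python
from collections import namedtuple
from statistics import mean

# Hypothetical, simplified stand-in for a (MeasureInput, MeasureResult) pair.
Record = namedtuple("Record", ["workload", "config", "costs", "error_no"])


def best_per_workload(*record_sources):
    """Merge several record sources and keep the fastest valid config per workload."""
    best = {}  # workload -> (config, mean cost)
    for source in record_sources:
        for rec in source:
            if rec.error_no != 0:  # failed measurement: ignore it
                continue
            cost = mean(rec.costs)
            if rec.workload not in best or cost < best[rec.workload][1]:
                best[rec.workload] = (rec.config, cost)
    return {workload: config for workload, (config, _) in best.items()}


log_a = [Record("matmul_1024", "cfg0", [2.0e-3, 2.2e-3], 0),
         Record("matmul_1024", "cfg1", [1.5e-3], 0)]
log_b = [Record("matmul_1024", "cfg2", [1.0e-3], 1),  # fastest, but it failed
         Record("conv_56", "cfg7", [4.0e-3], 0)]
print(best_per_workload(log_a, log_b))  # {'matmul_1024': 'cfg1', 'conv_56': 'cfg7'}
```

Passing both logs at once mirrors the documented behaviour that multiple Records objects are merged before the best entry is chosen.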
tvm.autotvm.measure
User facing API for specifying how to measure the generated code
- class tvm.autotvm.measure.MeasureInput(target, task, config)
Stores all the necessary inputs for a measurement.
- Parameters:
target (tvm.target.Target) – The target device
task (task.Task) – Task function
config (ConfigEntity) – Specific configuration.
- class tvm.autotvm.measure.MeasureResult(costs, error_no, all_cost, timestamp)
Stores all the results of a measurement
- Parameters:
costs (Array of float or Array of Exception) – If no error occurs during measurement, it is an array of measured running times. If an error occurs during measurement, it is an array of the exception objects.
error_no (int) – Denote error type, defined by MeasureErrorNo
all_cost (float) – Total cost of this measurement, including RPC, compilation, and test runs
timestamp (float) – The absolute time stamp when we finish measurement.
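The costs field is what tuners turn into the FLOPS figures they report. A minimal sketch of that conversion follows. It assumes costs holds run times in seconds and that the FLOP count of the task is known. The SCALE table is a hypothetical stand-in; TVM's accepted prefixes live in tvm.autotvm.utils.SI_PREFIXES.

```python
from statistics import mean

# Hypothetical scale table; TVM's accepted prefixes are in tvm.autotvm.utils.SI_PREFIXES.
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def throughput(flop, costs, si_prefix="G"):
    """Turn measured run times in seconds (MeasureResult.costs) into FLOP/s with an SI prefix."""
    if not costs or any(isinstance(c, Exception) for c in costs):
        raise ValueError("costs must be a non-empty list of run times, not exceptions")
    return flop / mean(costs) / SCALE[si_prefix]


# A 1024 x 1024 x 1024 matmul performs 2 * 1024**3 floating-point operations.
print(round(throughput(2 * 1024 ** 3, [0.010, 0.012]), 1))  # 195.2
```

Rejecting exception entries reflects the rule above: when an error occurs, costs holds exception objects rather than times.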
- tvm.autotvm.measure.measure_option(builder, runner)
Set options for measurement. To measure a config, we build it and then run it, so options must be set for both steps. Each step has its own options for timeout, parallelism, etc.
- Parameters:
builder (Builder) – Specify how to build programs
runner (Runner) – Specify how to run programs
Examples
# example setting for using local devices
>>> measure_option = autotvm.measure_option(
>>>     builder=autotvm.LocalBuilder(),  # use all local cpu cores for compilation
>>>     runner=autotvm.LocalRunner(      # measure them sequentially
>>>         number=10,
>>>         timeout=5)
>>> )
# example setting for using remote devices
>>> measure_option = autotvm.measure_option(
>>>     builder=autotvm.LocalBuilder(),  # use all local cpu cores for compilation
>>>     runner=autotvm.RPCRunner(
>>>         'rasp3b', 'localhost', 9190,  # device key, host and port of the rpc tracker
>>>         number=4,
>>>         timeout=4)  # timeout of a run on the device. RPC request waiting time is excluded.
>>> )
Note
To make measurement results accurate, you should pick the correct values for the arguments number and repeat in Runner(). Some devices need a certain minimum running time to “warm up,” such as GPUs that need time to reach a performance power state. Using min_repeat_ms dynamically adjusts number, so it is recommended. The typical value for NVIDIA GPUs is 150 ms.
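How min_repeat_ms grows number can be pictured with a simplified model. It rests on two assumptions:

- Every run takes a fixed time_per_run_ms.
- On each retry, number jumps to the count estimated to fill min_repeat_ms, growing by at least a factor of 1.618.

This growth rule is modelled loosely on the runtime's time evaluator. Treat it as an assumption, not TVM's exact code.

```python
def adjust_number(time_per_run_ms, number, min_repeat_ms, growth=1.618):
    """Increase `number` until one repeat (number back-to-back runs) lasts >= min_repeat_ms."""
    if min_repeat_ms <= 0:
        return number  # feature disabled: keep the user's number
    if time_per_run_ms <= 0:
        raise ValueError("time_per_run_ms must be positive")
    duration_ms = time_per_run_ms * number
    while duration_ms < min_repeat_ms:
        # Jump to the estimated count, but grow by at least `growth` per retry (assumed rule).
        number = int(max(min_repeat_ms / (duration_ms / number) + 1, number * growth))
        duration_ms = time_per_run_ms * number
    return number


# A 0.5 ms kernel with number=4 and the 150 ms suggested for NVIDIA GPUs.
print(adjust_number(0.5, 4, 150))  # 301
```

A kernel that already runs longer than min_repeat_ms per repeat keeps its original number.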
- tvm.autotvm.measure.create_measure_batch(task, option)
Get a standard measure_batch function.
- Parameters:
task (tvm.autotvm.task.Task) – The tuning task
option (dict) – The option for measuring generated code. You should use the return value of function measure_option for this argument.
- Returns:
measure_batch – a callback function to measure a batch of configs
- Return type:
callable
- class tvm.autotvm.measure.measure_methods.LocalBuilder(timeout=10, n_parallel=None, build_kwargs=None, build_func='default', do_fork=False, runtime=None)
Run compilation on local machine
- Parameters:
timeout (float) – The timeout of a compilation
n_parallel (int) – The number of tasks run in parallel. None uses all CPU cores.
build_kwargs (dict) – If supplied, additional kwargs passed to build_func. Overrides any build_kwargs supplied by the Runner.
build_func (callable or str) – If it is ‘default’, use the default build function. If it is ‘ndk’, use the function for Android NDK. If it is ‘stackvm’, use the function for stackvm. If it is callable, use it as a custom build function (a lib_format field is expected).
do_fork (bool) – If False, do not fork when building. Requires n_parallel=1.
runtime (Optional[Runtime]) – Specify the runtime to generate artifacts for
- class tvm.autotvm.measure.measure_methods.RPCRunner(key, host, port, priority=1, timeout=10, n_parallel=None, number=4, repeat=3, min_repeat_ms=0, cooldown_interval=0.1, enable_cpu_cache_flush=False, module_loader=None)
Run generated code on remote devices. This function will ask an RPC Tracker to get a device for measurement.
- Parameters:
timeout (float) – The timeout of a RPCRunner measurement task
n_parallel (int) – The number of tasks run in parallel. None uses all CPU cores.
key (str) – The key of the device registered in the tracker
host (str) – The host address of RPC Tracker
port (int) – The port of RPC Tracker
number (int) – The number of times to run the generated code for taking the average. These runs together form one repeat of measurement.
repeat (int, optional) – The number of times to repeat the measurement. In total, the generated code will be run (1 + number × repeat) times, where the first “1” is a warm-up run and will be discarded. The returned result contains repeat costs, each of which is an average of number costs.
min_repeat_ms (int, optional) – The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, the parameter number will be dynamically adjusted to meet the minimum duration requirement of one repeat, i.e., when the run time of one repeat falls below this time, number will be automatically increased.
cooldown_interval (float, optional) – The cool-down interval between two measurements.
enable_cpu_cache_flush (bool) – Whether to flush the cache on CPU between repeated measurements. Flushing the cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.
module_loader (ModuleLoader) – If given, a context manager that loads the module to be timed into the remote runtime. If not given, default_module_loader is used.
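The run-count arithmetic for number and repeat can be checked with a small pure-Python simulation. This is an illustrative sketch only. run_once is a hypothetical callable standing in for one execution of the generated code, and nothing here calls TVM. With number=4 and repeat=3 it performs 1 + 4 × 3 = 13 runs and returns 3 costs.

```python
from statistics import mean


def simulate_runner(run_once, number=4, repeat=3):
    """Follow the documented protocol: 1 discarded warm-up run, then `repeat` groups of `number` runs."""
    run_once()  # warm-up run, result discarded
    # Each returned cost is the average of one repeat, i.e. of `number` runs.
    return [mean(run_once() for _ in range(number)) for _ in range(repeat)]


calls = []


def fake_kernel():
    """Hypothetical kernel timer: records the call and returns a run time in seconds."""
    calls.append(1)
    return 0.002


costs = simulate_runner(fake_kernel, number=4, repeat=3)
print(len(calls), len(costs))  # 13 3
```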
- class tvm.autotvm.measure.measure_methods.LocalRunner(timeout=10, number=4, repeat=3, min_repeat_ms=0, cooldown_interval=0.1, enable_cpu_cache_flush=False, module_loader=None)
Run generated code on local devices.
- Parameters:
timeout (float) – The timeout of a LocalRunner measurement task
number (int) – The number of times to run the generated code for taking the average. These runs together form one repeat of measurement.
repeat (int, optional) – The number of times to repeat the measurement. In total, the generated code will be run (1 + number × repeat) times, where the first run is a warm-up and will be discarded. The returned result contains repeat costs, each of which is an average of number costs.
min_repeat_ms (int, optional) – The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, the parameter number will be dynamically adjusted to meet the minimum duration requirement of one repeat, i.e., when the run time of one repeat falls below this time, number will be automatically increased.
cooldown_interval (float, optional) – The cool-down interval between two measurements.
enable_cpu_cache_flush (bool) – Whether to flush the cache on CPU between repeated measurements. Flushing the cache can make the measured latency of one operator closer to its actual latency during end-to-end inference. To make this option effective, the argument number should also be set to 1. This only has an effect on CPU tasks.
Note
This is a “fake” local mode. We start a silent RPC tracker and RPC server for the user. In this way we reuse the timeout/isolation mechanism of the RPC infrastructure.
tvm.autotvm.tuner
A tuner takes a task as input. It proposes some promising ConfigEntity in the ConfigSpace and measures them on the real hardware. Then it proposes the next batch of ConfigEntity according to the measurement results. This tuning loop is repeated.
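The loop just described can be sketched in plain Python. This is a minimal sketch under three assumptions:

- Configs are plain integer indices.
- measure is a hypothetical cost function standing in for hardware measurement.
- ToyRandomTuner only mimics the next_batch/update interface documented below.

TVM's Tuner.tune does more, such as building and running configs through the measure options, invoking callbacks, and tracking errors.

```python
import random


class ToyRandomTuner:
    """Hypothetical tuner over integer config indices (not TVM's RandomTuner)."""

    def __init__(self, space_size, seed=0):
        self.remaining = list(range(space_size))
        random.Random(seed).shuffle(self.remaining)
        self.best_index, self.best_cost = None, float("inf")

    def has_next(self):
        return bool(self.remaining)

    def next_batch(self, batch_size):
        batch, self.remaining = self.remaining[:batch_size], self.remaining[batch_size:]
        return batch

    def update(self, inputs, costs):
        for index, cost in zip(inputs, costs):
            if cost < self.best_cost:
                self.best_index, self.best_cost = index, cost


def tune(tuner, measure, n_trial, batch_size=8, early_stopping=None):
    """Propose -> measure -> update, until n_trial configs are measured or progress stalls."""
    trial, best_trial, best_cost = 0, 0, float("inf")
    while trial < n_trial and tuner.has_next():
        batch = tuner.next_batch(min(batch_size, n_trial - trial))
        costs = [measure(index) for index in batch]  # stands in for real hardware runs
        tuner.update(batch, costs)
        for cost in costs:
            trial += 1
            if cost < best_cost:
                best_cost, best_trial = cost, trial
        if early_stopping is not None and trial - best_trial >= early_stopping:
            break  # no better config found in the last `early_stopping` trials
    return tuner.best_index, trial


# Hypothetical cost function whose optimum is config index 37.
best, used = tune(ToyRandomTuner(100), lambda i: (i - 37) ** 2 + 1.0, n_trial=100)
print(best, used)  # 37 100
```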
- class tvm.autotvm.tuner.Tuner(task, **kwargs)
Base class for tuners
- Parameters:
task (autotvm.task.Task) – Tuning Task
- next_batch(batch_size)
Get the next batch of configs to be measured on real hardware
- Parameters:
batch_size (int) – The size of the batch
- Return type:
a batch of configs
- update(inputs, results)
Update parameters of the tuner according to measurement results
- Parameters:
inputs (Array of autotvm.measure.MeasureInput) – The input for measurement
results (Array of autotvm.measure.MeasureResult) – result for measurement
- tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')
Begin tuning
- Parameters:
n_trial (int) – Maximum number of configs to try (measure on real hardware)
measure_option (dict) – The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.
early_stopping (int, optional) – Stop tuning early if no better config is found within this number of trials
callbacks (List of callable) – A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.
si_prefix (str) – One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
- reset()
reset the status of tuner
- load_history(data_set, min_seed_records=500)
load history data for transfer learning
- Parameters:
data_set (Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair) – Previous tuning records
min_seed_records (int) – Defaults to 500. The minimum number of records needed to train the tuner. If data_set contains fewer than min_seed_records records, the tuner is not trained.
- set_error_threshold(threshold)
Modify error counter threshold, which controls switch to debug mode
- Parameters:
threshold (New threshold value)
- class tvm.autotvm.tuner.RandomTuner(task, range_idx=None)
Enumerate the search space in a random order
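- Parameters:
task (autotvm.task.Task) – Tuning task
range_idx (Optional[Tuple[int, int]]) – A tuple giving the index range of the search space to sample from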
- load_history(data_set, min_seed_records=500)
load history data for transfer learning
- Parameters:
data_set (Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair) – Previous tuning records
min_seed_records (int) – Defaults to 500. The minimum number of records needed to train the tuner. If data_set contains fewer than min_seed_records records, the tuner is not trained.
- next_batch(batch_size)
Get the next batch of configs to be measured on real hardware
- Parameters:
batch_size (int) – The size of the batch
- Return type:
a batch of configs
- reset()
reset the status of tuner
- set_error_threshold(threshold)
Modify error counter threshold, which controls switch to debug mode
- Parameters:
threshold (New threshold value)
- tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')
Begin tuning
- Parameters:
n_trial (int) – Maximum number of configs to try (measure on real hardware)
measure_option (dict) – The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.
early_stopping (int, optional) – Stop tuning early if no better config is found within this number of trials
callbacks (List of callable) – A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.
si_prefix (str) – One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
- update(inputs, results)
Update parameters of the tuner according to measurement results
- Parameters:
inputs (Array of autotvm.measure.MeasureInput) – The input for measurement
results (Array of autotvm.measure.MeasureResult) – result for measurement
- class tvm.autotvm.tuner.GridSearchTuner(task, range_idx=None)
Enumerate the search space in a grid search order
- next_batch(batch_size)
Get the next batch of configs to be measured on real hardware
- Parameters:
batch_size (int) – The size of the batch
- Return type:
a batch of configs
- load_history(data_set, min_seed_records=500)
load history data for transfer learning
- Parameters:
data_set (Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair) – Previous tuning records
min_seed_records (int) – Defaults to 500. The minimum number of records needed to train the tuner. If data_set contains fewer than min_seed_records records, the tuner is not trained.
- reset()
reset the status of tuner
- set_error_threshold(threshold)
Modify error counter threshold, which controls switch to debug mode
- Parameters:
threshold (New threshold value)
- tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')
Begin tuning
- Parameters:
n_trial (int) – Maximum number of configs to try (measure on real hardware)
measure_option (dict) – The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.
early_stopping (int, optional) – Stop tuning early if no better config is found within this number of trials
callbacks (List of callable) – A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.
si_prefix (str) – One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
- update(inputs, results)
Update parameters of the tuner according to measurement results
- Parameters:
inputs (Array of autotvm.measure.MeasureInput) – The input for measurement
results (Array of autotvm.measure.MeasureResult) – result for measurement
- class tvm.autotvm.tuner.GATuner(task, pop_size=100, elite_num=3, mutation_prob=0.1)
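Tuner with genetic algorithm. This tuner does not have a cost model, so it always runs measurements on real machines. This tuner expands the ConfigEntity as a gene.
- Parameters:
pop_size (int) – number of genes in one generation
elite_num (int) – number of elites to keep
mutation_prob (float) – probability of mutation of a knob in a gene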
- next_batch(batch_size)
Get the next batch of configs to be measured on real hardware
- Parameters:
batch_size (int) – The size of the batch
- Return type:
a batch of configs
- update(inputs, results)
Update parameters of the tuner according to measurement results
- Parameters:
inputs (Array of autotvm.measure.MeasureInput) – The input for measurement
results (Array of autotvm.measure.MeasureResult) – result for measurement
- reset()
reset the status of tuner
- set_error_threshold(threshold)
Modify error counter threshold, which controls switch to debug mode
- Parameters:
threshold (New threshold value)
- tune(n_trial, measure_option, early_stopping=None, callbacks=(), si_prefix='G')
Begin tuning
- Parameters:
n_trial (int) – Maximum number of configs to try (measure on real hardware)
measure_option (dict) – The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.
early_stopping (int, optional) – Stop tuning early if no better config is found within this number of trials
callbacks (List of callable) – A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.
si_prefix (str) – One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
- load_history(data_set, min_seed_records=500)
load history data for transfer learning
- Parameters:
data_set (Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair) – Previous tuning records
min_seed_records (int) – Defaults to 500. The minimum number of records needed to train the tuner. If data_set contains fewer than min_seed_records records, the tuner is not trained.
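One generation of the genetic algorithm can be sketched as follows. This is an illustrative sketch, not GATuner's code. It assumes a gene is a list of per-knob choice indices. Fitness-proportional parent selection and one-point crossover are choices made here for illustration. The elite_num and mutation_prob parameters mirror the ones documented above.

```python
import random


def ga_step(population, scores, dims, elite_num=3, mutation_prob=0.1, rng=random):
    """Produce the next generation of knob-index genes from the current one.

    population: list of genes, each a list with one choice index per knob
    scores:     positive fitness per gene (higher is better, e.g. measured GFLOPS)
    dims:       number of choices for each knob
    """
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    next_gen = [list(gene) for _, gene in ranked[:elite_num]]  # elites survive unchanged
    genes = [gene for _, gene in ranked]
    weights = [score for score, _ in ranked]
    while len(next_gen) < len(population):
        # Fitness-proportional parent selection, then one-point crossover.
        mother, father = rng.choices(genes, weights=weights, k=2)
        point = rng.randrange(len(dims))
        child = mother[:point] + father[point:]
        # Mutate each knob independently with probability mutation_prob.
        child = [rng.randrange(d) if rng.random() < mutation_prob else k
                 for k, d in zip(child, dims)]
        next_gen.append(child)
    return next_gen


rng = random.Random(0)
dims = [4, 3, 5]  # three hypothetical knobs
population = [[rng.randrange(d) for d in dims] for _ in range(10)]
scores = [1.0 + sum(gene) for gene in population]  # toy fitness
next_population = ga_step(population, scores, dims, rng=rng)
```

Because no cost model is involved, every gene in next_population would have to be measured on hardware, which matches the description of GATuner above.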
- class tvm.autotvm.tuner.XGBTuner(task, plan_size=64, feature_type='itervar', loss_type='reg', num_threads=None, optimizer='sa', diversity_filter_ratio=None, log_interval=50)
Tuner that uses xgboost as cost model
- Parameters:
task (Task) – The tuning task
plan_size (int) – The size of a plan. After plan_size trials, the tuner will refit a new cost model and do planning for the next plan_size trials.
feature_type (str, optional) –
If it is ‘itervar’, use features extracted from IterVar (loop variable). If it is ‘knob’, use the flattened ConfigEntity directly. If it is ‘curve’, use sampled curve features (relation features).
Note on choosing feature type: For single-task tuning, ‘itervar’ and ‘knob’ are good. ‘itervar’ is more accurate, but ‘knob’ is much faster. ‘itervar’ has some constraints; if you run into feature-extraction problems when using ‘itervar’, you can switch to ‘knob’.
For cross-shape tuning (e.g. many convolutions with different shapes), ‘itervar’ and ‘curve’ have better transferability, while ‘knob’ is faster.
For cross-device or cross-operator tuning, only ‘curve’ can be used.
loss_type (str) – If it is ‘reg’, use a regression loss to train the cost model; the cost model predicts the normalized flops. If it is ‘rank’, use a pairwise rank loss to train the cost model; the cost model predicts a relative rank score. If it is ‘rank-binary’, use a pairwise rank loss with binarized labels to train the cost model; the cost model predicts a relative rank score.
num_threads (int, optional) – The number of threads.
optimizer (str or ModelOptimizer, optional) – If it is ‘sa’, use a default simulated annealing optimizer. Otherwise it should be a ModelOptimizer object.
diversity_filter_ratio (int or float, optional) – If it is not None, the tuner will first select the top (plan_size * diversity_filter_ratio) candidates according to the cost model and then pick batch_size of them according to the diversity metric.
log_interval (int = 50) – The verbose level. If it is 0, output nothing. Otherwise, output debug information every log_interval iterations.
- tune(*args, **kwargs)
Begin tuning
- Parameters:
n_trial (int) – Maximum number of configs to try (measure on real hardware)
measure_option (dict) – The options for how to measure generated code. You should use the return value of autotvm.measure_option for this argument.
early_stopping (int, optional) – Stop tuning early if no better config is found within this number of trials
callbacks (List of callable) – A list of callback functions. The signature of callback function is (Tuner, List of MeasureInput, List of MeasureResult) with no return value. These callback functions will be called on every measurement pair. See autotvm/tuner/callback.py for some examples.
si_prefix (str) – One of tvm.autotvm.utils.SI_PREFIXES. The SI prefix to use when reporting FLOPS.
- load_history(data_set, min_seed_records=500)
load history data for transfer learning
- Parameters:
data_set (Array of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult) pair) – Previous tuning records
min_seed_records (int) – Defaults to 500. The minimum number of records needed to train the tuner. If data_set contains fewer than min_seed_records records, the tuner is not trained.
- next_batch(batch_size)
Get the next batch of configs to be measured on real hardware
- Parameters:
batch_size (int) – The size of the batch
- Return type:
a batch of configs
- reset()
reset the status of tuner
- set_error_threshold(threshold)
Modify error counter threshold, which controls switch to debug mode
- Parameters:
threshold (New threshold value)
- update(inputs, results)
Update parameters of the tuner according to measurement results
- Parameters:
inputs (Array of autotvm.measure.MeasureInput) – The input for measurement
results (Array of autotvm.measure.MeasureResult) – result for measurement
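The two-stage selection controlled by diversity_filter_ratio can be sketched as a greedy pick. This is a hypothetical sketch, not the XGBTuner implementation. It assumes the cost model has already produced the top candidates with scores. It measures diversity simply as the number of knob values not yet covered by the batch, and breaks ties by model score.

```python
def diverse_pick(candidates, scores, batch_size):
    """Greedily choose batch_size candidates, preferring unseen knob values, then higher scores.

    candidates: list of configs, each a list of knob choice indices
    scores:     cost-model prediction per candidate (higher is better)
    """
    seen = [set() for _ in candidates[0]] if candidates else []
    pool = list(range(len(candidates)))
    chosen = []

    def gain(i):
        novelty = sum(v not in seen[d] for d, v in enumerate(candidates[i]))
        return (novelty, scores[i])

    while pool and len(chosen) < batch_size:
        best = max(pool, key=gain)
        pool.remove(best)
        chosen.append(best)
        for d, v in enumerate(candidates[best]):
            seen[d].add(v)
    return chosen


# Top candidates from the cost model (plan_size * diversity_filter_ratio of them).
candidates = [[0, 0], [0, 0], [1, 1], [0, 1]]
scores = [0.9, 0.8, 0.5, 0.7]
print(diverse_pick(candidates, scores, batch_size=2))  # [0, 2]
```

Candidate 1 has the second-best score but is skipped, because it duplicates candidate 0 and adds no new knob values.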
Namespace of callback utilities of AutoTVM
- tvm.autotvm.tuner.callback.log_to_file(file_out, protocol='json')
Log the tuning records into file. The rows of the log are stored in the format of autotvm.record.encode.
- tvm.autotvm.tuner.callback.log_to_database(db)
Save the tuning records to a database object.
- Parameters:
db (Database) – The database
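A custom callback only needs the (Tuner, List of MeasureInput, List of MeasureResult) signature described for tune(). The sketch below tracks the best mean cost and its config after each batch. The namedtuples are hypothetical stand-ins so it runs without TVM, and the sketch assumes error_no == 0 means success. With real TVM objects, such a callback would sit in the callbacks list next to the file logger, e.g. callbacks=[callback, autotvm.callback.log_to_file("matmul.log")].

```python
from collections import namedtuple

# Hypothetical stand-ins for MeasureInput / MeasureResult (only the fields used here).
FakeInput = namedtuple("FakeInput", ["config"])
FakeResult = namedtuple("FakeResult", ["costs", "error_no"])


def make_best_tracker(history):
    """Build a callback with the documented (tuner, inputs, results) signature."""
    best_cost, best_config = float("inf"), None

    def _callback(tuner, inputs, results):
        nonlocal best_cost, best_config
        for inp, res in zip(inputs, results):
            if res.error_no != 0:  # assumed: non-zero error_no marks a failed measurement
                continue
            cost = sum(res.costs) / len(res.costs)
            if cost < best_cost:
                best_cost, best_config = cost, inp.config
        history.append((best_cost, best_config))  # no return value, as required

    return _callback


history = []
callback = make_best_tracker(history)
callback(None, [FakeInput("c0"), FakeInput("c1")], [FakeResult([3e-3], 0), FakeResult([2e-3], 0)])
callback(None, [FakeInput("c2")], [FakeResult([1e-3], 4)])  # failed run is ignored
print(history)  # [(0.002, 'c1'), (0.002, 'c1')]
```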
tvm.autotvm.task
Task is a tunable composition of template functions.
Tuner takes a tunable task and optimizes the joint configuration space of all the template functions in the task. This module defines the task data structure, as well as a collection (zoo) of typical tasks of interest.
Definition of task function.
Task can be constructed from tuple of func, args, and kwargs. func is a state-less function, or a string that registers the standard task.
- tvm.autotvm.task.task.serialize_args(args)
serialize arguments of a topi function to a hashable tuple.
- tvm.autotvm.task.task.deserialize_args(args)
The inverse function of serialize_args.
- tvm.autotvm.task.task.args_to_workload(args, task_name=None)
Convert an argument list to a hashable workload tuple. This function converts lists to tuples and TVM nodes to Python values, and flattens te.tensor.Tensor into a tuple.
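The conversion to a hashable workload can be sketched in plain Python. This is a minimal sketch under two assumptions:

- A tensor is described only by its shape and dtype, via the hypothetical FakeTensor class, and is encoded as ("TENSOR", shape, dtype).
- The task name is prepended to the converted arguments.

Both choices are for illustration; conversion of TVM nodes to Python values is not modelled.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for te.tensor.Tensor (shape and dtype only)."""
    shape: tuple
    dtype: str


def to_workload(task_name, args):
    """Recursively turn an argument list into a hashable tuple."""
    def convert(x):
        if isinstance(x, FakeTensor):
            return ("TENSOR", tuple(x.shape), x.dtype)  # flatten a tensor into a tuple
        if isinstance(x, (list, tuple)):
            return tuple(convert(item) for item in x)  # lists become tuples, recursively
        return x  # ints, floats and strings are already hashable
    return (task_name,) + convert(args)


wkl = to_workload("matmul", [FakeTensor((1024, 1024), "float32"), [1, 1], "float32"])
print(wkl)  # ('matmul', ('TENSOR', (1024, 1024), 'float32'), (1, 1), 'float32')
```

Being hashable is what lets a workload serve as a dictionary key, for example when looking up the best record per workload.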
- class tvm.autotvm.task.task.Task(name, args)
A Tunable Task
- instantiate(config)
Instantiate this task function (template) with a config. Returns corresponding schedule.
- Parameters:
config (template.ConfigEntity) – parameter config for this template
- Returns:
sch (tvm.te.schedule.Schedule) – The tvm schedule
arg_bufs (Array of te.tensor.Tensor) – The input/output buffers
- class tvm.autotvm.task.task.TaskTemplate
Task template is used to create a tunable AutoTVM task.
It can be defined by a pair of compute and schedule functions using _register_task_compute and _register_task_schedule, or by a more flexible customized task creation function using _register_customized_task.
Note that when a customized func is registered, the compute and schedule functions will be ignored.
- class tvm.autotvm.task.task.MissingTask(taskname: str)
Dummy task template for a task lookup which cannot be resolved. This can occur if the task being requested from _lookup_task() has not been imported in this run.
- tvm.autotvm.task.task.template(task_name, func=None)
Decorate a function as a tunable schedule template.
- Parameters:
task_name (str) – The task name
func (None or callable) – A callable template function. If it is None, return a decorator. If it is callable, decorate this function.
- Returns:
func – The decorated function
- Return type:
callable
Examples
The following code is a tunable template for a blocked matrix multiplication:
@autotvm.template("matmul")
def matmul(N, L, M, dtype):
    A = te.placeholder((N, L), name='A', dtype=dtype)
    B = te.placeholder((L, M), name='B', dtype=dtype)

    k = te.reduce_axis((0, L), name='k')
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name='C')
    s = te.create_schedule(C.op)

    # schedule
    y, x = s[C].op.axis
    k = s[C].op.reduce_axis[0]

    ##### define space begin #####
    cfg = autotvm.get_config()
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    ##### define space end #####

    # schedule according to config
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)

    s[C].reorder(yo, xo, k, yi, xi)

    return s, [A, B, C]
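The dual calling convention of template follows a common Python pattern: it acts as a decorator when func is None and registers directly otherwise. The sketch below shows it with a hypothetical TASK_TABLE registry. TVM's real decorator does more than record the function; this only illustrates the two ways of calling it.

```python
TASK_TABLE = {}  # hypothetical registry: task name -> template function


def template(task_name, func=None):
    """Register `func` under task_name, or return a decorator when func is None."""
    def _register(f):
        TASK_TABLE[task_name] = f
        return f  # in this sketch the function is returned unchanged
    return _register if func is None else _register(func)


@template("toy/matmul")  # decorator form: func is None
def toy_matmul(N, L, M, dtype):
    return ("matmul", N, L, M, dtype)


def toy_add(n):
    return ("add", n)


template("toy/add", toy_add)  # direct form: func is callable
print(sorted(TASK_TABLE))  # ['toy/add', 'toy/matmul']
```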
- tvm.autotvm.task.task.create(task_name, args, target, target_host=None)
Create a tuning task and initialize its search space
- tvm.autotvm.task.task.get_config()
Get current config object
- Returns:
cfg – The current config
- Return type:
ConfigSpace or ConfigEntity
- exception tvm.autotvm.task.task.FlopCalculationError
Error raised when estimating the FLOP count of a compute op
- tvm.autotvm.task.task.compute_flop(sch)
Calculate the number of FLOP (floating-point operations) of the compute ops in a schedule
- Parameters:
sch (tvm.te.schedule.Schedule) – schedule
- Returns:
flop – number of FLOP in this schedule
- Return type:
int
Template configuration space.
Each template function can be parameterized by a ConfigSpace. The space is declared when we invoke the template function with ConfigSpace. During evaluation, we pass in a ConfigEntity, which contains a specific entity in the space. This entity contains deterministic parameters.
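A config space can be pictured as the Cartesian product of its knobs, with each ConfigEntity addressed by a flat index. This is a minimal sketch that uses plain dicts of candidate lists instead of TVM's space classes. It assumes mixed-radix decoding in which the first-defined knob varies fastest, which matches the ordering in the multi_filter listings later in this section.

```python
from math import prod


def space_size(knobs):
    """Number of entities: the product of the number of choices per knob."""
    return prod(len(choices) for choices in knobs.values())


def entity_at(knobs, index):
    """Decode a flat index into one choice per knob; the first-defined knob varies fastest."""
    if not 0 <= index < space_size(knobs):
        raise IndexError(index)
    entity = {}
    for name, choices in knobs.items():
        index, pos = divmod(index, len(choices))
        entity[name] = choices[pos]
    return entity


# Two hypothetical split knobs for a 32-long axis.
knobs = {"tile_y": [[1, 32], [2, 16], [4, 8]], "tile_x": [[1, 32], [8, 4]]}
print(space_size(knobs))    # 6
print(entity_at(knobs, 4))  # {'tile_y': [2, 16], 'tile_x': [8, 4]}
```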
- class tvm.autotvm.task.space.Axis(space, index)
- index
Alias for field number 1
- space
Alias for field number 0
- exception tvm.autotvm.task.space.InstantiationError
Actively detected error in instantiating a template with a config, raised by cfg.raise_error, e.g. too many unroll steps or too many threads in a block
- class tvm.autotvm.task.space.TransformSpace
Base class for transform spaces. TransformSpace is the node in the computation graph of axes.
Note
We can regard our schedule code as a transformation graph of axes. Starting from the raw axes in the definition of te.compute, we can transform these axes by some operators. The operators include ‘split’, ‘reorder’ and ‘annotate’. Each operator has some tunable parameters (e.g. the split factor). The tuning process is then to find good parameters for these ops.
So all the combinations of the parameters of these ops form our search space.
Naming convention: We call the set of all possible values as XXXSpace. (XXX can be Split, Reorder, Config …) We call a specific entity in a space as XXXEntity.
- class tvm.autotvm.task.space.VirtualAxis(var, name=None)
Axis placeholder in template
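- Parameters:
var (int or tvm.te.schedule.IterVar) – If it is int, return a virtual axis whose length is the provided argument. If it is IterVar, return a virtual axis whose length is extracted from the IterVar’s extent domain.
name (str) – The name of the axis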
- tvm.autotvm.task.space.get_factors(n)
return all factors of an integer
- tvm.autotvm.task.space.get_pow2s(n)
return all power-of-two numbers that are less than or equal to the integer
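Both helpers are simple enough to sketch in plain Python. These are hypothetical re-implementations that follow the one-line descriptions above, not TVM's code. The ascending ordering of the output is an assumption.

```python
from math import isqrt


def get_factors(n):
    """All positive divisors of n, in ascending order."""
    small = [i for i in range(1, isqrt(n) + 1) if n % i == 0]
    return sorted(set(small + [n // i for i in small]))


def get_pow2s(n):
    """All powers of two that are less than or equal to n (n >= 1)."""
    return [1 << k for k in range(n.bit_length())]


print(get_factors(12))  # [1, 2, 3, 4, 6, 12]
print(get_pow2s(20))    # [1, 2, 4, 8, 16]
```

Checking divisors only up to isqrt(n) and adding each partner n // i keeps get_factors cheap even for large extents. They correspond to the ‘factors’ and ‘power2’ split policies of define_split.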
- class tvm.autotvm.task.space.SplitSpace(axes, policy, **kwargs)
Split an axis several times
- class tvm.autotvm.task.space.SplitEntity(size)
A split operation with detailed parameters that can apply to an axis
- Parameters:
size (Array of int) – the size of every axis after split. For example, if an axis of extent 128 is split into 3 axes, a possible size is [4, 4, 8] (4 × 4 × 8 = 128).
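The candidate sizes for such a split can be enumerated recursively. This is a minimal sketch assuming the ‘factors’ policy with exact division, as in the [4, 4, 8] example. Other policies, filters, and TVM's internal representation of split entities are not modelled. Because 128 = 2^7, the sizes correspond to the ways of distributing 7 factors of two over 3 axes: C(9, 2) = 36.

```python
from math import isqrt, prod


def _factors(n):
    small = [i for i in range(1, isqrt(n) + 1) if n % i == 0]
    return sorted(set(small + [n // i for i in small]))


def enumerate_splits(extent, num_outputs):
    """All size lists of length num_outputs whose product is exactly extent."""
    if num_outputs == 1:
        return [[extent]]
    return [[f] + rest
            for f in _factors(extent)
            for rest in enumerate_splits(extent // f, num_outputs - 1)]


splits = enumerate_splits(128, 3)
print(len(splits), [4, 4, 8] in splits)  # 36 True
```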
- apply(sch, op, axis)
Apply split to an axis
- Parameters:
sch (tvm.te.schedule.Schedule) – The tvm schedule
op (tvm.te.Operation) – The stage to be applied
axis (tvm.te.schedule.IterVar) – axis to split
- Returns:
axes – The transformed axes.
- Return type:
- class tvm.autotvm.task.space.ReorderSpace(axes, policy, **kwargs)
The parameter space for ordering an array of axes
- class tvm.autotvm.task.space.ReorderEntity(perm)
A reorder operation with detailed parameters that can apply to axes
- apply(sch, op, axes)
Apply reorder to an array of axes
- Parameters:
sch (tvm.te.schedule.Schedule) – The tvm schedule
op (tvm.te.Operation) – The stage to be applied
axes (Array of tvm.te.schedule.IterVar) – axes to reorder
- Returns:
axes – The transformed axes.
- Return type:
- class tvm.autotvm.task.space.AnnotateSpace(axes, policy, **kwargs)
The parameter space for annotating an array of axes
- class tvm.autotvm.task.space.AnnotateEntity(anns)
An annotation operation with detailed parameters that can apply to axes
- Parameters:
anns (Array of string) – The annotations of axes
- apply(sch, op, axes, axis_lens=None, max_unroll=None, vec_size=None, cfg=None, source=None)
Apply annotation to an array of axes
- Parameters:
sch (tvm.te.schedule.Schedule) – The tvm schedule
op (tvm.te.Operation) – The stage to be applied
axes (Array of tvm.te.schedule.IterVar) – axes to annotate
max_unroll (int, optional) – maximum unroll step
vec_size (Array of int, optional) – valid vector lanes for vectorization
cfg (ConfigEntity, optional) – cfg for recording error
source (Array of Array tensor, optional) – source tensor for attaching cache
- Returns:
axes – The transformed axes
- Return type:
list of tvm.te.schedule.IterVar
- class tvm.autotvm.task.space.OtherOptionSpace(axes, policy, **kwargs)
The parameter space for general option
- class tvm.autotvm.task.space.OtherOptionEntity(val)
The parameter entity for general option, with a detailed value
- class tvm.autotvm.task.space.ConfigSpace
The configuration space of a schedule. Pass it as config in template to collect transformation space and build transform graph of axes
- static axis(var)
get a virtual axis (axis placeholder)
- Parameters:
var (int or tvm.te.schedule.IterVar) – If it is int, return an axis whose length is the provided argument. If it is IterVar, return an axis whose length is extracted from the IterVar’s extent domain.
- static reduce_axis(var)
get a virtual axis (axis placeholder)
- Parameters:
var (int or tvm.te.schedule.IterVar) – If it is int, return an axis whose length is the provided argument. If it is IterVar, return an axis whose length is extracted from the IterVar’s extent domain.
- define_split(name, axis, policy='factors', **kwargs)
Define a new tunable knob which splits an axis into a list of axes
- Parameters:
name (str) – name to index the entity of this space
axis (tvm.te.schedule.IterVar) – axis to split
policy (str) – name of policy. If it is ‘factors’, the tuner will try all divisible factors. If it is ‘power2’, the tuner will try power-of-two factors less than or equal to the length. If it is ‘verbose’, the tuner will try all candidates from the two policies above. If it is ‘candidate’, try the given candidates.
**kwargs – extra arguments for policy:
max_factor (int) – the maximum split factor.
filter (Callable[[int], bool]) – see the examples below for how to use filter.
num_outputs (int) – the total number of axes after split.
no_tail (bool) – whether to include only divisible numbers as split factors.
candidate (List) – (policy=candidate) manual candidate list.
Examples
>>> # use custom candidates
>>> cfg.define_split('tile_x', x, policy='candidate', num_outputs=3,
>>>                  candidate=[[1, 4, 4], [4, 1, 4]])

>>> # use a filter that only accepts split schemes whose innermost tile is at most 4
>>> cfg.define_split('tile_y', y, policy='factors', num_outputs=3,
>>>                  filter=lambda x: x.size[-1] <= 4)
- define_reorder(name, axes, policy, **kwargs)
Define a new tunable knob which reorders a list of axes
- Parameters:
name (str) – name to index the entity of this space
axes (Array of tvm.te.schedule.IterVar) – axes to reorder
policy (str) – name of policy. If it is ‘identity’, do an identity permutation. If it is ‘all’, try all permutations. If it is ‘interval_all’, try all permutations of an interval of axes. If it is ‘candidate’, try the listed candidates. If it is ‘interleave’, interleave chains of spatial axes and chains of reduction axes.
kwargs (dict) – extra arguments for policy
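The two simplest reorder policies can be sketched with itertools. This is an illustrative sketch, not TVM's ReorderSpace. It assumes an entity is a permutation of axis positions and that applying it means listing the axes in that order. The axis names are hypothetical, and ‘interval_all’, ‘candidate’ and ‘interleave’ are not modelled.

```python
from itertools import permutations


def reorder_space(num_axes, policy):
    """Candidate permutations for the 'identity' and 'all' policies."""
    if policy == "identity":
        return [tuple(range(num_axes))]
    if policy == "all":
        return list(permutations(range(num_axes)))
    raise NotImplementedError(policy)  # other policies are not sketched here


def apply_perm(axes, perm):
    """Return the axes in the order given by perm."""
    return [axes[i] for i in perm]


axes = ["yo", "xo", "k", "yi", "xi"]  # hypothetical axis names
space = reorder_space(len(axes), "all")
print(len(space))                         # 120
print(apply_perm(axes, (0, 1, 3, 2, 4)))  # ['yo', 'xo', 'yi', 'k', 'xi']
```

The ‘all’ policy grows factorially (5 axes already give 120 candidates), which is one reason the narrower policies exist.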
- define_annotate(name, axes, policy, **kwargs)
Define a new tunable knob which annotates a list of axes
- Parameters:
name (str) – name to index the entity of this space
axes (Array of tvm.te.schedule.IterVar) – axes to annotate
policy (str) – name of policy. If it is ‘unroll’, unroll the axes. If it is ‘try_unroll’, try to unroll the axes. If it is ‘try_unroll_vec’, try to unroll or vectorize the axes. If it is ‘bind_gpu’, bind the first few axes to GPU threads. If it is ‘locate_cache’, choose n axes to attach shared/local cache.
kwargs (dict) – extra arguments for policy
- define_knob(name, candidate)
Define a tunable knob with a list of candidates
- add_flop(flop)
Add floating-point operation statistics for this tuning task
- raise_error(msg)
Register an error in the config. Use this to actively detect errors when scheduling; otherwise these errors will occur at runtime, which costs more time.
- Parameters:
msg (str)
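The pattern of registering errors during instantiation and checking validity afterwards can be sketched with a toy config class. ToyConfig and toy_schedule are hypothetical. The sketch assumes errors are simply accumulated in a list. That is enough to show why the validity check is only meaningful after the template has been instantiated with the config.

```python
class ToyConfig:
    """Hypothetical stand-in for a config entity that collects scheduling errors."""

    def __init__(self, knobs):
        self.knobs = knobs
        self.errors = []

    def raise_error(self, msg):
        self.errors.append(msg)  # record instead of failing later at runtime

    def valid(self):
        return not self.errors


def toy_schedule(cfg, max_threads=1024):
    """Hypothetical template body: reject configs that exceed the thread limit."""
    threads = cfg.knobs["tile_x"] * cfg.knobs["tile_y"]
    if threads > max_threads:
        cfg.raise_error(f"too many threads in a block: {threads}")
    return threads


good = ToyConfig({"tile_x": 16, "tile_y": 16})
bad = ToyConfig({"tile_x": 64, "tile_y": 32})
for cfg in (good, bad):
    toy_schedule(cfg)  # instantiate first, then check validity
print(good.valid(), bad.valid())  # True False
```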
- valid()
Check whether the config meets all the constraints
Note
This check should be called after instantiation of task, because the ConfigEntity/ConfigSpace collects errors during instantiation
- Returns:
valid – whether the config meets all the constraints
- Return type:
bool
- is_index_valid(index)
Checks if the index satisfies the multi_filter condition
- multi_filter(filter)
Unlike the knob filter, which restricts only a single parameter, this filter can restrict combinations of parameters.
- Parameters:
filter (function) – predicate with one argument (Callable[[int], bool])
Note
Using this filter places additional restrictions on the use of __len__. Normally, __len__ defines both the count of valid indexes and the range of the space, but when multi_filter is enabled, use __len__ for the count of valid indexes and range_length for the range of the space. It is recommended to use is_index_valid, get_next_index and get_rand_index to move through the space.
Examples
>>> # Pre-requisites
>>> candidates = [[16, 64], [32, 32], [64, 16]]
>>> filter = lambda v: v.size[0] != 16
>>> multi_filter = lambda e: (e["tile_x"].size[0] + e["tile_y"].size[0]) <= 64

>>> # Case 1 - without filtering
>>> cfg.define_split("tile_x", x, num_outputs=2, policy="candidate", candidate=candidates)
>>> cfg.define_split("tile_y", y, num_outputs=2, policy="candidate", candidate=candidates)
>>> # [('tile_x', [16, 64]), ('tile_y', [16, 64])],None,0
>>> # [('tile_x', [32, 32]), ('tile_y', [16, 64])],None,1
>>> # [('tile_x', [64, 16]), ('tile_y', [16, 64])],None,2
>>> # [('tile_x', [16, 64]), ('tile_y', [32, 32])],None,3
>>> # [('tile_x', [32, 32]), ('tile_y', [32, 32])],None,4
>>> # [('tile_x', [64, 16]), ('tile_y', [32, 32])],None,5
>>> # [('tile_x', [16, 64]), ('tile_y', [64, 16])],None,6
>>> # [('tile_x', [32, 32]), ('tile_y', [64, 16])],None,7
>>> # [('tile_x', [64, 16]), ('tile_y', [64, 16])],None,8

>>> # Case 2 - with filter
>>> cfg.define_split("tile_x", x, num_outputs=2, policy="candidate", candidate=candidates,
...                  filter=filter)
>>> cfg.define_split("tile_y", y, num_outputs=2, policy="candidate", candidate=candidates,
...                  filter=filter)
>>> # [('tile_x', [32, 32]), ('tile_y', [32, 32])],None,0
>>> # [('tile_x', [64, 16]), ('tile_y', [32, 32])],None,1
>>> # [('tile_x', [32, 32]), ('tile_y', [64, 16])],None,2
>>> # [('tile_x', [64, 16]), ('tile_y', [64, 16])],None,3

>>> # Case 3 - with filter and multi_filter
>>> cfg.define_split("tile_x", x, num_outputs=2, policy="candidate", candidate=candidates,
...                  filter=filter)
>>> cfg.define_split("tile_y", y, num_outputs=2, policy="candidate", candidate=candidates,
...                  filter=filter)
>>> cfg.multi_filter(filter=multi_filter)
>>> # [('tile_x', [32, 32]), ('tile_y', [32, 32])],None,0
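The filtering in the three cases above can be reproduced without TVM. In this sketch, Split, knob_filter, pair_filter and enumerate_space are hypothetical helpers; the ordering (tile_x varying fastest) is read off the Case 1 listing, and the indexes are simply positions in the filtered list.

```python
from collections import namedtuple
from itertools import product

# Hypothetical stand-in for a split entity; only its .size attribute is modelled.
Split = namedtuple("Split", ["size"])

CANDIDATES = [[16, 64], [32, 32], [64, 16]]


def knob_filter(v):
    # Per-knob filter: drop any split whose outer factor is 16.
    return v.size[0] != 16


def pair_filter(e):
    # multi_filter-style predicate over the whole entity map.
    return (e["tile_x"].size[0] + e["tile_y"].size[0]) <= 64


def enumerate_space(per_knob=None, combined=None):
    """Return [(index, {"tile_x": ..., "tile_y": ...})] with tile_x varying fastest."""
    options = [Split(c) for c in CANDIDATES]
    if per_knob is not None:
        options = [o for o in options if per_knob(o)]
    configs = []
    # product() varies its last argument fastest, so tile_y is the outer loop.
    for y, x in product(options, options):
        entities = {"tile_x": x, "tile_y": y}
        if combined is None or combined(entities):
            configs.append(entities)
    return list(enumerate(configs))


case1 = enumerate_space()
case2 = enumerate_space(knob_filter)
case3 = enumerate_space(knob_filter, pair_filter)
```

The per-knob filter shrinks each knob independently (9 combinations become 4), while the combined predicate then removes combinations that no single-knob filter could express.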
- property range_length
Length of the index range in the space
- property dims
Dimensions in the space
- subrange_length(start, end)
Returns the number of valid indexes within the limited range from [start, end]
- get_rand_index(start=None, end=None, to_exclude=None)
Returns a random valid index that is not listed in to_exclude
- Parameters:
- Returns:
rand (int) – random index in the space
Note: Excluding all valid indexes of the space leads to an infinite loop.
- get_next_index(index, n=1, start=None, end=None)
Returns the nth valid next index or None if out of range
- Parameters:
- Returns:
next – next index in the space
- Return type:
int or None
- clear_cache()
Clears the cache of index validity
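To show how is_index_valid, get_next_index, get_rand_index and the validity cache relate, here is a minimal sketch over a plain integer index space. ToyFilteredSpace is hypothetical, its predicate takes an index rather than an entity map, and start/end are assumed to describe the half-open range [start, end); it is not TVM's ConfigSpace. Note that len(space) counts valid indexes while range_length stays the size of the whole range, mirroring the note on multi_filter.

```python
import random


class ToyFilteredSpace:
    """Hypothetical flat index space [0, range_length) restricted by a predicate."""

    def __init__(self, range_length, predicate):
        self.range_length = range_length
        self._predicate = predicate
        self._cache = None  # validity flags, built lazily on first use

    def clear_cache(self):
        self._cache = None  # force re-evaluation, e.g. after the predicate changes

    def is_index_valid(self, index):
        if self._cache is None:
            self._cache = tuple(bool(self._predicate(i)) for i in range(self.range_length))
        return self._cache[index]

    def __len__(self):
        # Count of valid indexes; differs from range_length once a filter is active.
        return sum(self.is_index_valid(i) for i in range(self.range_length))

    def get_next_index(self, index, n=1, start=0, end=None):
        """Return the n-th valid index after `index` (n >= 1), or None past `end`."""
        end = self.range_length if end is None else end
        remaining = n
        while remaining > 0:
            index += 1
            if index >= end:
                return None
            if index >= start and self.is_index_valid(index):
                remaining -= 1
        return index

    def get_rand_index(self, start=0, end=None, to_exclude=()):
        """Return a random valid index in [start, end) that is not in to_exclude."""
        end = self.range_length if end is None else end
        while True:  # never terminates if every valid index is excluded
            index = random.randrange(start, end)
            if self.is_index_valid(index) and index not in to_exclude:
                return index


space = ToyFilteredSpace(10, lambda i: i % 3 == 0)  # valid indexes: 0, 3, 6, 9
```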
- point2knob(point)
Convert point form (single integer) to knob (vector)
- knob2point(knob)
Convert knob form (vector) to point form (single integer)
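Points and knobs are two encodings of the same configuration: the knob vector is a mixed-radix representation of the point, with one digit per entry of dims. The helpers below are illustrative functions, not TVM's methods, and assume the first knob is the fastest-varying digit, which matches the multi_filter example above (in Case 1, index 5 is tile_x candidate 2 with tile_y candidate 1).

```python
def point2knob(point, dims):
    """Decompose a flat index into one coordinate per knob (first knob varies fastest)."""
    knob = []
    for dim in dims:
        knob.append(point % dim)
        point //= dim
    return knob


def knob2point(knob, dims):
    """Inverse of point2knob: fold the coordinates back into a flat index."""
    point = 0
    # Horner's scheme, starting from the slowest-varying (last) knob.
    for k, dim in zip(reversed(knob), reversed(dims)):
        point = point * dim + k
    return point
```

For dims = [3, 3], point 5 corresponds to knob [2, 1], and knob2point([2, 1], [3, 3]) gives 5 again.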
- sample_ints(m)
Sample m distinct integers from [0, self.range_length) without replacement. This function is an alternative to np.random.choice for when self.range_length > 2^32, in which case numpy does not work.
- Parameters:
m (int) – The number of integers to sample
- Returns:
ints
- Return type:
a numpy array of size m
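The idea behind sample_ints can be sketched with rejection sampling into a set, which stores only the m chosen values and so never materializes a range larger than 2^32. This is an assumed implementation strategy for illustration; unlike the documented method it returns a Python list rather than a numpy array, and it is only efficient when m is much smaller than range_length.

```python
import random


def sample_ints(m, range_length, rng=random):
    """Draw m distinct integers from [0, range_length) without replacement.

    Rejection sampling into a set keeps only the m chosen values in memory,
    so range_length may be far larger than 2**32.
    """
    if m > range_length:
        raise ValueError("cannot draw more distinct values than the range holds")
    chosen = set()
    while len(chosen) < m:
        chosen.add(rng.randrange(range_length))  # duplicates are simply re-drawn
    return list(chosen)
```

For example, sample_ints(5, 2**40) returns five distinct indexes from a range of about 10^12 points.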
- random_walk(point)
Perform one random-walk step from point as a local transition, returning a nearby point in the space
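A local transition can be pictured as changing exactly one knob of the current point. The sketch below is a simplified, hypothetical random_walk over a space described by its dims (first knob varying fastest); it ignores multi_filter validity, which a real search would also have to respect.

```python
import random


def random_walk(point, dims, rng=random):
    """Move from `point` to a neighbour that differs in exactly one knob."""
    # Decompose the flat index into per-knob coordinates (first knob fastest).
    knob = []
    rest = point
    for dim in dims:
        knob.append(rest % dim)
        rest //= dim
    # Only knobs with more than one candidate can be changed.
    movable = [i for i, dim in enumerate(dims) if dim > 1]
    if not movable:
        return point
    i = rng.choice(movable)
    knob[i] = rng.choice([v for v in range(dims[i]) if v != knob[i]])
    # Recombine the coordinates into a flat index.
    new_point = 0
    for k, dim in zip(reversed(knob), reversed(dims)):
        new_point = new_point * dim + k
    return new_point
```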
- class tvm.autotvm.task.space.ConfigEntity(index, code_hash, entity_map, constraints)
A configuration with detailed parameters
- Parameters:
- get_flatten_feature()
flatten entities to a numerical one-dimensional feature vector
- Returns:
fea – one dimensional float32 array
- Return type:
np.array
- get_other_option()
- Returns:
other_option – other tunable parameters (tunable parameters defined by cfg.define_knob)
- Return type:
- to_json_dict()
convert to a json serializable dictionary
- Returns:
json_dict – a json serializable dictionary
- Return type:
dict
- static from_json_dict(json_dict)
Build a ConfigEntity from a JSON-serializable dictionary
- Parameters:
json_dict (dict) – JSON-serializable dictionary. This should be the return value of to_json_dict.
- Returns:
config – The corresponding config object
- Return type:
ConfigEntity
- class tvm.autotvm.task.space.FallbackConfigEntity
The config entity created to support fallback
- fallback_split(name, constraints)
Set a fallback value for a split knob
- Parameters:
Examples
If you use cfg.define_split(‘tile_0’, 128, num_outputs=3), then cfg.fallback_split(‘tile_0’, [-1, 8, 4]) will give you cfg[‘tile_0’].size = [4, 8, 4].
If you use cfg.define_split(‘tile_0’, 49, num_outputs=3), then cfg.fallback_split(‘tile_0’, [-1, 8, 4]) will give you cfg[‘tile_0’].size = [7, 7, 1].
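The two examples above are consistent with a greedy rule that fills the split from the innermost factor outwards, choosing for each position the largest divisor of the remaining extent that does not exceed its constraint, where -1 means no limit. The sketch below (hypothetical helpers get_factors and fallback_split_sizes) encodes that rule; TVM's own implementation may differ in details such as how it reports an infeasible split.

```python
def get_factors(n):
    """All positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def fallback_split_sizes(extent, constraints):
    """Greedily choose split factors, innermost first, each within its constraint.

    A constraint of -1 means the position may take whatever extent is left.
    """
    sizes = [1] * len(constraints)
    remaining = extent
    for i in reversed(range(len(constraints))):
        limit = remaining if constraints[i] == -1 else constraints[i]
        # Largest divisor of the remaining extent that respects the limit (1 always fits).
        sizes[i] = max(f for f in get_factors(remaining) if f <= limit)
        remaining //= sizes[i]
    return sizes
```

For 49 = 7 × 7, no divisor other than 1 fits the innermost limit of 4, which is why the last factor collapses to 1.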
- fallback_with_reference_log(ref_log)
A data-driven fallback mechanism. We use tuned parameters from TopHub as reference data. For an unseen shape, we find the most similar tuned one from TopHub and mimic its parameters. Note that we are not matching by workload (e.g., input size, kernel size), but instead matching by configuration space. The idea is that if two workloads have similar configuration spaces, their optimal configurations are also likely to be similar.
- Parameters:
ref_log (List of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult)) – The reference log
Template dispatcher module.
A dispatcher is a function that can contain multiple behaviors. Its specific behavior can be controlled by DispatchContext.
DispatchContext is used in two ways, usually via different implementations of the DispatchContext base class.
During search, we can use it to pass the current proposal from the tuner.
During evaluation, we can use it to pick the best policy.
- class tvm.autotvm.task.dispatcher.DispatchContext
Base class of dispatch context.
DispatchContext enables the target and workload specific dispatch mechanism for templates.
- query(target, workload)
Query the context to get the specific config for a template. If the result cannot be found in this context, this function queries the upper contexts.
- Parameters:
- Returns:
cfg – The specific configuration.
- Return type:
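The fallback to upper contexts described for query can be sketched with a stack of nested contexts. ToyDispatchContext is hypothetical: it keys configs by (target, workload), keeps a class-level current pointer, and walks to the enclosing context when a lookup misses.

```python
class ToyDispatchContext:
    """Hypothetical nested dispatch context: look locally, then ask the parent."""

    current = None  # innermost active context

    def __init__(self):
        self._memory = {}
        self._old_ctx = None

    def update(self, target, workload, cfg):
        self._memory[(target, workload)] = cfg

    def query(self, target, workload):
        ctx = self
        while ctx is not None:
            cfg = ctx._memory.get((target, workload))
            if cfg is not None:
                return cfg
            ctx = ctx._old_ctx  # fall back to the enclosing (upper) context
        return None

    def __enter__(self):
        self._old_ctx = ToyDispatchContext.current
        ToyDispatchContext.current = self
        return self

    def __exit__(self, *exc):
        ToyDispatchContext.current = self._old_ctx


with ToyDispatchContext() as outer:
    outer.update("llvm", ("conv2d", 1), "cfg_outer")
    with ToyDispatchContext() as inner:
        inner.update("llvm", ("dense", 2), "cfg_inner")
        found_in_parent = ToyDispatchContext.current.query("llvm", ("conv2d", 1))
        found_locally = inner.query("llvm", ("dense", 2))
    not_visible = outer.query("llvm", ("dense", 2))  # inner entries are not seen upward
```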
- update(target, workload, cfg)
Update context with a specific config.
- Parameters:
target (Target) – The current target
workload (Workload) – The current workload.
cfg (ConfigSpace) – The specific configuration.
Note
This interface is for cases when TVM decides to replace an operator in the graph. For example, the AlterOpLayout pass (enabled when opt_level = 3) replaces the NCHW convolution with an NCHW[x]c implementation on x86 CPUs. Thus in TOPI, we first query the schedule using the original NCHW workload, then update the dispatcher with the new NCHW[x]c workload, so that later on the NCHW[x]c convolution can get its schedule from the dispatcher directly using its own workload.
@conv2d_alter_layout.register("cpu")
def _alter_conv2d_layout(attrs, inputs, tinfo):
    workload = get_conv2d_workload(...)
    dispatch_ctx = autotvm.task.DispatchContext.current
    target = tvm.target.Target.current()
    config = dispatch_ctx.query(target, workload)

    # Get conv2d_NCHWc workload from config
    # new_workload = ...
    # new_inputs = ...
    # new_attrs = ...

    # Store altered operator's config
    dispatch_ctx.update(target, new_workload, config)
    return sym.contrib.conv2d_NCHWc(*new_inputs, **new_attrs)
We directly store config back because conv2d_NCHW and conv2d_NCHWc share the same schedule parameters. One can construct a new ConfigEntity if this is not the case.
- class tvm.autotvm.task.dispatcher.ApplyConfig(config)
Apply a deterministic config entity for all queries.
- Parameters:
config (ConfigSpace or ConfigEntity) – The specific configuration we care about.
- update(target, workload, cfg)
Override update
- class tvm.autotvm.task.dispatcher.ApplyFixedConfig(tasks, schedule_names: str | List[str])
Apply a config of a deterministic schedule. This is used for building a single Relay operator with a deterministic schedule, for testing schedules at the Relay level.
- Parameters:
tasks (list[tvm.autotvm.task.task.Task]) – List of autoTVM tasks.
- update(target, workload, cfg)
Override update
- class tvm.autotvm.task.dispatcher.ApplyHistoryBest(records: None | str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]] | Iterable[str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]]])
Apply the history best config
- Parameters:
records (None, Records, or iterator of Records objects) – Collection of tuning records, where a Records object is a path-like object, a file-like object, or an iterator of (MeasureInput, MeasureResult). If multiple Records objects are passed, their contents will be merged.
- load(records: str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]] | Iterable[str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]]])
Load records into this dispatch context
- Parameters:
records (str, list of str, or iterator of (autotvm.measure.MeasureInput, autotvm.measure.MeasureResult)) – Collection of tuning records. If multiple Records objects are passed, their contents will be merged.
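Merging several record collections and keeping the best entry can be sketched as follows. The record layout ((target, workload, config), (costs, error_no)) and the rule "lowest mean cost among measurements with error_no == 0" are simplifying assumptions for illustration; ApplyHistoryBest itself works on MeasureInput/MeasureResult pairs and may index records in additional ways.

```python
from statistics import mean


def best_configs(*record_sets):
    """Merge record collections and keep the fastest successful config per key."""
    best = {}
    for records in record_sets:
        for (target, workload, config), (costs, error_no) in records:
            if error_no != 0:
                continue  # failed measurements never win
            key = (target, workload)
            cost = mean(costs)
            if key not in best or cost < best[key][0]:
                best[key] = (cost, config)
    return {key: config for key, (cost, config) in best.items()}


first_run = [
    (("llvm", "conv2d_a", "cfg0"), ([2.0, 2.2], 0)),
    (("llvm", "conv2d_a", "cfg1"), ([1.0], 1)),  # failed: ignored despite low cost
]
second_run = [
    (("llvm", "conv2d_a", "cfg2"), ([1.5], 0)),
    (("llvm", "dense_b", "cfg3"), ([3.0], 0)),
]
best = best_configs(first_run, second_run)
```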
- update(target, workload, cfg)
Update context with a specific config.
- Parameters:
target (Target) – The current target
workload (Workload) – The current workload.
cfg (ConfigSpace) – The specific configuration.
- class tvm.autotvm.task.dispatcher.FallbackContext
A fallback dispatch context.
Any tunable template can be called under this context. This is the root context.
- clear_cache(target, workload)
Clear the fallback cache. To clear the cache, pass the same arguments that were passed to _query_inside.
- update(target, workload, cfg)
Update context with a specific config.
- Parameters:
target (Target) – The current target
workload (Workload) – The current workload.
cfg (ConfigSpace) – The specific configuration.
- tvm.autotvm.task.dispatcher.clear_fallback_cache(target, workload)
Clear the fallback cache. To clear the cache, pass the same arguments that were passed to _query_inside.
Note
This is used in alter_op_layout to clear the bad cache entry created before calling the TOPI compute function.
- class tvm.autotvm.task.dispatcher.ApplyGraphBest(records: str | bytes | Path | TextIOBase | Iterable[Tuple[MeasureInput, MeasureResult]])
Load the graph level tuning optimal schedules.
The input records should be in ascending order of node index for the target operator. Usually they can be obtained with the graph tuner.
This context maintains an internal counter to indicate the current node index.
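The counter-based lookup can be pictured with the following simplified sketch. ToyGraphBest is hypothetical: it ignores target and workload entirely and answers each query with the next config in node order, which is why the records must already be sorted.

```python
class ToyGraphBest:
    """Hypothetical context that hands out configs in node order via a counter."""

    def __init__(self, records):
        # records: (node_index, config) pairs, already sorted by node index
        self._configs = [cfg for _, cfg in records]
        self._counter = 0

    def query(self, target, workload):
        # target/workload are ignored: the call order decides which config is returned
        cfg = self._configs[self._counter]
        self._counter += 1
        return cfg


graph_ctx = ToyGraphBest([(0, "cfg_node0"), (1, "cfg_node1")])
first = graph_ctx.query("llvm", "conv2d_wl")
second = graph_ctx.query("llvm", "conv2d_wl")
```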
- update(target, workload, cfg)
Update context with a specific config.
- Parameters:
target (Target) – The current target
workload (Workload) – The current workload.
cfg (ConfigSpace) – The specific configuration.
Decorators for registering tunable templates to TOPI.
These decorators allow a simple implementation to use different configurations for different workloads. All arguments to the TOPI call are used directly as the “workload”, so make sure all the arguments (except tvm.te.Tensor) in your calls are hashable. A tvm.te.Tensor is serialized into a hashable tuple.
See tvm/topi/python/topi/arm_cpu/depthwise_conv2d.py for example usage.
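Making a workload hashable mostly means replacing tensors with a description of their shape and dtype and turning lists into tuples. In this sketch FakeTensor stands in for tvm.te.Tensor, and the ("TENSOR", shape, dtype) layout is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for tvm.te.Tensor (only shape and dtype matter here)."""
    shape: tuple
    dtype: str


def args_to_workload(args):
    """Convert call arguments into a hashable tuple usable as a workload key."""
    out = []
    for arg in args:
        if isinstance(arg, FakeTensor):
            out.append(("TENSOR", tuple(arg.shape), arg.dtype))
        elif isinstance(arg, (list, tuple)):
            out.append(args_to_workload(arg))  # recurse into containers
        else:
            hash(arg)  # raises TypeError early if the argument is not hashable
            out.append(arg)
    return tuple(out)
```

A dataclass with the default eq=True is itself unhashable, so the tensor branch is what makes the resulting workload usable as a dict key.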
- class tvm.autotvm.task.topi_integration.TaskExtractEnv(allow_duplicate=False)
Global environment for extracting tuning tasks from graph
- reset(wanted_relay_ops=None)
Reset task collections
- Parameters:
wanted_relay_ops (List of tvm.ir.Op) – The relay ops to be extracted
- add_task(task_name, args)
Add AutoTVM task
- get_tasks()
Get collected tasks
- Returns:
tasks – A list of tasks extracted from the graph
- Return type:
List of tuple(name, args)
- static get(allow_duplicate=False)
Get the single instance of TaskExtractEnv
- Parameters:
allow_duplicate (boolean) – Whether to fetch all workloads in the network, even though some of them are the same. This is useful for graph tuning.
- Returns:
env – The single instance of TaskExtractEnv
- Return type:
TaskExtractEnv
- tvm.autotvm.task.topi_integration.register_topi_compute(task_name, func=None)
Register a tunable template for a topi compute function.
The registration wraps this topi compute so that it takes cfg as the first argument, followed by the original argument list. It uses all its arguments as the workload and stores this “workload” in its final ComputeOp, where it can be used to reconstruct the “workload” in the subsequent topi_schedule call.
- Parameters:
task_name (str) – The AutoTVM task name
func (None or callable) – If it is None, return a decorator. If it is callable, decorate this function.
- Returns:
decorator – A decorator
- Return type:
callable
Examples
See tvm/topi/python/topi/arm_cpu/depthwise_conv2d.py for example usage.
- tvm.autotvm.task.topi_integration.register_topi_schedule(task_name, func=None)
Register a tunable template for a topi schedule function.
The registration will wrap this topi schedule to take cfg as the first argument, followed by the original argument list.
Note that this function will try to find the “workload” in all the ComputeOps in the input. You can attach a “workload” to your compute op by using register_topi_compute. The task name has to be the same as that of the corresponding topi compute function.
- Parameters:
task_name (str) – The AutoTVM task name
func (None or callable) – If it is None, return a decorator. If it is callable, decorate this function.
- Returns:
decorator – A decorator
- Return type:
callable
Examples
See tvm/topi/python/topi/arm_cpu/depthwise_conv2d.py for example usage.
- tvm.autotvm.task.topi_integration.get_workload(outs, task_name=None)
Retrieve the workload from outputs
tvm.autotvm.record
Tuning record and serialization format
- tvm.autotvm.record.measure_str_key(inp, include_config=True)
get unique str key for MeasureInput
- Parameters:
inp (autotvm.measure.MeasureInput) – input for the measure
include_config (bool, optional) – whether to include the config in the str key
- Returns:
key – The str representation of key
- Return type:
str
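A key of this kind can be built by concatenating the string forms of the measurement's components. The Task and Input named tuples below are hypothetical stand-ins for the real task and MeasureInput, and the concatenation order is an assumption; the point is that include_config=False lets records that differ only in their config share a key.

```python
from collections import namedtuple

# Hypothetical stand-ins for a tuning task and a MeasureInput.
Task = namedtuple("Task", ["name", "args", "kwargs"])
Input = namedtuple("Input", ["target", "task", "config"])


def measure_str_key(inp, include_config=True):
    """Join target, task name, args, kwargs and (optionally) the config into one key."""
    config_str = str(inp.config) if include_config else ""
    parts = [str(inp.target), inp.task.name, str(inp.task.args), str(inp.task.kwargs), config_str]
    return "".join(parts)


inp_a = Input("llvm", Task("dense", (1, 2), {}), "cfg_a")
inp_b = inp_a._replace(config="cfg_b")  # same workload, different config
```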
- tvm.autotvm.record.encode(inp, result, protocol='json')
encode (MeasureInput, MeasureResult) pair to a string
- Parameters:
inp (autotvm.measure.MeasureInput), result (autotvm.measure.MeasureResult) – the input/result pair to encode
protocol (str) – log protocol, json or pickle
- Returns:
row – a row in the logger file
- Return type:
str
- tvm.autotvm.record.decode(row, protocol='json')
Decode encoded record string to python object
- Parameters:
- Returns:
ret – The tuple of input and result, or None if input uses old version log format.
- Return type:
tuple(autotvm.measure.MeasureInput, autotvm.measure.MeasureResult), or None
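A JSON round trip for an (input, result) pair, including the None return for rows that cannot be decoded, can be sketched as below. The schema with only "input" and "result" keys is a deliberate simplification; TVM's log rows carry more fields, and the pickle protocol is not covered.

```python
import json


def encode(inp, result):
    """Serialize an (input, result) pair into a single JSON text row."""
    return json.dumps({"input": inp, "result": result})


def decode(row):
    """Parse a row written by encode; return None if it is not in that format."""
    try:
        obj = json.loads(row)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or "input" not in obj or "result" not in obj:
        return None
    return obj["input"], obj["result"]


row = encode(["llvm", "dense", [1, 2]], [[0.5], 0])
```

Tuples come back as lists after a JSON round trip, so the example uses lists throughout.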
- tvm.autotvm.record.load_from_buffer(file: TextIOBase)
Generator: load records from buffer. This is a generator that yields the records.
- Parameters:
file (io.TextIOBase)
- Yields:
input (autotvm.measure.MeasureInput)
result (autotvm.measure.MeasureResult)
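Because load_from_buffer is a generator, records are decoded lazily, one line at a time. The sketch below assumes one JSON object per line with hypothetical "input" and "result" keys and skips blank lines; it is not TVM's record format.

```python
import io
import json


def load_from_buffer(file):
    """Lazily yield (input, result) pairs from a text buffer with one JSON object per line."""
    for row in file:
        if not row.strip():
            continue  # tolerate blank lines
        record = json.loads(row)
        yield record["input"], record["result"]


buffer = io.StringIO('{"input": "a", "result": 1}\n\n{"input": "b", "result": 2}\n')
pairs = list(load_from_buffer(buffer))
```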
- tvm.autotvm.record.load_from_file(filepath: str | bytes | PathLike)
Generator: load records from path. This is a generator that yields the records.
- Parameters:
filepath (str, bytes, or os.PathLike)
- Yields:
input (autotvm.measure.MeasureInput)
result (autotvm.measure.MeasureResult)
- tvm.autotvm.record.split_workload(in_file, clean=True)
Split a log file into separate files, each of which contains only a single workload. This function can also delete duplicate records in the log file.
- tvm.autotvm.record.pick_best(in_file, out_file)
Pick the best entries from a file and store them to another file. This function distills the useful log entries from a large log file. If out_file already exists, the best entries from both in_file and out_file will be saved.
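The distillation step of pick_best can be sketched over a simplified JSON-lines log. The record fields (workload, config, costs, error_no) are hypothetical, a record counts as successful when error_no is 0, and "best" means the lowest mean cost; as the text describes, entries already in out_file are merged before the output is rewritten.

```python
import json
import os
import tempfile
from statistics import mean


def _read_records(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def pick_best(in_file, out_file):
    """Keep the fastest successful record per workload, merging with out_file if present."""
    records = _read_records(in_file)
    if os.path.isfile(out_file):
        records += _read_records(out_file)  # merge entries already kept in out_file
    best = {}
    for rec in records:
        if rec["error_no"] != 0:
            continue  # failed measurements never count as "best"
        key = rec["workload"]
        if key not in best or mean(rec["costs"]) < mean(best[key]["costs"]):
            best[key] = rec
    with open(out_file, "w") as f:
        for rec in best.values():
            f.write(json.dumps(rec) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    log, best_log = os.path.join(tmp, "tune.log"), os.path.join(tmp, "best.log")
    with open(log, "w") as f:
        for rec in [
            {"workload": "conv2d_a", "config": 1, "costs": [2.0], "error_no": 0},
            {"workload": "conv2d_a", "config": 2, "costs": [1.0], "error_no": 0},
            {"workload": "conv2d_a", "config": 3, "costs": [0.1], "error_no": 1},
            {"workload": "dense_b", "config": 4, "costs": [3.0], "error_no": 0},
        ]:
            f.write(json.dumps(rec) + "\n")
    pick_best(log, best_log)
    with open(best_log) as f:
        kept = {r["workload"]: r["config"] for r in map(json.loads, f)}
```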