tvm
Classes | |
struct | DeviceWrapperNode |
Wrapper for Device because Device is not passable across the PackedFunc interface. More... | |
class | DeviceWrapper |
Wrapper for Device. More... | |
class | ReportNode |
Data collected from a profiling run. Includes per-call metrics and per-device metrics. More... | |
class | Report |
class | MetricCollectorNode |
Interface for user defined profiling metric collection. More... | |
class | MetricCollector |
Wrapper for MetricCollectorNode. More... | |
struct | CallFrame |
class | Profiler |
class | DurationNode |
class | PercentNode |
class | CountNode |
class | RatioNode |
Functions | |
MetricCollector | CreatePAPIMetricCollector (Map< DeviceWrapper, Array< String >> metrics) |
Construct a metric collector that collects data from hardware performance counters using the Performance Application Programming Interface (PAPI). More... | |
String | ShapeString (const std::vector< NDArray > &shapes) |
String representation of an array of NDArray shapes. More... | |
String | ShapeString (NDArray shape, DLDataType dtype) |
String representation of shape encoded as an NDArray. More... | |
String | ShapeString (const std::vector< int64_t > &shape, DLDataType dtype) |
String representation of a shape encoded as a vector. More... | |
PackedFunc | ProfileFunction (Module mod, std::string func_name, int device_type, int device_id, int warmup_iters, Array< MetricCollector > collectors) |
Collect performance information of a function execution. Usually used with a compiled PrimFunc (via tvm.build). More... | |
PackedFunc | WrapTimeEvaluator (PackedFunc f, Device dev, int number, int repeat, int min_repeat_ms, int limit_zero_time_iterations, int cooldown_interval_ms, int repeats_to_cooldown, int cache_flush_bytes=0, PackedFunc f_preproc=nullptr) |
Wrap a timer function to measure the time cost of a given packed function. More... | |
MetricCollector tvm::runtime::profiling::CreatePAPIMetricCollector | ( | Map< DeviceWrapper, Array< String >> | metrics | ) |
Construct a metric collector that collects data from hardware performance counters using the Performance Application Programming Interface (PAPI).
metrics | A mapping from a device type to the metrics that should be collected on that device. You can find the names of available metrics by running papi_native_avail. |
PackedFunc tvm::runtime::profiling::ProfileFunction | ( | Module | mod, |
std::string | func_name, | ||
int | device_type, | ||
int | device_id, | ||
int | warmup_iters, | ||
Array< MetricCollector > | collectors | ||
) |
Collect performance information of a function execution. Usually used with a compiled PrimFunc (via tvm.build).
This information can include performance counters like cache hits and FLOPs that are useful in debugging performance issues of individual PrimFuncs. Different metrics can be collected depending on which MetricCollector is used.
Example usage:
mod | Module to profile. Usually a PrimFunc that has been compiled to machine code. |
func_name | Name of function to run in the module. |
device_type | Device type to run on. Profiling will include performance metrics specific to this device type. |
device_id | Id of device to run on. |
warmup_iters | Number of iterations of the function to run before collecting performance information. Setting this to a value greater than 0 is recommended so that cache effects are consistent. |
collectors | List of different ways to collect metrics. See MetricCollector. |
Returns a PackedFunc that takes the same arguments as mod[func_name] and returns performance metrics as a Map<String, ObjectRef>, where values can be CountNode, DurationNode, or PercentNode.
String tvm::runtime::profiling::ShapeString | ( | const std::vector< int64_t > & | shape, |
DLDataType | dtype | ||
) |
String representation of a shape encoded as a vector.
shape | Shape as a vector of integers. |
dtype | The dtype of the shape. |
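The formatting this overload describes can be illustrated with a small standalone sketch. It is not TVM's implementation: `ShapeStringSketch` is a hypothetical name, the dtype is passed as a plain string instead of a `DLDataType` so that no TVM or DLPack headers are needed, and the `", "` separator between dimensions is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ShapeString(shape, dtype).
// `dtype_name` replaces DLDataType (assumption) to keep the sketch header-free.
std::string ShapeStringSketch(const std::vector<int64_t>& shape,
                              const std::string& dtype_name) {
  std::ostringstream out;
  out << dtype_name << "[";
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (i != 0) out << ", ";  // separator between dimensions is an assumption
    out << shape[i];
  }
  out << "]";
  return out.str();
}
```

Under these assumptions, a one-dimensional shape of extent 2 with dtype name `float32` is rendered in the `dtype[dims]` form.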
Returns a textual representation of the shape, for example float32[2].
PackedFunc tvm::runtime::profiling::WrapTimeEvaluator | ( | PackedFunc | f, |
Device | dev, | ||
int | number, | ||
int | repeat, | ||
int | min_repeat_ms, | ||
int | limit_zero_time_iterations, | ||
int | cooldown_interval_ms, | ||
int | repeats_to_cooldown, | ||
int | cache_flush_bytes = 0, | ||
PackedFunc | f_preproc = nullptr | ||
) |
Wrap a timer function to measure the time cost of a given packed function.
Approximate implementation:
f | The function argument. |
dev | The device. |
number | The number of times to run the function when computing an average. These runs together form one repeat of the measurement. |
repeat | The number of times to repeat the measurement. In total, the function is invoked (1 + number x repeat) times; the first invocation is a warm-up and is discarded. The returned result contains repeat costs, each of which is the average of number runs. |
min_repeat_ms | The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, number is adjusted dynamically to meet the minimum duration requirement of one repeat: when the run time of one repeat falls below this duration, number is automatically increased. |
limit_zero_time_iterations | The maximum number of repeats when measured time is equal to 0. It helps to avoid hanging during measurements. |
cooldown_interval_ms | The cooldown interval in milliseconds between the number of repeats defined by repeats_to_cooldown. |
repeats_to_cooldown | The number of repeats before the cooldown is activated. |
cache_flush_bytes | The number of bytes to flush from cache before |
f_preproc | The function to be executed before we execute time evaluator. |
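The way these parameters interact can be sketched as a plain C++ timing loop. This is an illustration under stated assumptions, not TVM's source: `TimeEvaluatorSketch` and `Fn` are hypothetical names, `std::function` stands in for `PackedFunc`, wall-clock time from `std::chrono::steady_clock` replaces the device timer, and device synchronization and `cache_flush_bytes` are left out.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for PackedFunc: any callable taking no arguments.
using Fn = std::function<void()>;

// Returns `repeat` costs in seconds, each the mean over `number` runs of f.
std::vector<double> TimeEvaluatorSketch(const Fn& f, int number, int repeat,
                                        int min_repeat_ms,
                                        int limit_zero_time_iterations,
                                        int cooldown_interval_ms,
                                        int repeats_to_cooldown,
                                        const Fn& f_preproc = nullptr) {
  using Clock = std::chrono::steady_clock;
  std::vector<double> costs;
  f();  // warm-up run; its time is discarded
  for (int i = 0; i < repeat; ++i) {
    if (f_preproc) f_preproc();  // optional preprocessing before each repeat
    double duration_ms = 0.0;
    int zero_time_runs = 0;
    do {
      if (duration_ms > 0.0) {
        // The previous attempt was too short: grow `number` so that the
        // next attempt should last at least min_repeat_ms.
        number = static_cast<int>(min_repeat_ms / (duration_ms / number) + 1);
      }
      const auto start = Clock::now();
      for (int j = 0; j < number; ++j) f();
      duration_ms =
          std::chrono::duration<double, std::milli>(Clock::now() - start).count();
      // Count attempts that measured exactly zero so the loop cannot hang.
      if (duration_ms == 0.0) ++zero_time_runs;
    } while (duration_ms < min_repeat_ms &&
             zero_time_runs < limit_zero_time_iterations);
    costs.push_back(duration_ms / 1e3 / number);  // mean seconds per run
    if (cooldown_interval_ms > 0 && repeats_to_cooldown > 0 &&
        i % repeats_to_cooldown == 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(cooldown_interval_ms));
    }
  }
  return costs;
}
```

With `min_repeat_ms` set to 0 the inner loop runs exactly once per repeat, so the function is invoked 1 + number x repeat times, matching the count given for `repeat`.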