tvm::runtime::profiling Namespace Reference

Classes

struct  DeviceWrapperNode
 Wrapper for Device because Device is not passable across the PackedFunc interface.

class  DeviceWrapper
 Wrapper for Device.

class  ReportNode
 Data collected from a profiling run. Includes per-call metrics and per-device metrics.

class  Report

class  MetricCollectorNode
 Interface for user-defined profiling metric collection.

class  MetricCollector
 Wrapper for MetricCollectorNode.
 
struct  CallFrame
 
class  Profiler
 
class  DurationNode
 
class  PercentNode
 
class  CountNode
 
class  RatioNode
 

Functions

MetricCollector CreatePAPIMetricCollector (Map< DeviceWrapper, Array< String >> metrics)
 Construct a metric collector that collects data from hardware performance counters using the Performance Application Programming Interface (PAPI).

String ShapeString (const std::vector< NDArray > &shapes)
 String representation of an array of NDArray shapes.

String ShapeString (NDArray shape, DLDataType dtype)
 String representation of a shape encoded as an NDArray.

String ShapeString (const std::vector< int64_t > &shape, DLDataType dtype)
 String representation of a shape encoded as a vector.

PackedFunc ProfileFunction (Module mod, std::string func_name, int device_type, int device_id, int warmup_iters, Array< MetricCollector > collectors)
 Collect performance information of a function execution. Usually used with a compiled PrimFunc (via tvm.build).

PackedFunc WrapTimeEvaluator (PackedFunc f, Device dev, int number, int repeat, int min_repeat_ms, int limit_zero_time_iterations, int cooldown_interval_ms, int repeats_to_cooldown, int cache_flush_bytes=0, PackedFunc f_preproc=nullptr)
 Wrap a packed function in a timer function that measures its time cost.
 

Function Documentation

◆ CreatePAPIMetricCollector()

MetricCollector tvm::runtime::profiling::CreatePAPIMetricCollector ( Map< DeviceWrapper, Array< String >>  metrics)

Construct a metric collector that collects data from hardware performance counters using the Performance Application Programming Interface (PAPI).

Parameters
metrics: A mapping from a device (wrapped in a DeviceWrapper) to the metrics that should be collected on that device. You can find the names of available metrics by running papi_native_avail.

◆ ProfileFunction()

PackedFunc tvm::runtime::profiling::ProfileFunction ( Module  mod,
std::string  func_name,
int  device_type,
int  device_id,
int  warmup_iters,
Array< MetricCollector >  collectors 
)

Collect performance information of a function execution. Usually used with a compiled PrimFunc (via tvm.build).

This information can include performance counters like cache hits and FLOPs that are useful in debugging performance issues of individual PrimFuncs. Different metrics can be collected depending on which MetricCollector is used.

Example usage:

// Use PAPI to measure the number of floating point operations.
PackedFunc profiler = ProfileFunction(
    mod, "main", kDLCPU, 0, /*warmup_iters=*/1,
    {CreatePAPIMetricCollector({{DeviceWrapper(Device{kDLCPU, 0}), {"PAPI_FP_OPS"}}})});
// The profiler returns a Map<String, ObjectRef> of collected metrics.
Map<String, ObjectRef> metrics = profiler(arg1, arg2, arg3);
for (const auto& kv : metrics) {
  std::cout << kv.first << std::endl;  // print the name of each collected metric
}
Parameters
mod: Module to profile. Usually a PrimFunc that has been compiled to machine code.
func_name: Name of the function to run in the module.
device_type: Device type to run on. Profiling will include performance metrics specific to this device type.
device_id: Id of the device to run on.
warmup_iters: Number of iterations of the function to run before collecting performance information. Setting this above 0 is recommended so that cache effects are consistent.
collectors: List of different ways to collect metrics. See MetricCollector.
Returns
A PackedFunc which takes the same arguments as mod[func_name] and returns the performance metrics as a Map<String, ObjectRef> whose values can be CountNode, DurationNode, or PercentNode objects.
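The profiling flow described above can be modeled with the standard library alone. This is a hypothetical sketch: SketchCollector, StartCounter, and ProfileFunctionSketch are illustrative names, not TVM APIs, and the real ProfileFunction operates on PackedFunc arguments, devices, and MetricCollector objects. It assumes that warmup iterations are unmeasured and that every collector is started before, and stopped after, a single measured call, with the per-collector results merged into one map.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical collector interface (not TVM's MetricCollectorNode):
// start() begins measuring, stop() ends it and returns named metric values.
struct SketchCollector {
  virtual ~SketchCollector() = default;
  virtual void start() = 0;
  virtual std::map<std::string, double> stop() = 0;
};

// Hypothetical collector that only records how many times it was started.
struct StartCounter : SketchCollector {
  int starts = 0;
  void start() override { ++starts; }
  std::map<std::string, double> stop() override {
    return {{"starts", static_cast<double>(starts)}};
  }
};

// Simplified model: run warmup_iters unmeasured calls, start every collector,
// run f once, then stop the collectors and merge their results by metric name.
std::function<std::map<std::string, double>()> ProfileFunctionSketch(
    std::function<void()> f, int warmup_iters,
    std::vector<std::shared_ptr<SketchCollector>> collectors) {
  return [=]() {
    for (int i = 0; i < warmup_iters; ++i) f();  // warmup, nothing collected
    for (auto& c : collectors) c->start();
    f();  // the single measured call
    std::map<std::string, double> combined;
    for (auto& c : collectors) {
      for (const auto& kv : c->stop()) combined[kv.first] = kv.second;
    }
    return combined;
  };
}
```

Because results are merged into one map keyed by metric name, two collectors that report the same metric name would overwrite each other in this sketch.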

◆ ShapeString() [1/3]

String tvm::runtime::profiling::ShapeString ( const std::vector< int64_t > &  shape,
DLDataType  dtype 
)

String representation of a shape encoded as a vector.

Parameters
shape: Shape as a vector of integers.
dtype: The dtype of the shape.
Returns
A textual representation of the shape. For example: float32[2].
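The documented output format (dtype name followed by the bracketed dimensions) can be sketched in plain C++. SketchDType, DTypeName, and ShapeStringSketch are hypothetical names. The type codes mirror DLPack's numbering (0 = int, 1 = uint, 2 = float), but SketchDType is a simplified local stand-in for DLDataType (it has no lanes field), and this is not TVM's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for a DLPack-style dtype: a type code plus a bit width.
// Codes follow DLPack's numbering: 0 = int, 1 = uint, 2 = float.
struct SketchDType {
  std::uint8_t code;
  std::uint8_t bits;
};

// Render a dtype as "int64", "uint8", "float32", and so on.
std::string DTypeName(SketchDType t) {
  std::string base;
  switch (t.code) {
    case 0: base = "int"; break;
    case 1: base = "uint"; break;
    case 2: base = "float"; break;
    default: base = "unknown"; break;
  }
  return base + std::to_string(t.bits);
}

// Format a shape as "<dtype>[d0, d1, ...]".
std::string ShapeStringSketch(const std::vector<std::int64_t>& shape, SketchDType dtype) {
  std::ostringstream os;
  os << DTypeName(dtype) << "[";
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (i > 0) os << ", ";
    os << shape[i];
  }
  os << "]";
  return os.str();
}
```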

◆ ShapeString() [2/3]

String tvm::runtime::profiling::ShapeString ( const std::vector< NDArray > &  shapes)

String representation of an array of NDArray shapes.

Parameters
shapes: Array of NDArrays to get the shapes of.
Returns
A textual representation of the shapes. For example: float32[2], int64[1, 2].
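For the multi-tensor overload, the documented output joins one description per tensor with ", ". In this hypothetical sketch, SketchTensor stands in for NDArray and carries its dtype as a plain string; ShapesStringSketch is an illustrative name, not a TVM function.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an NDArray: only the dtype name and shape matter here.
struct SketchTensor {
  std::string dtype;                // e.g. "float32"
  std::vector<std::int64_t> shape;  // e.g. {1, 2}
};

// Describe each tensor as "<dtype>[d0, d1, ...]" and join the descriptions with ", ".
std::string ShapesStringSketch(const std::vector<SketchTensor>& tensors) {
  std::ostringstream os;
  for (std::size_t t = 0; t < tensors.size(); ++t) {
    if (t > 0) os << ", ";
    os << tensors[t].dtype << "[";
    const auto& shape = tensors[t].shape;
    for (std::size_t i = 0; i < shape.size(); ++i) {
      if (i > 0) os << ", ";
      os << shape[i];
    }
    os << "]";
  }
  return os.str();
}
```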

◆ ShapeString() [3/3]

String tvm::runtime::profiling::ShapeString ( NDArray  shape,
DLDataType  dtype 
)

String representation of a shape encoded as an NDArray.

Parameters
shape: NDArray containing the shape.
dtype: The dtype of the shape.
Returns
A textual representation of the shape. For example: float32[2].

◆ WrapTimeEvaluator()

PackedFunc tvm::runtime::profiling::WrapTimeEvaluator ( PackedFunc  f,
Device  dev,
int  number,
int  repeat,
int  min_repeat_ms,
int  limit_zero_time_iterations,
int  cooldown_interval_ms,
int  repeats_to_cooldown,
int  cache_flush_bytes = 0,
PackedFunc  f_preproc = nullptr 
)

Wrap a packed function in a timer function that measures its time cost.

Approximate implementation:

f()  # warmup
for i in range(repeat):
    f_preproc()
    while True:
        start = time()
        for j in range(number):
            f()
        duration_ms = time() - start
        if duration_ms >= min_repeat_ms:
            break
        else:
            number = min_repeat_ms / (duration_ms / number) + 1
    if cooldown_interval_ms and i % repeats_to_cooldown == 0:
        sleep(cooldown_interval_ms)
Parameters
f: The packed function to be timed.
dev: The device.
number: The number of times to run the function for taking an average. These runs together form one repeat of the measurement.
repeat: The number of times to repeat the measurement. In total, the function is invoked (1 + number x repeat) times, where the first invocation is a warmup and is discarded. The returned result contains repeat costs, each of which is an average of number costs.
min_repeat_ms: The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, number is adjusted dynamically to meet the minimum duration of one repeat; that is, when the run time of one repeat falls below this value, number is increased automatically.
limit_zero_time_iterations: The maximum number of repeats when the measured time is 0. It helps avoid hanging during measurements.
cooldown_interval_ms: The cooldown interval in milliseconds, applied once every repeats_to_cooldown repeats.
repeats_to_cooldown: The number of repeats before the cooldown is activated.
cache_flush_bytes: The number of bytes to flush from cache before
f_preproc: The function to be executed before each repeat of the time evaluator.
Returns
f_timer: A timer function.
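The approximate loop above can be expressed as a standard-library C++ sketch. TimeEvaluatorSketch is a hypothetical name, and it is a simplified model rather than TVM's implementation: it takes std::function instead of PackedFunc, times on the host with std::chrono::steady_clock instead of a device timer, omits cache_flush_bytes, and directly returns the per-repeat average cost in seconds (an assumed unit) instead of returning a timer function.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Simplified model of the timing loop. Returns one entry per repeat:
// the average seconds per call of f within that repeat.
std::vector<double> TimeEvaluatorSketch(const std::function<void()>& f, int number, int repeat,
                                        int min_repeat_ms, int limit_zero_time_iterations,
                                        int cooldown_interval_ms, int repeats_to_cooldown,
                                        const std::function<void()>& f_preproc = nullptr) {
  using Clock = std::chrono::steady_clock;
  std::vector<double> results;
  f();  // warmup call, not measured
  for (int i = 0; i < repeat; ++i) {
    if (f_preproc) f_preproc();  // runs once before each repeat
    double duration_ms = 0.0;
    int zero_time_runs = 0;
    do {
      if (duration_ms > 0.0) {
        // The previous attempt was too short: grow number so the next one
        // should last at least min_repeat_ms.
        number = static_cast<int>(min_repeat_ms / (duration_ms / number)) + 1;
      }
      auto start = Clock::now();
      for (int j = 0; j < number; ++j) f();
      duration_ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
      if (duration_ms == 0.0) ++zero_time_runs;  // attempts the clock could not resolve
    } while (duration_ms < min_repeat_ms && zero_time_runs < limit_zero_time_iterations);
    results.push_back(duration_ms / 1e3 / number);
    // Guard against repeats_to_cooldown == 0 to avoid a modulo by zero.
    if (cooldown_interval_ms > 0 && repeats_to_cooldown > 0 && i % repeats_to_cooldown == 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(cooldown_interval_ms));
    }
  }
  return results;
}
```

With min_repeat_ms set to 0, each repeat in this sketch runs exactly number calls, so the total call count matches the (1 + number x repeat) figure given for the repeat parameter.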