tvm.contrib

Contrib APIs of TVM python package.

Contrib API provides many useful non-core features, including utilities for interacting with third-party libraries and tools.

tvm.contrib.cblas

External function interface to BLAS libraries.

tvm.contrib.cblas.matmul(lhs, rhs, transa=False, transb=False, **kwargs)

Create an extern op that computes matrix multiplication of lhs and rhs with CBLAS. This function serves as an example of how to call external libraries.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor
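
The transa/transb semantics described above can be sketched in pure Python (a reference for the math only; the real op dispatches to CBLAS):

```python
# Reference semantics for matmul(lhs, rhs, transa, transb) on nested lists.
def matmul_ref(lhs, rhs, transa=False, transb=False):
    # transa/transb transpose the corresponding operand before the multiply.
    a = [list(row) for row in zip(*lhs)] if transa else lhs
    b = [list(row) for row in zip(*rhs)] if transb else rhs
    # Standard row-by-column dot products.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

For example, matmul_ref([[1, 2], [3, 4]], [[5, 6], [7, 8]]) yields [[19, 22], [43, 50]].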

tvm.contrib.cblas.batch_matmul(lhs, rhs, transa=False, transb=False, iterative=False, **kwargs)

Create an extern op that computes batched matrix multiplication of lhs and rhs with CBLAS. This function serves as an example of how to call external libraries.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor
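
The batched variant applies the same per-matrix semantics over the leading batch dimension; a pure-Python reference sketch (the real op dispatches to CBLAS):

```python
# Reference semantics for batch_matmul on 3-D nested lists [batch][row][col].
def batch_matmul_ref(lhs, rhs, transa=False, transb=False):
    out = []
    for a, b in zip(lhs, rhs):  # iterate over the batch dimension
        if transa:
            a = [list(row) for row in zip(*a)]
        if transb:
            b = [list(row) for row in zip(*b)]
        out.append(
            [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
        )
    return out
```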

tvm.contrib.clang

Util to invoke clang in the system.

tvm.contrib.clang.find_clang(required=True)

Find clang in system.

Parameters:

required (bool) – Whether the compiler is required. If True, a RuntimeError is raised when no clang can be found.

Returns:

valid_list – List of possible paths.

Return type:

list of str

Note

This function will first search for a clang whose version matches the major LLVM version that TVM was built with.
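
This kind of compiler search can be sketched with shutil.which alone; the candidate names below are illustrative, not the exact list TVM probes:

```python
# Hedged sketch of a find_clang-style search (candidate names are assumptions).
import shutil

def find_clang_like(candidates=("clang-17", "clang"), required=False):
    # Keep only candidates that resolve to an executable on PATH.
    valid_list = [p for p in (shutil.which(c) for c in candidates) if p]
    if required and not valid_list:
        raise RuntimeError("cannot find clang")
    return valid_list
```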

tvm.contrib.clang.create_llvm(inputs, output=None, options=None, cc=None)

Create llvm text ir.

Parameters:
  • inputs (list of str) – List of input files name or code source.

  • output (str, optional) – Output file. If it is None, a temporary file is created.

  • options (list) – The list of additional options string.

  • cc (str, optional) – The clang compiler, if not specified, we will try to guess the matched clang version.

Returns:

code – The generated llvm text IR.

Return type:

str

tvm.contrib.cc

Util to invoke C/C++ compilers in the system.

tvm.contrib.cc.get_cc()

Return the path to the default C/C++ compiler.

Returns:

out – The path to the default C/C++ compiler, or None if none was found.

Return type:

Optional[str]

tvm.contrib.cc.create_shared(output, objects, options=None, cc=None, cwd=None, ccache_env=None)

Create shared library.

Parameters:
  • output (str) – The target shared library.

  • objects (List[str]) – List of object files.

  • options (List[str]) – The list of additional options string.

  • cc (Optional[str]) – The compiler command.

  • cwd (Optional[str]) – The current working directory.

  • ccache_env (Optional[Dict[str, str]]) – The environment variable for ccache. Set None to disable ccache by default.
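
Conceptually, create_shared assembles a linker command from these arguments and runs it; a minimal sketch of the assumed command shape (flags and default compiler are illustrative):

```python
# Sketch of the command a create_shared-style helper assembles (assumed shape,
# not TVM's exact implementation).
def shared_lib_cmd(output, objects, options=None, cc="g++"):
    cmd = [cc, "-shared", "-fPIC", "-o", output]
    cmd += list(objects)          # object files to link
    if options:
        cmd += list(options)      # extra flags, e.g. ["-lm"]
    return cmd
```

The returned list would then be passed to something like subprocess.run(cmd, cwd=cwd).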

tvm.contrib.cc.create_staticlib(output, inputs, ar=None)

Create static library.

Parameters:
  • output (str) – The target static library.

  • inputs (List[str]) – List of input files. Each input file can be a tarball of objects or an object file.

  • ar (Optional[str]) – Path to the ar command to be used

tvm.contrib.cc.create_executable(output, objects, options=None, cc=None, cwd=None, ccache_env=None)

Create executable binary.

Parameters:
  • output (str) – The target executable.

  • objects (List[str]) – List of object files.

  • options (List[str]) – The list of additional options string.

  • cc (Optional[str]) – The compiler command.

  • cwd (Optional[str]) – The current working directory.

  • ccache_env (Optional[Dict[str, str]]) – The environment variable for ccache. Set None to disable ccache by default.

tvm.contrib.cc.get_global_symbol_section_map(path, *, nm=None) dict[str, str]

Get global symbols from a library via nm -g

Parameters:
  • path (str) – The library path

  • nm (str) – The path to nm command

Returns:

symbol_section_map – A map from defined global symbol to their sections

Return type:

Dict[str, str]
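
Parsing nm -g output into such a map can be sketched in a few lines (the three-column "address, section letter, symbol" format is the usual nm layout; undefined symbols, which lack an address column, are skipped):

```python
# Hedged sketch of turning `nm -g` text into a symbol -> section map.
def parse_nm_output(text):
    symbol_section_map = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:             # address, section letter, symbol name
            _, section, symbol = parts
            symbol_section_map[symbol] = section
    return symbol_section_map
```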

tvm.contrib.cc.get_target_by_dump_machine(compiler)

Create a get_target_triple function that obtains the target triple using the given compiler.

Parameters:

compiler (Optional[str]) – The compiler.

Returns:

out – A function that can get the target triple according to the -dumpmachine option of the compiler.

Return type:

Callable

tvm.contrib.cc.cross_compiler(compile_func, options=None, output_format=None, get_target_triple=None, add_files=None)

Create a cross compiler function by specializing compile_func with options.

This function can be used to construct compile functions that can be passed to AutoTVM measure or export_library.

Parameters:
  • compile_func (Union[str, Callable[[str, str, Optional[str]], None]]) – Function that performs the actual compilation

  • options (Optional[List[str]]) – List of additional option strings.

  • output_format (Optional[str]) – Library output format.

  • get_target_triple (Optional[Callable]) – Function that can get the target triple according to the -dumpmachine option of the compiler.

  • add_files (Optional[List[str]]) – List of paths to additional object, source, library files to pass as part of the compilation.

Returns:

fcompile – A compilation function that can be passed to export_library.

Return type:

Callable[[str, str, Optional[str]], None]

Examples

from tvm.contrib import cc, ndk
# export using arm gcc
mod = build_runtime_module()
mod.export_library(path_dso,
                   fcompile=cc.cross_compiler("arm-linux-gnueabihf-gcc"))
# specialize ndk compilation options.
specialized_ndk = cc.cross_compiler(
    ndk.create_shared,
    ["--sysroot=/path/to/sysroot", "-shared", "-fPIC", "-lm"])
mod.export_library(path_dso, fcompile=specialized_ndk)

tvm.contrib.coreml_runtime

CoreML runtime that loads and runs CoreML models.

tvm.contrib.coreml_runtime.create(symbol, compiled_model_path, device)

Create a runtime executor module given a coreml model and context.

Parameters:
  • symbol (str) – The symbol that represents the Core ML model.

  • compiled_model_path (str) – The path of the compiled model to be deployed.

  • device (Device) – The device to deploy the module. It can be local or remote when there is only one Device.

Returns:

coreml_runtime – Runtime coreml module that can be used to execute the coreml model.

Return type:

CoreMLModule

class tvm.contrib.coreml_runtime.CoreMLModule(module)

Wrapper runtime module.

This is a thin wrapper of the underlying TVM module. You can also directly call set_input, run, and get_output of the underlying module.

Parameters:

module (Module) – The internal tvm module that holds the actual coreml functions.

module

The internal tvm module that holds the actual coreml functions.

Type:

Module

tvm.contrib.cublas

External function interface to cuBLAS libraries.

tvm.contrib.cublas.matmul(lhs, rhs, transa=False, transb=False, dtype=None)

Create an extern op that computes matrix multiplication of lhs and rhs with cuBLAS.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.cublas.batch_matmul(lhs, rhs, transa=False, transb=False, dtype=None)

Create an extern op that computes batched matrix multiplication of lhs and rhs with cuBLAS.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.cublaslt

External function interface to cuBLASLt libraries.

tvm.contrib.cublaslt.matmul(lhs, rhs, transa=False, transb=False, n=0, m=0, dtype=None)

Create an extern op that computes matrix multiplication of lhs and rhs with cuBLASLt.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.cudnn

External function interface to CuDNN v7 library.

tvm.contrib.cudnn.exists()

Checks whether the local machine can use CuDNN.

Returns:

exists – True if CuDNN support is enabled and a CuDNN-capable GPU exists. Otherwise, False.

Return type:

bool

tvm.contrib.cudnn.algo_to_index(algo_type, algo_name)

Return an index representing the algorithm, which can be used when calling cuDNN functions.

Parameters:
  • algo_type (str) – One of "fwd", "bwd_filter", or "bwd_data".

  • algo_name (str) –

    Algorithm name as defined in cuDNN. For example:

    • fwd: CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM, etc.

    • bwd_filter: CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0, etc.

    • bwd_data: CUDNN_CONVOLUTION_BWD_DATA_ALGO_0, etc.

Returns:

algo – Algorithm index

Return type:

int
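
The mapping from algorithm name to index amounts to a position lookup in per-type name tables; a sketch (the table contents and ordering here are assumptions for illustration, not the full cuDNN enum lists):

```python
# Illustrative sketch of an algo_to_index-style lookup table.
FWD_ALGOS = [
    "CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM",
    "CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM",
]

def algo_to_index_sketch(algo_type, algo_name):
    # One name table per algo_type ("fwd", "bwd_filter", "bwd_data").
    table = {"fwd": FWD_ALGOS}
    return table[algo_type].index(algo_name)
```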

tvm.contrib.cudnn.conv_output_shape(tensor_format, pad, stride, dilation, x_shape, w_shape, data_dtype, conv_dtype, groups=1)

Get output shape of 2D or 3D convolution

Parameters:
  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC 2: CUDNN_TENSOR_NCHW_VECT_C

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • x_shape (list) – input shape

  • w_shape (list) – weight shape

  • data_dtype (str) – data type

  • conv_dtype (str) – convolution type

  • groups (int) – number of groups

Returns:

oshape – output shape

Return type:

list
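
The per-dimension arithmetic behind the output shape is the standard convolution formula; a pure-Python sketch (assuming a single symmetric pad per side):

```python
# Output size of one spatial dimension of a convolution (standard formula).
def conv_out_dim(in_size, kernel, pad, stride, dilation=1):
    effective_kernel = dilation * (kernel - 1) + 1  # dilated kernel extent
    return 1 + (in_size + 2 * pad - effective_kernel) // stride
```

For example, a 5-wide input with a 3-wide kernel, pad 1, stride 1 stays 5 wide; with stride 2 it becomes 3 wide.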

tvm.contrib.cudnn.conv_dgrad_shape(tensor_format, pad, stride, dilation, dy_shape, w_shape, output_padding=(0, 0), groups=1)

Get output shape of conv2d gradient with respect to data

Parameters:
  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • dy_shape (list) – output gradient shape

  • w_shape (list) – weight shape

  • data_dtype (str) – data type

  • conv_dtype (str) – convolution type

  • groups (int) – number of groups

Returns:

oshape – output shape

Return type:

list
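
The data-gradient shape inverts the forward output-size formula (transposed-convolution arithmetic); a sketch assuming a single symmetric pad per side and the standard formula:

```python
# Input (dgrad) size of one spatial dimension, inverting the forward formula.
def dgrad_out_dim(dy_size, kernel, pad, stride, dilation=1, output_padding=0):
    effective_kernel = dilation * (kernel - 1) + 1  # dilated kernel extent
    return (dy_size - 1) * stride - 2 * pad + effective_kernel + output_padding
```

output_padding disambiguates cases where the forward stride discarded a remainder.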

tvm.contrib.cudnn.conv_forward_find_algo(tensor_format, pad, stride, dilation, x_shape, w_shape, y_shape, data_dtype, conv_dtype, groups=1, verbose=True)

Choose the best forward algorithm for the given input.

Parameters:
  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC 2: CUDNN_TENSOR_NCHW_VECT_C

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • x_shape (list) – input shape

  • w_shape (list) – weight shape

  • y_shape (list) – output shape

  • data_dtype (str) – data type

  • conv_dtype (str) – convolution type

  • groups (int) – number of groups

  • verbose (bool) – whether to show the selection trials

Returns:

algo – algo chosen by CUDNN

Return type:

int

tvm.contrib.cudnn.conv_backward_data_find_algo(tensor_format, pad, stride, dilation, dy_shape, w_shape, dx_shape, data_dtype, conv_dtype, groups=1, verbose=True)

Choose the best backward data algorithm for the given input.

Parameters:
  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC 2: CUDNN_TENSOR_NCHW_VECT_C

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • dy_shape (list) – output gradient shape

  • w_shape (list) – weight shape

  • dx_shape (list) – dgrad shape

  • data_dtype (str) – data type

  • conv_dtype (str) – convolution type

  • groups (int) – number of groups

  • verbose (bool) – whether to show the selection trials

Returns:

algo – algo chosen by CUDNN

Return type:

int

tvm.contrib.cudnn.conv_backward_filter_find_algo(tensor_format, pad, stride, dilation, dy_shape, x_shape, dw_shape, data_dtype, conv_dtype, groups=1, verbose=True)

Choose the best backward filter algorithm for the given input.

Parameters:
  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC 2: CUDNN_TENSOR_NCHW_VECT_C

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • dy_shape (list) – output gradient shape

  • x_shape (list) – input shape

  • dw_shape (list) – wgrad shape

  • data_dtype (str) – data type

  • conv_dtype (str) – convolution type

  • groups (int) – number of groups

  • verbose (bool) – whether to show the selection trials

Returns:

algo – algo chosen by CUDNN

Return type:

int

tvm.contrib.cudnn.conv_forward(x, w, pad, stride, dilation, conv_mode, tensor_format, algo, conv_dtype, groups=1, verbose=True)

Create an extern op that computes 2D or 3D convolution with cuDNN.

Parameters:
  • x (Tensor) – input feature map

  • w (Tensor) – convolution weight

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • conv_mode (int) – 0: CUDNN_CONVOLUTION 1: CUDNN_CROSS_CORRELATION

  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC 2: CUDNN_TENSOR_NCHW_VECT_C

  • algo (int) – Forward algorithm; get the index from the algo_to_index function. If algo == -1, the best algorithm will be chosen by cuDNN.

  • conv_dtype (str) – convolution type

  • groups (int) – the number of groups

  • verbose (bool) – whether to show the selection trials

Returns:

y – The result tensor

Return type:

Tensor

tvm.contrib.cudnn.conv_backward_data(dy, w, pad, stride, dilation, conv_mode, tensor_format, conv_dtype, groups=1, output_padding=(0, 0))

Create a CuDNN extern op that computes the gradient of 2D convolution with respect to data.

Parameters:
  • dy (Tensor) – output gradient

  • w (Tensor) – convolution weight

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • conv_mode (int) – 0: CUDNN_CONVOLUTION 1: CUDNN_CROSS_CORRELATION

  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC

  • conv_dtype (str) – convolution type

  • groups (int) – the number of groups

Returns:

dx – dgrad tensor

Return type:

Tensor

tvm.contrib.cudnn.conv_backward_filter(dy, x, kernel_size, pad, stride, dilation, conv_mode, tensor_format, conv_dtype, groups=1)

Create a CuDNN extern op that computes the gradient of 2D convolution with respect to weight.

Parameters:
  • dy (Tensor) – output gradient

  • x (Tensor) – input tensor

  • kernel_size (a pair of int) – The spatial size of the corresponding forward convolution kernel

  • pad (int or list) – padding

  • stride (int or list) – stride

  • dilation (int or list) – dilation

  • conv_mode (int) – 0: CUDNN_CONVOLUTION 1: CUDNN_CROSS_CORRELATION

  • tensor_format (int) – 0: CUDNN_TENSOR_NCHW 1: CUDNN_TENSOR_NHWC

  • conv_dtype (str) – convolution type

  • groups (int) – the number of groups

Returns:

dw – wgrad tensor

Return type:

Tensor

tvm.contrib.cudnn.softmax(x, axis=-1)

Compute softmax using CuDNN

Parameters:
  • x (tvm.te.Tensor) – The input tensor

  • axis (int) – The axis to compute the softmax

Returns:

ret – The result tensor

Return type:

tvm.te.Tensor

tvm.contrib.cudnn.log_softmax(x, axis=-1)

Compute log_softmax using CuDNN

Parameters:
  • x (tvm.te.Tensor) – The input tensor

  • axis (int) – The axis to compute log softmax over

Returns:

ret – The result tensor

Return type:

tvm.te.Tensor
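
A pure-Python reference for the 1-D softmax semantics computed here (a numerically stable max-subtraction sketch; the real op dispatches to cuDNN):

```python
# Reference softmax over a 1-D list, with max subtraction for stability.
import math

def softmax_ref(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```

log_softmax is simply the elementwise log of this result, usually computed directly as (x - m) - log(s) for stability.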

tvm.contrib.dlpack

Wrapping functions to bridge frameworks with DLPack support to TVM

tvm.contrib.dlpack.convert_func(tvm_func, tensor_type, to_dlpack_func)

Convert a tvm function into one that accepts a tensor from another framework, provided the other framework supports DLPack.

Parameters:
  • tvm_func (Function) – Built tvm function operating on arrays

  • tensor_type (Type) – Type of the tensors of the target framework

  • to_dlpack_func (Function) – Function to convert the source tensors to DLPACK

tvm.contrib.dlpack.to_pytorch_func(tvm_func)

Convert a tvm function into one that accepts PyTorch tensors

Parameters:

tvm_func (Function) – Built tvm function operating on arrays

Returns:

wrapped_func – Wrapped tvm function that operates on PyTorch tensors

Return type:

Function

tvm.contrib.dnnl

External function interface to the DNNL (oneDNN) library.

tvm.contrib.dnnl.matmul(lhs, rhs, transa=False, transb=False, **kwargs)

Create an extern op that computes matrix multiplication of lhs and rhs with DNNL. This function serves as an example of how to call external libraries.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.dnnl.dnnl_conv2d(src, weights, stride, padding, dilation, groups, channel_last=False, out_dtype='float32', **kwargs)

Convolution operator in NCHW layout.

Parameters:
  • src (tvm.te.Tensor) – 4-D with shape [batch, in_channel, in_height, in_width]

  • weights (tvm.te.Tensor) – 4-D with shape [num_filter, in_channel, filter_height, filter_width]

  • stride (int or a list/tuple of two ints) – Stride size, or [stride_height, stride_width]

  • padding (int or a list/tuple of 2 or 4 ints) – padding size, or [pad_height, pad_width] for 2 ints, or [pad_top, pad_left, pad_bottom, pad_right] for 4 ints

  • dilation (int or a list/tuple of two ints) – dilation size, or [dilation_height, dilation_width]

  • groups (int) – The number of groups.

  • channel_last (bool) – Whether the input/output data format is channel-last (NHWC) rather than plain (NCHW).

  • out_dtype (str) – Output data type; currently only float32 is supported.

Returns:

Output – 4-D with shape [batch, out_channel, out_height, out_width]

Return type:

tvm.te.Tensor

tvm.contrib.download

Helper utility for downloading

tvm.contrib.download.download(url, path, overwrite=False, size_compare=False, retries=3)

Downloads the file from the internet. Set the options to control overwriting and size comparison.

Parameters:
  • url (str) – Download url.

  • path (str) – Local file path to save downloaded file.

  • overwrite (bool, optional) – Whether to overwrite existing file, defaults to False.

  • size_compare (bool, optional) – Whether to do size compare to check downloaded file, defaults to False

  • retries (int, optional) – Number of times to retry the download, defaults to 3.
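
The retry behavior can be sketched as a generic retry loop (logic assumed for illustration, not TVM's exact implementation; fetch stands in for the actual HTTP request):

```python
# Sketch of a download-with-retries loop; `fetch` is any zero-argument
# callable that raises on transient failure.
def fetch_with_retries(fetch, retries=3):
    last_err = None
    for _ in range(retries):
        try:
            return fetch()
        except Exception as err:  # broad catch is intentional here
            last_err = err
    raise RuntimeError(f"download failed after {retries} retries") from last_err
```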

tvm.contrib.download.download_testdata(url, relpath, module=None, overwrite=False)

Downloads the test data from the internet.

Parameters:
  • url (str) – Download url.

  • relpath (str) – Relative file path.

  • module (Union[str, list, tuple], optional) – Subdirectory paths under test data folder.

  • overwrite (bool, defaults to False) – If True, will download a fresh copy of the file regardless of the cache. If False, will only download the file if a cached version is missing.

Returns:

abspath – Absolute file path of downloaded file

Return type:

str

tvm.contrib.emcc

Util to invoke emscripten compilers in the system.

tvm.contrib.emcc.create_tvmjs_wasm(output, objects, options=None, cc='emcc', libs=None)

Create wasm that is supposed to run with the tvmjs.

Parameters:
  • output (str) – The target shared library.

  • objects (list) – List of object files.

  • options (str) – The additional options.

  • cc (str, optional) – The compile string.

  • libs (list) – List of user-defined library files (e.g. .bc files) to add into the wasm.

tvm.contrib.hipblas

External function interface to hipBLAS libraries.

tvm.contrib.hipblas.matmul(lhs, rhs, transa=False, transb=False, dtype=None)

Create an extern op that computes matrix multiplication of lhs and rhs with hipBLAS.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.hipblas.batch_matmul(lhs, rhs, transa=False, transb=False, dtype=None)

Create an extern op that computes batched matrix multiplication of lhs and rhs with hipBLAS.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.mkl

External function interface to the MKL library.

tvm.contrib.mkl.matmul(lhs, rhs, transa=False, transb=False, **kwargs)

Create an extern op that computes matrix multiplication of lhs and rhs with MKL. This function serves as an example of how to call external libraries.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.mkl.matmul_u8s8s32(lhs, rhs, transa=False, transb=False, **kwargs)

Create an extern op that computes matrix multiplication of lhs and rhs with MKL. This function serves as an example of how to call external libraries.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.mkl.batch_matmul(lhs, rhs, transa=False, transb=False, iterative=False, **kwargs)

Create an extern op that computes batched matrix multiplication of lhs and rhs with MKL. This function serves as an example of how to call external libraries.

Parameters:
  • lhs (Tensor) – The left matrix operand

  • rhs (Tensor) – The right matrix operand

  • transa (bool) – Whether transpose lhs

  • transb (bool) – Whether transpose rhs

Returns:

C – The result tensor.

Return type:

Tensor

tvm.contrib.ndk

Util to invoke NDK compiler toolchain.

tvm.contrib.ndk.create_shared(output, objects, options=None)

Create shared library.

Parameters:
  • output (str) – The target shared library.

  • objects (list) – List of object files.

  • options (list of str, optional) – The additional options.

tvm.contrib.ndk.create_staticlib(output, inputs)

Create static library.

Parameters:
  • output (str) – The target static library.

  • inputs (list) – List of object files or tar files

tvm.contrib.ndk.get_global_symbol_section_map(path, *, nm=None) dict[str, str]

Get global symbols from a library via nm -gU in NDK

Parameters:
  • path (str) – The library path

  • nm (str) – The path to nm command

Returns:

symbol_section_map – A map from defined global symbol to their sections

Return type:

Dict[str, str]

tvm.contrib.nnpack

External function interface to NNPACK libraries.

tvm.contrib.nnpack.is_available()

Check whether NNPACK is available, that is, nnp_initialize() returns nnp_status_success.

tvm.contrib.nnpack.fully_connected_inference(lhs, rhs, nthreads=1)

Create an extern op that computes a fully connected operation of 1D tensor lhs and 2D tensor rhs with nnpack.

Parameters:
  • lhs (Tensor) – 1D array input[input_channels] of FP32 elements

  • rhs (Tensor) – 2D matrix kernel[output_channels][input_channels] of FP32 elements

Returns:

C – 1D array out[output_channels] of FP32 elements.

Return type:

Tensor

tvm.contrib.nnpack.convolution_inference(data, kernel, bias, padding, stride, nthreads=1, algorithm=0)

Create an extern op to do inference convolution of 4D tensor data and 4D tensor kernel and 1D tensor bias with nnpack.

Parameters:
  • data (Tensor) – data 4D tensor input[batch][input_channels][input_height][input_width] of FP32 elements.

  • kernel (Tensor) – kernel 4D tensor kernel[output_channels][input_channels][kernel_height] [kernel_width] of FP32 elements.

  • bias (Tensor) – bias 1D array bias[output_channels] of FP32 elements.

  • padding (list) – padding A 4-dim list of [pad_top, pad_bottom, pad_left, pad_right], which indicates the padding around the feature map.

  • stride (list) – stride A 2-dim list of [stride_height, stride_width], which indicates the stride.

Returns:

output – output 4D tensor output[batch][output_channels][output_height][output_width] of FP32 elements.

Return type:

Tensor

tvm.contrib.nnpack.convolution_inference_without_weight_transform(data, transformed_kernel, bias, padding, stride, nthreads=1, algorithm=0)

Create an extern op to do inference convolution of 4D tensor data and 4D pre-transformed tensor kernel and 1D tensor bias with nnpack.

Parameters:
  • data (Tensor) – data 4D tensor input[batch][input_channels][input_height][input_width] of FP32 elements.

  • transformed_kernel (Tensor) – transformed_kernel 4D tensor kernel[output_channels][input_channels][tile] [tile] of FP32 elements.

  • bias (Tensor) – bias 1D array bias[output_channels] of FP32 elements.

  • padding (list) – padding A 4-dim list of [pad_top, pad_bottom, pad_left, pad_right], which indicates the padding around the feature map.

  • stride (list) – stride A 2-dim list of [stride_height, stride_width], which indicates the stride.

Returns:

output – output 4D tensor output[batch][output_channels][output_height][output_width] of FP32 elements.

Return type:

Tensor

tvm.contrib.nnpack.convolution_inference_weight_transform(kernel, nthreads=1, algorithm=0, dtype='float32')

Create an extern op that pre-transforms a 4D tensor kernel with nnpack, for use with convolution_inference_without_weight_transform.

Parameters:

kernel (Tensor) – kernel 4D tensor kernel[output_channels][input_channels][kernel_height] [kernel_width] of FP32 elements.

Returns:

output – output 4D tensor output[output_channels][input_channels][tile][tile] of FP32 elements.

Return type:

Tensor

tvm.contrib.nvcc

Utility to invoke nvcc compiler in the system

tvm.contrib.nvcc.compile_cuda(code, target_format=None, arch=None, options=None, path_target=None, compiler='nvcc')

Compile CUDA code with NVCC or NVRTC.

Parameters:
  • code (str) – The CUDA code.

  • target_format (str) – The target format of the compiler (“ptx”, “cubin”, or “fatbin”).

  • arch (str) – The CUDA architecture.

  • options (str or list of str) – The additional options.

  • path_target (str, optional) – Output file.

  • compiler (str, optional) – Compiler backend: “nvcc” or “nvrtc”. This can be set by the TVM_CUDA_COMPILE_MODE environment variable.

Returns:

res_binary – The bytearray of the compiled binary (ptx/cubin/fatbin).

Return type:

bytearray

Notes

  • NVRTC is a “runtime” compilation library and can be faster for JIT compilation.

  • NVRTC requires cuda-python: pip install cuda-python

tvm.contrib.nvcc.find_cuda_path()

Utility function to find CUDA path

Returns:

path – Path to CUDA root.

Return type:

str

tvm.contrib.nvcc.get_cuda_version(cuda_path=None)

Utility function to get CUDA version

Parameters:

cuda_path (Optional[str]) – Path to CUDA root. If None is passed, will use find_cuda_path() as default.

Returns:

version – The CUDA version

Return type:

float

tvm.contrib.nvcc.find_nvshmem_paths() tuple[str, str]

Searches for the NVSHMEM include and library directories.

Return type:

A tuple containing the path to the include directory and the library directory.

tvm.contrib.nvcc.parse_compute_version(compute_version)

Parse compute capability string to divide major and minor version

Parameters:

compute_version (str) – compute capability of a GPU (e.g. “6.0”)

Returns:

  • major (int) – major version number

  • minor (int) – minor version number
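
Splitting a compute-capability string into major and minor parts is a two-line operation; a pure-Python reference for the behavior described above:

```python
# Reference for parsing a compute capability string like "6.0" or "7.5".
def parse_compute_version_ref(compute_version):
    major, minor = compute_version.split(".")
    return int(major), int(minor)
```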

tvm.contrib.nvcc.have_fp16(compute_version)

Whether fp16 support is provided by the given compute capability.

Parameters:

compute_version (str) – compute capability of a GPU (e.g. “6.0”)

tvm.contrib.nvcc.have_int8(compute_version)

Whether int8 support is provided by the given compute capability.

Parameters:

compute_version (str) – compute capability of a GPU (e.g. “6.1”)

tvm.contrib.nvcc.have_tensorcore(compute_version=None, target=None)

Whether TensorCore support is provided by the given compute capability.

Parameters:
  • compute_version (str, optional) – compute capability of a GPU (e.g. “7.0”).

  • target (tvm.target.Target, optional) – The compilation target, will be used to determine arch if compute_version isn’t specified.

tvm.contrib.nvcc.have_cudagraph()

Whether CUDA Graph support is provided.

tvm.contrib.pickle_memoize

Memoize result of function via pickle, used for cache testcases.

class tvm.contrib.pickle_memoize.Cache(key, save_at_exit)

A cache object for result cache.

Parameters:
  • key (str) – The file key to the function

  • save_at_exit (bool) – Whether to save the cache to file when the program exits

property cache

Return the cache, initializing on first use.

tvm.contrib.pickle_memoize.memoize(key, save_at_exit=False)

Memoize the result of a function and reuse it multiple times.

Parameters:
  • key (str) – The unique key to the file

  • save_at_exit (bool) – Whether to save the cache to file when the program exits

Returns:

fmemoize – The decorator function to perform memoization.

Return type:

function
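
The core memoization behavior can be sketched in-process (the real decorator additionally pickles the cache to a file named after key; that persistence is omitted here):

```python
# Minimal in-memory sketch of a memoize(key) decorator; `key` names the
# on-disk cache file in the real implementation and is unused here.
import functools

def memoize_sketch(key):
    def decorator(fn):
        cache = {}
        @functools.wraps(fn)
        def wrapper(*args):
            if args not in cache:
                cache[args] = fn(*args)   # compute once per argument tuple
            return cache[args]
        wrapper.cache = cache
        return wrapper
    return decorator
```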

tvm.contrib.popen_pool

Multiprocessing via Popen.

This module provides a multi-processing pool backed by Popen, with additional timeout support.

tvm.contrib.popen_pool.kill_child_processes(pid)

Kill all child processes recursively for a given pid.

Parameters:

pid (int) – The given process id.

class tvm.contrib.popen_pool.StatusKind(value)

Running and return value status.

class tvm.contrib.popen_pool.MapResult(status, value)

Result of map_with_error_catching.

Parameters:
  • status (StatusKind) – The status of the result.

  • value (Any) – The result value.

class tvm.contrib.popen_pool.PopenWorker(initializer=None, initargs=(), maximum_uses=None, stdout=None, stderr=None)

A subprocess worker via Popen.

PopenWorker provides a low-level API to interact with a separate process via Popen.

Parameters:
  • initializer (callable or None) – A callable initializer, or None

  • initargs (Tuple[object]) – A tuple of args for the initializer

  • maximum_uses (Optional[int]) – The maximum number of times a process can be used before being recycled, i.e. killed and restarted. If None, the process will be reused until an operation times out.

  • stdout (Union[None, int, IO[Any]]) – The standard output streams handler specified for the popen process.

  • stderr (Union[None, int, IO[Any]]) – The standard error streams handler specified for the popen process.

kill()

Kill the current running process and cleanup.

Note

The worker can start a new process when send is called again.

join(timeout=None)

Join the current process worker before it terminates.

Parameters:

timeout (Optional[number]) – Timeout value, block at most timeout seconds if it is a positive number.

is_alive()

Check if the process is alive

send(fn, args=(), kwargs=None, timeout=None)

Send a new function task fn(*args, **kwargs) to the subprocess.

Parameters:
  • fn (function) – The function to be invoked.

  • args (list) – Positional argument.

  • kwargs (dict) – Keyword arguments

  • timeout (float) – Timeout value when executing the function

Note

The caller must call recv before calling the next send in order to make sure the timeout and child process exit won’t affect the later requests.

recv()

Receive the result of the last send.

Returns:

result – The result of the last send.

Return type:

object
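
The send/recv round trip can be illustrated with a raw subprocess.Popen (plain stdlib code, not the TVM class): the parent writes a request to the child's stdin and reads the reply from its stdout.

```python
# Minimal parent/child round trip mirroring the send/recv pattern that
# PopenWorker builds on (the child here is a trivial doubling "worker").
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable, "-c", "print(int(input()) * 2)"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
reply, _ = child.communicate("21\n")  # "send" the task, "recv" the result
```

After this, reply.strip() is "42".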

class tvm.contrib.popen_pool.PopenPoolExecutor(max_workers=None, timeout=None, initializer=None, initargs=(), maximum_process_uses=None, stdout=None, stderr=None)

A parallel executor backed by Popen processes.

Parameters:
  • max_workers (int) – Maximum number of workers

  • timeout (float) – Timeout value for each function submit.

  • initializer (callable or None) – A callable initializer, or None

  • initargs (Tuple[object]) – A tuple of args for the initializer

  • maximum_process_uses (Optional[int]) – The maximum number of times each process can be used before being recycled, i.e. killed and restarted. If None, processes will be reused until an operation times out.

  • stdout (Union[None, int, IO[Any]]) – The standard output streams handler specified for the workers in the pool.

  • stderr (Union[None, int, IO[Any]]) – The standard error streams handler specified for the workers in the pool.

Note

If max_workers is None, the number returned by os.cpu_count() is used. This aligns with the behavior of multiprocessing.Pool.

shutdown(wait=True)

Shutdown the executor and clean up resources.

Parameters:

wait (bool) – If True, wait for pending work to complete.

Note

DEADLOCK WARNING: This method can deadlock when called during garbage collection due to exception reference cycles. When exceptions occur, Python creates reference cycles that delay garbage collection. The deadlock happens when: exception creates reference cycle → new pool creates worker → GC cleans old pool → old pool’s __del__ calls shutdown() which tries to acquire locks again.

submit(fn, *args, **kwargs) Future

Submit a new function job to the pool

Parameters:
  • fn (function) – The function to be invoked.

  • args (list) – Positional argument.

  • kwargs (dict) – Keyword arguments

Returns:

future – A future that can be used to access the result.

Return type:

concurrent.futures.Future

map_with_error_catching(fn, iterator)

Same as map, but catches exceptions and returns them instead of raising.

Parameters:
  • fn (function) – The function to be invoked.

  • iterator (Iterator) – Input iterator.

Returns:

out_iter – The result iterator.

Return type:

Iterator[MapResult]
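
PopenPoolExecutor follows the same submit/future calling convention as the standard library's executors. As a rough sketch of that pattern, using the stdlib ProcessPoolExecutor as a stand-in for the Popen-backed pool:

```python
from concurrent.futures import ProcessPoolExecutor

def run_jobs(values):
    # Submit jobs and collect results through futures, mirroring
    # PopenPoolExecutor.submit(fn, *args, **kwargs) followed by
    # future.result().
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(pow, v, 2) for v in values]
        return [f.result() for f in futures]
```

Unlike the stdlib executor, the TVM pool additionally recycles workers after maximum_process_uses calls, and map_with_error_catching wraps exceptions in MapResult values instead of raising.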

tvm.contrib.random

External function interface to random library.

tvm.contrib.random.randint(low, high, size, dtype='int32')

Return random integers from the “discrete uniform” distribution of the specified dtype in the half-open interval [low, high): low is inclusive, high is exclusive.

Parameters:
  • low (int) – Lowest (signed) integer to be drawn from the distribution

  • high (int) – One above the largest (signed) integer to be drawn from the distribution

  • size (tuple of ints) – Output shape

  • dtype (str, optional) – The dtype of the output, "int32" by default

Returns:

out – A tensor with specified size and dtype

Return type:

Tensor
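
The half-open interval semantics match NumPy's; a NumPy sketch of the same contract (illustration only, not the TVM extern op):

```python
import numpy as np

# Draw from the discrete uniform distribution over [low, high),
# mirroring the documented semantics of randint.
rng = np.random.default_rng(seed=0)
out = rng.integers(low=0, high=10, size=(2, 3), dtype=np.int32)

# Every sample is >= low and strictly < high.
assert out.min() >= 0 and out.max() < 10
assert out.shape == (2, 3) and out.dtype == np.int32
```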

tvm.contrib.random.uniform(low, high, size)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Parameters:
  • low (float) – Lower boundary of the output interval. All values generated will be greater than or equal to low.

  • high (float) – Upper boundary of the output interval. All values generated will be less than high.

  • size (tuple of ints) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn.

Returns:

out – A tensor with specified size and dtype.

Return type:

Tensor

tvm.contrib.random.normal(loc, scale, size)

Draw samples from a normal distribution.

Parameters:
  • loc (float) – Mean (location) of the distribution.

  • scale (float) – Standard deviation of the distribution.

  • size (tuple of ints) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn.

Returns:

out – A tensor with specified size and dtype

Return type:

Tensor

tvm.contrib.rocm

Utility for ROCm backend

tvm.contrib.rocm.find_lld(required=True)

Find ld.lld in system.

Parameters:

required (bool) – Whether the linker is required; a runtime error is raised if it is required but not found.

Returns:

valid_list – List of possible paths.

Return type:

list of str

Note

This function first searches for an ld.lld that matches the major LLVM version that TVM was built with.

tvm.contrib.rocm.rocm_link(in_file, out_file, lld=None)

Link relocatable ELF object to shared ELF object using lld

Parameters:
  • in_file (str) – Input file name (relocatable ELF object file)

  • out_file (str) – Output file name (shared ELF object file)

  • lld (str, optional) – The lld linker; if not specified, we will try to guess the matched lld version.

tvm.contrib.rocm.parse_compute_version(compute_version)

Parse a compute capability string into major and minor version numbers

Parameters:

compute_version (str) – compute capability of a GPU (e.g. “6.0”)

Returns:

  • major (int) – major version number

  • minor (int) – minor version number
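
A hypothetical re-implementation (not TVM's actual code) illustrating the major/minor split:

```python
def parse_compute_version(compute_version):
    # Split a "major.minor" compute capability string into its
    # numeric parts, e.g. "6.0" -> (6, 0).
    major, minor = compute_version.split(".")[:2]
    return int(major), int(minor)
```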

tvm.contrib.rocm.have_matrixcore(compute_version=None)

Check whether MatrixCore support is provided for the given compute capability

Parameters:

compute_version (str, optional) – compute capability of a GPU (e.g. “7.0”).

Returns:

have_matrixcore – True if MatrixCore support is provided, False otherwise

Return type:

bool

tvm.contrib.rocm.find_rocm_path()

Utility function to find ROCm path

Returns:

path – Path to ROCm root.

Return type:

str

tvm.contrib.spirv

Utility for Interacting with SPIRV Tools

tvm.contrib.spirv.optimize(spv_bin)

Optimize SPIRV using spirv-opt via CLI

Note that spirv-opt is still experimental.

Parameters:

spv_bin (bytearray) – The spirv file

Returns:

cobj_bin – The HSA Code Object

Return type:

bytearray

tvm.contrib.tar

Util to invoke tarball in the system.

tvm.contrib.tar.tar(output, files)

Create tarball containing all files in root.

Parameters:
  • output (str) – The target shared library.

  • files (list) – List of files to be bundled.

tvm.contrib.tar.untar(tar_file, directory)

Unpack all tar files into the directory

Parameters:
  • tar_file (str) – The source tar file.

  • directory (str) – The target directory

tvm.contrib.tar.normalize_file_list_by_unpacking_tars(temp, file_list)

Normalize the file list by unpacking tars in list.

When a filename is a tar, it is untarred into a unique directory in temp and the list of files inside the tar is returned. When a filename is a normal file, it is simply added to the list.

This is useful to untar objects in tar and then turn them into a library.

Parameters:
  • temp (tvm.contrib.utils.TempDirectory) – A temp dir to hold the untarred files.

  • file_list (List[str]) – List of files to be normalized.

Returns:

ret_list – An updated list of files

Return type:

List[str]
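
The behavior can be sketched with the stdlib tarfile module (a hypothetical illustration, not TVM's implementation):

```python
import os
import tarfile

def normalize_file_list(temp_dir, file_list):
    """Expand any .tar entries into unique per-tar directories under temp_dir."""
    ret_list = []
    for idx, path in enumerate(file_list):
        if path.endswith(".tar"):
            # Untar into a unique directory and collect the members.
            dest = os.path.join(temp_dir, "tar_%d" % idx)
            os.makedirs(dest, exist_ok=True)
            with tarfile.open(path) as tf:
                tf.extractall(dest)
                ret_list.extend(os.path.join(dest, m) for m in tf.getnames())
        else:
            # Normal files pass through unchanged.
            ret_list.append(path)
    return ret_list
```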

tvm.contrib.thrust

Utilities for thrust

tvm.contrib.tvmjs

Namespace to store utilities for building web runtime.

class tvm.contrib.tvmjs.TensorCacheShardingManager(cache_dir: str, prefix: str, shard_cap_nbytes: int, initial_shard_records: Mapping[str, Any] | None = None)

Internal helper to shard ndarrays.

append_or_update(data, name, shape, dtype, encode_format, allow_update: bool = False)

Commit a record to the manager.

Parameters:
  • data (bytes) – Raw bytes to be appended.

  • name (str) – The name of the parameter

  • shape (tuple) – The shape of the array

  • dtype (str) – The dtype information

  • encode_format – The encode format of the entry

  • allow_update (bool) – If True, update the record when it already exists; otherwise an existing record raises an error.

update_single_record(rec, data)

Update a single record in a shard file.

commit()

Commit a record

finish()

Finish building and return shard records.

property pending_nbytes

Return total bytes stored so far
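
The shard-cap bookkeeping can be illustrated with a small hypothetical sketch: records accumulate in the current shard until appending one would exceed shard_cap_nbytes, at which point a new shard is started.

```python
def shard_records(records, shard_cap_nbytes):
    # records: iterable of (name, data_bytes) pairs.
    # Returns the record names grouped into shards under the byte cap.
    shards, current, current_nbytes = [], [], 0
    for name, data in records:
        if current and current_nbytes + len(data) > shard_cap_nbytes:
            shards.append(current)
            current, current_nbytes = [], 0
        current.append(name)
        current_nbytes += len(data)
    if current:
        shards.append(current)
    return shards
```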

tvm.contrib.tvmjs.dump_tensor_cache(params: Mapping[str, ndarray | Tensor] | Iterator[tuple[str, ndarray | Tensor]], cache_dir: str, encode_format='f32-to-bf16', meta_data=None, shard_cap_mb=32, show_progress: bool = True, update_if_exists: bool = False)

Dump parameters to Tensor cache.

Parameters:
  • params (Union[Mapping[str, Union[np.ndarray, tvm.runtime.Tensor]], Iterator[Tuple[str, Union[np.ndarray, tvm.runtime.Tensor]]]]) – The parameter dictionary or generator

  • cache_dir (str) – The path to the cache

  • encode_format ({"f32-to-bf16", "raw"}) – Encoding format.

  • meta_data (json-compatible-struct or Callable[[], Any]) – Extra meta_data to be stored in the cache json file, or a callable that returns the metadata.

  • shard_cap_mb (int) – Maximum number of MB to be kept per shard

  • show_progress (bool) – Whether to show the dump progress.

  • update_if_exists (bool) – If the cache already exists, update the cache. When set to False, it will overwrite the existing files.

tvm.contrib.tvmjs.load_tensor_cache(cachepath: str, device: Device)

Load the tensor cache from the directory or json.

Parameters:
  • cachepath (str) – Path to the location or json file.

  • device (tvm.runtime.Device) – The device we would like to load the data from.

tvm.contrib.tvmjs.export_runtime(runtime_dir)

Export TVMJS runtime to the runtime_dir

Parameters:

runtime_dir (str) – The runtime directory

tvm.contrib.utils

Common system utilities

exception tvm.contrib.utils.DirectoryCreatedPastAtExit

Raised when a TempDirectory is created after the atexit hook runs.

class tvm.contrib.utils.TempDirectory(custom_path=None, keep_for_debug=None)

Helper object to manage temp directory during testing.

Automatically removes the directory when it goes out of scope.

classmethod set_keep_for_debug(set_to=True)

Keep temporary directories past program exit for debugging.

remove()

Remove the tmp dir

relpath(name)

Relative path in temp dir

Parameters:

name (str) – The name of the file.

Returns:

path – The concatenated path.

Return type:

str

listdir()

List contents in the dir.

Returns:

names – The content of directory

Return type:

list

tvm.contrib.utils.tempdir(custom_path=None, keep_for_debug=None)

Create a temp dir which deletes its contents on exit.

Parameters:
  • custom_path (str, optional) – Manually specify the exact temp dir path

  • keep_for_debug (bool) – Keep temp directory for debugging purposes

Returns:

temp – The temp directory object

Return type:

TempDirectory
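
Typical usage follows the stdlib tempfile pattern; a sketch using tempfile.TemporaryDirectory as a stand-in for tempdir()/relpath():

```python
import os
import tempfile

# tempdir().relpath("lib.so") is analogous to joining a filename
# against a stdlib temporary directory; in both cases the contents
# are removed when the directory goes out of scope.
with tempfile.TemporaryDirectory() as temp:
    path = os.path.join(temp, "lib.so")
    with open(path, "wb"):
        pass
    assert os.listdir(temp) == ["lib.so"]
```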

class tvm.contrib.utils.FileLock(path)

File lock object

Parameters:

path (str) – The path to the lock

release()

Release the lock

tvm.contrib.utils.filelock(path)

Create a file lock which locks on path

Parameters:

path (str) – The path to the lock

Returns:

lock

Return type:

File lock object
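
On POSIX systems a lock of this shape can be sketched with fcntl (a hypothetical illustration; TVM's FileLock is the supported API):

```python
import fcntl

class SimpleFileLock:
    """Exclusive advisory lock on a path (POSIX only)."""

    def __init__(self, path):
        self.lock_file = open(path, "w")
        # Blocks until the exclusive lock is acquired.
        fcntl.lockf(self.lock_file, fcntl.LOCK_EX)

    def release(self):
        fcntl.lockf(self.lock_file, fcntl.LOCK_UN)
        self.lock_file.close()
```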

tvm.contrib.utils.is_source_path(path)

Check if path is source code path.

Parameters:

path (str) – A possible path

Returns:

valid – Whether path is a possible source path

Return type:

bool

tvm.contrib.utils.which(exec_name)

Try to find full path of exec_name

Parameters:

exec_name (str) – The executable name

Returns:

path – The full path of executable if found, otherwise returns None

Return type:

str
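
The documented behavior matches the stdlib's shutil.which; a one-line stand-in:

```python
import shutil

def which(exec_name):
    # Full path of the executable if found on PATH, otherwise None.
    return shutil.which(exec_name)
```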

tvm.contrib.xcode

Utility to invoke Xcode compiler toolchain

tvm.contrib.xcode.xcrun(cmd)

Run xcrun and return the output.

Parameters:

cmd (list of str) – The command sequence.

Returns:

out – The output string.

Return type:

str

tvm.contrib.xcode.create_dylib(output, objects, arch, sdk='macosx', min_os_version=None)

Create dynamic library.

Parameters:
  • output (str) – The target shared library.

  • objects (list) – List of object files.

  • min_os_version (str, optional) – The minimum OS deployment version.

  • arch (str) – Target major architectures

  • sdk (str) – The sdk to be used.

tvm.contrib.xcode.compile_metal(code, path_target=None, sdk='macosx', min_os_version=None)

Compile Metal with CLI tool from env.

Parameters:
  • code (str) – The Metal code.

  • path_target (str, optional) – Output file.

  • sdk (str, optional) – The target platform SDK.

Returns:

metallib – The bytearray of the metallib

Return type:

bytearray

tvm.contrib.xcode.compile_coreml(model, model_name='main', out_dir='.')

Compile coreml model and return the compiled model path.

tvm.contrib.cutlass

BYOC support for CUTLASS.

tvm.contrib.cutlass.has_cutlass()

Returns true if the CUTLASS custom codegen is available

tvm.contrib.cutlass.finalize_modules(lib, lib_path='compile.so', tmp_dir='./tmp')

Returns lib with any C source, LLVM, and static library modules compiled and linked in, ready for use by the graph or AOT executors. This method is not specific to CUTLASS; however, it assumes nvcc will be used for final compilation and linking. It is provided here for convenience.

Parameters:
  • lib (runtime.Module) – The output from build.

  • lib_path (string) – The path to a shared library which will be generated as the result of the build process.

  • tmp_dir (string) – A temporary directory where intermediate compiled artifacts will be stored.

Returns:

updated_lib – The updated library with all compilation and linking completed.

Return type:

runtime.Module

tvm.contrib.hexagon

Hexagon APIs.

class tvm.contrib.hexagon.ContainerSession(base_image_name: str = '')

Docker container session

Parameters:

base_image_name (str) – Docker image name to use. Empty string means to use default “tlcpack/ci-hexagon” base image.

exec(cmd) str

Execute command inside docker container

get_env(key: str) str

Return env var value from docker container

copy_to(host_file_path: str) str

Upload file to docker container

copy_from(container_file_path: str, host_file_path: str)

Download file from docker container

close()

Close docker container session

tvm.contrib.hexagon.allocate_hexagon_array(dev, tensor_shape=None, dtype=None, data=None, axis_separators=None, mem_scope=None)

Allocate a hexagon array which could be a 2D array on physical memory defined by axis_separators

tvm.contrib.hexagon.create_aot_shared(so_name: str | Path, files, hexagon_arch: str, options=None)

Export Hexagon AOT module.

tvm.contrib.hexagon.create_shared(output, objects, options=None, cc=None, cwd=None, ccache_env=None)

Create shared library.

Parameters:
  • output (str) – The target shared library.

  • objects (List[str]) – List of object files.

  • options (List[str]) – The list of additional options string.

  • cc (Optional[str]) – The compiler command.

  • cwd (Optional[str]) – The current working directory.

  • ccache_env (Optional[Dict[str, str]]) – The environment variable for ccache. Set None to disable ccache by default.

tvm.contrib.hexagon.export_module(module, out_dir, binary_name='test_binary.so')

Export Hexagon shared object to a file.

tvm.contrib.hexagon.hexagon_clang_plus() str

Return path to the Hexagon clang++.

tvm.contrib.hexagon.link_shared(so_name, objs, extra_args=None)

Link Hexagon shared library using docker container with proper tooling.

Parameters:
  • so_name (str) – Name of the shared library file.

  • objs (list[str, tvm.tir.StringImm])

  • extra_args (dict (str->str) or Map<String,String>) –

    Additional arguments:

    ’hex_arch’ - Hexagon architecture, e.g. v68

Returns:

ret_val – This function returns 0 at the moment.

Return type:

int

tvm.contrib.hexagon.pack_imports(module: Module, is_system_lib: bool, c_symbol_prefix: str, workspace_dir: str)

Create an ELF object file that contains the binary data for the modules imported in module. This is a callback function for use as fpack_imports in export_library.

Parameters:
  • module (tvm.runtime.Module) – Module whose imported modules need to be serialized.

  • is_system_lib (bool) – Flag whether the exported module will be used as a system library.

  • c_symbol_prefix (str) – Prefix to prepend to the blob symbol.

  • workspace_dir (str) – Location for created files.

Returns:

file_name – The name of the created object file.

Return type:

str

tvm.contrib.hexagon.register_global_func(func_name: str | Callable[[...], Any], f: Callable[[...], Any] | None = None, override: bool = False) Any

Register global function.

Parameters:
  • func_name – The function name

  • f – The function to be registered.

  • override – Whether override existing entry.

Returns:

Register function if f is not specified.

Return type:

fregister

Examples

import tvm_ffi

# we can use decorator to register a function
@tvm_ffi.register_global_func("mytest.echo")
def echo(x):
    return x


# After registering, we can get the function by its name
f = tvm_ffi.get_global_func("mytest.echo")
assert f(1) == 1

# we can also directly register a function
tvm_ffi.register_global_func("mytest.add_one", lambda x: x + 1)
f = tvm_ffi.get_global_func("mytest.add_one")
assert f(1) == 2

See also

tvm_ffi.get_global_func(), tvm_ffi.remove_global_func()

tvm.contrib.hexagon.register_linker(f)

Register a function that will return the path to the Hexagon linker.

tvm.contrib.hexagon.toolchain_version(toolchain=None) list[int]

Return the version of the Hexagon toolchain.

Parameters:

toolchain (str, optional) – Path to the Hexagon toolchain. If not provided, the environment variable HEXAGON_TOOLCHAIN is used.

Returns:

version – List of numerical components of the version number. E.g. for version “8.5.06” it will be [8, 5, 6].

Return type:

List[int]
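
The numeric split described above ("8.5.06" → [8, 5, 6]) can be sketched as follows (a hypothetical helper, not the TVM function itself):

```python
import re

def version_components(version_str):
    # Extract the numeric components of a version string,
    # dropping leading zeros: "8.5.06" -> [8, 5, 6].
    return [int(part) for part in re.findall(r"\d+", version_str)]
```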