Installation

There are two pieces:

  • the TIRx compiler (tvm.tirx), which ships inside Apache TVM — this is all you need to write and compile kernels;

  • the optional kernel library (tirx-kernels), a set of ready-made GEMM and attention kernels built with TIRx.

Requirements

  • Python ≥ 3.10.

  • An NVIDIA GPU with a recent CUDA toolkit. The bundled kernels target Blackwell (sm_100a); the compiler itself targets GPUs and accelerators more broadly.

Install the TIRx compiler

Install the Apache TVM wheel (the TIRx compiler is the tvm.tirx module):

pip install apache-tvm==0.25.0

Verify:

python -c "import tvm, tvm.tirx; print(tvm.__version__)"

Install the kernel library (optional)

tirx-kernels provides prebuilt kernels (fp16_bf16_gemm, fp8_blockwise_gemm, nvfp4_gemm, flash_attention4). It has no PyPI wheel — install it from source:

git clone https://github.com/mlc-ai/tirx-kernels
cd tirx-kernels
pip install -e .

Its runtime dependencies are not pulled from PyPI and must be available separately (they are imported lazily, so import tirx_kernels and kernel discovery work without them — they are only needed to actually compile/run a kernel):

Dependency

Needed by

Notes

tvm.tirx

all kernels

the TIRx compiler (installed above, or put a source checkout’s python/ on PYTHONPATH)

torch

all kernels

a CUDA build matching your GPU

deep_gemm

fp8_blockwise_gemm

optional — quantization helpers and the reference baseline

flashinfer

nvfp4_gemm

optional — quantization and the baseline

Build from source

To develop TIRx or build the docs, build TVM from source and make it importable. See Install from Source for the full instructions; in short:

export TVM_HOME=/path/to/tvm
export TVM_LIBRARY_PATH=$TVM_HOME/build
export PYTHONPATH=$TVM_HOME/python:$PYTHONPATH
python -c "import tvm, tvm.tirx; print(tvm.__file__)"