Compiler Integration#

TVM FFI is a standard ABI designed as a standalone module that is independent from compiler or intermediate representation implementations. It specifies a runtime ABI that DSL compilers and languages can integrate with.

Kernel Language Compilers#

Kernel languages such as OpenAI Triton, TileLang, Mojo, cuteDSL, Helion, and Hidet typically use their own internal compilation pipelines to generate code. To expose the resulting functions under the FFI convention, there are several options:

  • For compilers that generate host functions via codegen (e.g., LLVM), generate the symbol __tvm_ffi_<func_name>, where <func_name> is the name of the exported function. Optionally, also generate __tvm_ffi__metadata_<func_name> for reflection.

  • For kernel generators that generate C++ host code, use TVM_FFI_DLL_EXPORT_TYPED_FUNC. This macro automatically exports function metadata when TVM_FFI_DLL_EXPORT_INCLUDE_METADATA is set to 1.

  • To export documentation strings, use TVM_FFI_DLL_EXPORT_TYPED_FUNC_DOC separately after exporting the function. This enhances tooling support (stub generation, IDE tooltips). Documentation export is also controlled by TVM_FFI_DLL_EXPORT_INCLUDE_METADATA.

The following code snippet shows C code for a function add_one_c that follows the ABI. It is reasonably straightforward for low-level code generators to replicate this C logic. You can run this code as part of the quick start example.

#include <tvm/ffi/c_api.h>
#include <tvm/ffi/extra/c_env_api.h>

// Helper to extract a DLTensor pointer from a TVMFFIAny (can be inlined into generated code)
static int ReadDLTensorPtr(const TVMFFIAny *value, DLTensor** out) {
  if (value->type_index == kTVMFFIDLTensorPtr) {
    *out = (DLTensor*)(value->v_ptr);
    return 0;
  }
  if (value->type_index != kTVMFFITensor) {
    // Use TVMFFIErrorSetRaisedFromCStr / TVMFFIErrorSetRaisedFromCStrParts to set an error which will
    // be propagated to the caller
    TVMFFIErrorSetRaisedFromCStr("ValueError", "Expects a Tensor input");
    return -1;
  }
  *out = (DLTensor*)((char*)(value->v_obj) + sizeof(TVMFFIObject));
  return 0;
}

// FFI function implementing add_one operation
int __tvm_ffi_add_one_c(
  void* handle, const TVMFFIAny* args, int32_t num_args, TVMFFIAny* result
) {
  DLTensor *x, *y;
  // Extract tensor arguments; on error, ReadDLTensorPtr sets the error via
  // TVMFFIErrorSetRaisedFromCStr and returns -1, which we propagate to the caller
  if (ReadDLTensorPtr(&args[0], &x) == -1) return -1;
  if (ReadDLTensorPtr(&args[1], &y) == -1) return -1;

  // Get the current stream for device synchronization (e.g., CUDA).
  // Not needed on CPU; kept here for demonstration purposes.
  void* stream = TVMFFIEnvGetStream(x->device.device_type, x->device.device_id);
  (void)stream;  // unused in this CPU-only example

  // perform the actual operation; shape entries are int64_t
  for (int64_t i = 0; i < x->shape[0]; ++i) {
    ((float*)(y->data))[i] = ((float*)(x->data))[i] + 1.0f;
  }
  // return 0 on success
  return 0;
}

Some key takeaways:

  • The exported symbol follows the __tvm_ffi_<func_name> naming convention and receives its arguments packed as an array of TVMFFIAny values.

  • Errors are raised by calling TVMFFIErrorSetRaisedFromCStr and returning -1; returning 0 signals success.

  • Common environment state, such as the current device stream, is obtained through environment functions like TVMFFIEnvGetStream.

You can also check out the ABI overview for a more complete guide.

Graph Compilers#

Machine learning graph compilers operate on computational graphs and can integrate with TVM FFI through:

  • Supporting the call_tvm_ffi primitive that calls into my_func that follows the ABI:

    Op.call_tvm_ffi("my_func", *args)
    
  • Using the module API to load compiled modules into the execution context and run them. Alternatively, looking up registered global functions and invoking them.

  • For ahead-of-time compilation (AOT) with minimum runtime, the AOT compiler can generate direct calls into FFI functions:

    • Use the TVMFFIFunctionCall API to call into custom tvm::ffi::Function objects

    • If the function exposes a C symbol following the FFI ABI, call it directly.

This approach provides a unified mechanism for calling into any library or DSL that exposes kernels following the FFI convention, enabling seamless interoperability across kernel DSLs and libraries.

Runtime and State Management for Compilers#

While TVM FFI provides a standard ABI for compiler-generated kernels, many compilers and domain-specific languages (DSLs) require their own runtime to manage state such as dynamic shapes, workspace memory, or other application-specific data. This runtime can be a separate shared library accessible to all kernels from a specific compiler.

Distributing the Runtime#

For users to run kernels produced by your compiler, they must have access to your runtime library. The preferred method is to package the runtime shared library (e.g., libmylang_runtime.so) as part of a Python or C++ package. Users must install and import this package before loading any kernels compiled by your system, which ensures the state is shared among different kernels.

Common vs. Custom State#

It’s important to distinguish compiler-specific state from the common state managed by TVM FFI. TVM FFI handles common state such as streams and memory allocators through environment functions (e.g., TVMFFIEnvGetStream), so kernels can access these without managing their own. For any state unique to your compiler, however, the global function registration approach is the most suitable method.

See also

For creating custom runtime modules that wrap platform-specific driver APIs (e.g., cuModuleLoad for PTX), see Custom Modules in Function and Module.