Stable C ABI

Note

All code used in this guide is under examples/stable_c_abi.

Prerequisite

Python: 3.9 or newer (for the tvm_ffi.config/tvm-ffi-config helpers)
Compiler: C11-capable toolchain (GCC/Clang/MSVC)

TVM-FFI installed via

pip install --force-reinstall --upgrade apache-tvm-ffi

This guide introduces TVM-FFI’s stable C ABI: a single, minimal ABI that represents cross-language calls and is designed for DSL and ML compiler codegen.

TVM-FFI is built around the following key idea:

Key Idea: A Single C ABI for all Functions

Every function call can be represented by a single stable C ABI:

int tvm_ffi_c_abi(          // returns 0 on success; non-zero on failure
  void*            handle,  // library handle
  const TVMFFIAny* args,    // inputs: args[0 ... N - 1]
  int              N,       // number of inputs
  TVMFFIAny*       result   // output: *result
);

where TVMFFIAny is a tagged union of all supported types, e.g. integers, floats, tensors, strings, and more, and can be extended to user-defined types.

Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for all functions, which keeps cross-language calls extensible, performant, and ecosystem-friendly.

The rest of this guide covers:

The stable C layout and calling convention of tvm_ffi_c_abi;
C examples from both the callee and caller side of this ABI.

Stable C Layout

TVM-FFI’s C ABI uses a stable layout for all input and output arguments.

Layout of `TVMFFIAny`

TVMFFIAny is a fixed-size (128-bit) tagged union that represents all supported types.

First 32 bits: type index indicating which value is stored (supports up to 2^32 types).
Next 32 bits: reserved (used for flags in rare cases, e.g., small-string optimization).
Last 64 bits: payload that is either a 64-bit integer, a 64-bit floating-point number, or a pointer to a heap-allocated object.

Figure 1. Layout spec for the `TVMFFIAny` tagged union.
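The three fields described above can be written down as a plain C struct. The sketch below is only a mirror for illustration: `DemoAny` and its member names are hypothetical, and real code should use the `TVMFFIAny` definition from `<tvm/ffi/c_api.h>`. The static assertions check that the mirror is 128 bits wide with the payload in the last 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of TVMFFIAny, for illustration only.
 * The real definition is provided by <tvm/ffi/c_api.h>. */
typedef struct {
  int32_t type_index; /* first 32 bits: which kind of value is stored */
  uint32_t reserved;  /* next 32 bits: reserved, occasionally used for flags */
  union {             /* last 64 bits: the payload */
    int64_t v_int64;
    double v_float64;
    void* v_ptr;
  } payload;
} DemoAny;

/* Compile-time checks that the mirror follows the 128-bit layout. */
_Static_assert(sizeof(DemoAny) == 16, "DemoAny must be exactly 128 bits");
_Static_assert(offsetof(DemoAny, reserved) == 4, "reserved follows the type index");
_Static_assert(offsetof(DemoAny, payload) == 8, "payload occupies the last 64 bits");
```

Because the size and offsets are fixed, a code generator can read or write these fields with plain loads and stores at known byte offsets.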

The following conventions apply when representing values in TVMFFIAny:

Primitive types: the last 64 bits directly store the value, for example:
- Integers
- Floating-point numbers
Heap-allocated objects: the last 64 bits store a pointer to the actual object, for example:
- Managed tensor objects that follow DLPack (i.e. DLTensor) layout.
Arbitrary objects: the type index identifies the concrete type, and the last 64 bits store a pointer to a reference-counted object in TVM-FFI’s object format, for example:
- tvm_ffi.Function, representing all functions, such as Python/C++ functions/lambdas, etc.;
- tvm_ffi.Array and tvm_ffi.Map (list/dict containers of TVMFFIAny values);
- User-defined types; the 32-bit type index leaves room for up to 2^32 types.
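As a rough illustration of these conventions, the sketch below stores primitives directly in the payload, stores objects as a pointer, and dispatches on the type index when reading a value back. The struct, the tag values (`DEMO_INT`, `DEMO_FLOAT`, `DEMO_OBJECT_PTR`), and the helper names are all hypothetical; real code would use `TVMFFIAny` and the type-index constants from `<tvm/ffi/c_api.h>`.

```c
#include <stdint.h>

/* Hypothetical tags and struct for illustration; they are not the real
 * TVM-FFI type indices or the real TVMFFIAny definition. */
enum { DEMO_NONE = 0, DEMO_INT = 1, DEMO_FLOAT = 2, DEMO_OBJECT_PTR = 3 };

typedef struct {
  int32_t type_index;
  uint32_t reserved;
  union { int64_t v_int64; double v_float64; void* v_ptr; } payload;
} DemoAny;

/* Primitive values are stored directly in the 64-bit payload. */
static DemoAny demo_from_int(int64_t v) {
  DemoAny a = {0};
  a.type_index = DEMO_INT;
  a.payload.v_int64 = v;
  return a;
}

static DemoAny demo_from_float(double v) {
  DemoAny a = {0};
  a.type_index = DEMO_FLOAT;
  a.payload.v_float64 = v;
  return a;
}

/* Objects are stored as a pointer; the object itself lives elsewhere. */
static DemoAny demo_from_ptr(void* p) {
  DemoAny a = {0};
  a.type_index = DEMO_OBJECT_PTR;
  a.payload.v_ptr = p;
  return a;
}

/* Read a numeric value by dispatching on the tag.
 * Returns 0 on success and -1 if the value is not numeric. */
static int demo_as_double(const DemoAny* a, double* out) {
  switch (a->type_index) {
    case DEMO_INT:   *out = (double)a->payload.v_int64; return 0;
    case DEMO_FLOAT: *out = a->payload.v_float64;       return 0;
    default:         return -1;
  }
}
```

The reader never inspects the payload before checking the tag, which is the same order the `add_one_cpu` kernel later in this guide follows.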

Function Calling Convention

Function calls in TVM-FFI share the same calling convention, tvm_ffi_c_abi, as described above.

handle: void*: optional library/closure handle passed to the callee. For exported symbols this is typically NULL; closures may use it to capture context.
args: TVMFFIAny*: pointer to a contiguous array of input arguments.
num_args: int: number of input arguments (written as N in the signature above).
result: TVMFFIAny*: out-parameter that receives the function result (use kTVMFFINone for “no return value”).

Figure 2. Layout and calling convention of tvm_ffi_c_abi, where `Any` in this figure refers to `TVMFFIAny`.
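To make the convention concrete, here is a minimal round trip under stated assumptions: `DemoAny`, `DEMO_INT`, `demo_add`, and `demo_call_add` are hypothetical stand-ins, and the sketch only reports failure through the return code, whereas a real callee would also record an error message through the TVM-FFI error API. The callee takes the four parameters in the order listed above and writes its answer through `result`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real TVM-FFI types. */
enum { DEMO_NONE = 0, DEMO_INT = 1 };

typedef struct {
  int32_t type_index;
  uint32_t reserved;
  union { int64_t v_int64; double v_float64; void* v_ptr; } payload;
} DemoAny;

/* Callee: same parameter order as tvm_ffi_c_abi. */
static int demo_add(void* handle, const DemoAny* args, int32_t num_args,
                    DemoAny* result) {
  (void)handle;                  /* no captured context in this sketch */
  if (num_args != 2) return -1;  /* wrong arity */
  if (args[0].type_index != DEMO_INT || args[1].type_index != DEMO_INT) {
    return -1;                   /* wrong argument type */
  }
  result->type_index = DEMO_INT;
  result->reserved = 0;
  result->payload.v_int64 = args[0].payload.v_int64 + args[1].payload.v_int64;
  return 0;                      /* success */
}

/* Caller: pack the inputs, call, then unpack the result. */
static int demo_call_add(int64_t a, int64_t b, int64_t* out) {
  DemoAny args[2] = {
      {.type_index = DEMO_INT, .payload = {.v_int64 = a}},
      {.type_index = DEMO_INT, .payload = {.v_int64 = b}},
  };
  DemoAny result = {0};
  int rc = demo_add(NULL, args, 2, &result);
  if (rc != 0) return rc;
  if (result.type_index != DEMO_INT) return -1;
  *out = result.payload.v_int64;
  return 0;
}
```

Every function, whatever its logical signature, is reached through the same four parameters, so a caller only needs one call path.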

Stability and Interoperability

Stability. The pure C layout and the calling convention are stable across compiler versions and independent of host languages or frameworks.

Cross-language. TVM-FFI implements this calling convention in multiple languages (C, C++, Python, Rust, …), enabling code written in one language—or generated by a DSL targeting the ABI—to be called from another language.

Cross-framework. TVM-FFI uses standard data structures such as DLPack tensors to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, and others).

Stable ABI in C Code

Hint

You can build and run the examples either with raw compiler commands or with CMake. Both approaches are demonstrated below.

TVM-FFI’s C ABI is designed with DSL and ML compilers in mind. DSL codegen often targets MLIR, LLVM, or low-level C, where C++ features are unavailable and stable C ABIs are preferred for simplicity and stability.

This section shows how to write C code that follows the stable C ABI using two examples:

Callee side: A CPU add_one_cpu kernel in C that is equivalent to the C++ example.
Caller side: A loader and runner in C that invokes the kernel, a direct C translation of the C++ example.

The C code is minimal and dependency-free, so it can serve as a direct reference for DSL compilers that want to expose or invoke kernels through the ABI.

Callee: `add_one_cpu` Kernel

Below is a minimal add_one_cpu kernel in C that follows the stable C ABI in three steps:

Step 1. Extract input x and output y as DLPack tensors;
Step 2. Implement the kernel y = x + 1 on CPU with a simple for-loop;
Step 3. Set the output result in result.

// File: src/add_one_cpu.c
#include <tvm/ffi/c_api.h>

TVM_FFI_DLL_EXPORT int __tvm_ffi_add_one_cpu(void* handle, const TVMFFIAny* args,
                                             int32_t num_args, TVMFFIAny* result) {
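  // Step 1. Extract inputs from `Any`
  // A `kTVMFFIDLTensorPtr` argument points directly at a DLTensor; for a `kTVMFFITensor`
  // object, the DLTensor is read `sizeof(TVMFFIObject)` bytes past the object pointer.
  // Step 1.1. Extract `x := args[0]`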
  DLTensor* x;
  if (args[0].type_index == kTVMFFIDLTensorPtr) x = (DLTensor*)(args[0].v_ptr);
  else if (args[0].type_index == kTVMFFITensor) x = (DLTensor*)(args[0].v_c_str + sizeof(TVMFFIObject));
  else { TVMFFIErrorSetRaisedFromCStr("ValueError", "Expects a Tensor input"); return -1; }
  // Step 1.2. Extract `y := args[1]`
  DLTensor* y;
  if (args[1].type_index == kTVMFFIDLTensorPtr) y = (DLTensor*)(args[1].v_ptr);
  else if (args[1].type_index == kTVMFFITensor) y = (DLTensor*)(args[1].v_c_str + sizeof(TVMFFIObject));
  else { TVMFFIErrorSetRaisedFromCStr("ValueError", "Expects a Tensor output"); return -1; }

  // Step 2. Perform add one: y = x + 1
  // (the loop assumes 1-D, contiguous float32 tensors of equal length)
  for (int64_t i = 0; i < x->shape[0]; ++i) {
    ((float*)y->data)[i] = ((float*)x->data)[i] + 1.0f;
  }

  // Step 3. Return error code 0 (success)
  //
  // Note that `result` is not set, as the output is passed in via `y` argument,
  // which is functionally similar to a Python function with signature:
  //
  //   def add_one(x: Tensor, y: Tensor) -> None: ...
  return 0;
}

Build it with either approach.

Raw command:

gcc -shared -O3 -std=c11 src/add_one_cpu.c  \
    -fPIC -fvisibility=hidden               \
    $(tvm-ffi-config --cflags)              \
    $(tvm-ffi-config --ldflags)             \
    $(tvm-ffi-config --libs)                \
    -o build/add_one_cpu.so

CMake:

cmake . -B build -DEXAMPLE_NAME="kernel" -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --config RelWithDebInfo

Compiler codegen. This C code serves as a direct reference for DSL compilers. To emit a function that follows the stable C ABI, ensure the following:

Symbol naming: define the exported symbol name as __tvm_ffi_{func_name};
Type checking: check input types via TVMFFIAny::type_index, then marshal inputs from TVMFFIAny to the desired types;
Error handling: return 0 on success, or a non-zero code on failure. When an error occurs, set an error message via TVMFFIErrorSetRaisedFromCStr() or TVMFFIErrorSetRaisedFromCStrParts().
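The symbol-naming rule is mechanical enough to sketch. Below, a hypothetical helper `demo_exported_symbol` (not part of TVM-FFI) shows how a code generator might derive the exported name from a function name; the buffer-size handling is an assumption made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical codegen helper, not a TVM-FFI API.
 * Writes "__tvm_ffi_<func_name>" into buf.
 * Returns 0 on success, or -1 if buf is too small to hold the name. */
static int demo_exported_symbol(const char* func_name, char* buf,
                                size_t buf_size) {
  int n = snprintf(buf, buf_size, "__tvm_ffi_%s", func_name);
  return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```

For the kernel in this guide, calling the helper with `"add_one_cpu"` produces `__tvm_ffi_add_one_cpu`, which matches the symbol defined in the C source above.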

C vs. C++. Compared to the C++ example, there are a few key differences:

The explicit marshalling in Step 1 is only needed in C. In C++, templates hide these details.
The C++ macro TVM_FFI_DLL_EXPORT_TYPED_FUNC (used to export add_one_cpu) is not needed in C, since this example directly defines the exported C symbol __tvm_ffi_add_one_cpu.

Hint

In TVM-FFI’s C++ APIs, many invocables (functions, lambdas, functors) are automatically converted into the universal C ABI form by tvm::ffi::Function and tvm::ffi::TypedFunction.

Rule of thumb: if an invocable’s arguments and result can be converted to/from tvm::ffi::Any (the C++ equivalent of TVMFFIAny), it can be wrapped as a universal C ABI function.

Caller: Kernel Loader

Next, a minimal C loader invokes the add_one_cpu kernel. It mirrors the C++ example and performs:

Step 1. Load the shared library build/add_one_cpu.so that contains the kernel;
Step 2. Get function add_one_cpu from the library;
Step 3. Invoke the function with two DLTensor inputs x and y.

// File: src/load.c
#include <stdio.h>
#include <tvm/ffi/c_api.h>
#include <tvm/ffi/extra/c_env_api.h>

// Global functions are looked up during `Initialize` and deallocated during `Finalize`
// - global function: "ffi.Module.load_from_file.so"
static TVMFFIObjectHandle fn_load_module = NULL;
// - global function: "ffi.ModuleGetFunction"
static TVMFFIObjectHandle fn_get_function = NULL;

int Run(DLTensor* x, DLTensor* y) {
  int ret_code = 0;
  TVMFFIAny call_args[3] = {0};
  TVMFFIAny mod = {.type_index = kTVMFFINone, .v_obj = NULL};
  TVMFFIAny func = {.type_index = kTVMFFINone, .v_obj = NULL};
  TVMFFIAny none = {.type_index = kTVMFFINone};  // ignore the return value

  // Step 1. Load module
  // Equivalent to:
  //    mod = tvm::ffi::Module::LoadFromFile("build/add_one_cpu.so")
  call_args[0] = (TVMFFIAny){.type_index = kTVMFFIRawStr, .v_c_str = "build/add_one_cpu.so"};
  call_args[1] = (TVMFFIAny){.type_index = kTVMFFISmallStr, .v_int64 = 0};
  if ((ret_code = TVMFFIFunctionCall(fn_load_module, call_args, 2, &mod))) goto _RAII;

  // Step 2. Get function `add_one_cpu` from module
  // Equivalent to:
  //    func = mod->GetFunction("add_one_cpu", /*query_imports=*/false).value()
  call_args[0] = (TVMFFIAny){.type_index = mod.type_index, .v_obj = mod.v_obj};
  call_args[1] = (TVMFFIAny){.type_index = kTVMFFIRawStr, .v_c_str = "add_one_cpu"};
  call_args[2] = (TVMFFIAny){.type_index = kTVMFFIBool, .v_int64 = 0};
  if ((ret_code = TVMFFIFunctionCall(fn_get_function, call_args, 3, &func))) goto _RAII;

  // Step 3. Call function `add_one_cpu(x, y)`
  // Equivalent to:
  //    func(x, y)
  call_args[0] = (TVMFFIAny){.type_index = kTVMFFIDLTensorPtr, .v_ptr = x};
  call_args[1] = (TVMFFIAny){.type_index = kTVMFFIDLTensorPtr, .v_ptr = y};
  if ((ret_code = TVMFFIFunctionCall(func.v_ptr, call_args, 2, &none))) goto _RAII;

_RAII:
  if (mod.type_index >= kTVMFFIObject) TVMFFIObjectDecRef(mod.v_obj);
  if (func.type_index >= kTVMFFIObject) TVMFFIObjectDecRef(func.v_obj);
  if (none.type_index >= kTVMFFIObject) TVMFFIObjectDecRef(none.v_obj);
  return ret_code;
}

Build and run the loader with either approach.

Raw command:

gcc -fvisibility=hidden -O3 -std=c11        \
    src/load.c                              \
    $(tvm-ffi-config --cflags)              \
    $(tvm-ffi-config --ldflags)             \
    $(tvm-ffi-config --libs)                \
    -Wl,-rpath,$(tvm-ffi-config --libdir)   \
    -o build/load
build/load

CMake:

cmake . -B build -DEXAMPLE_NAME="load" -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --config RelWithDebInfo
build/load

In C, the idiomatic steps to call a function via the stable C ABI are:

Convert input arguments to the TVMFFIAny type;
Call the target function (e.g., add_one_cpu) via TVMFFIFunctionCall();
Optionally convert the output TVMFFIAny back to the desired type, if the function returns a value.

What’s Next

ABI specification. See the full ABI specification in ABI Overview.

Convenient compiler target. The stable C ABI is a simple, portable codegen target for DSL compilers. Emit C that follows this ABI to integrate with TVM-FFI and call the result from multiple languages and frameworks. See ABI Overview.

Rich and extensible type system. TVM-FFI supports a rich set of types in the stable C ABI: primitive types (integers, floats), DLPack tensors, strings, built-in reference-counted objects (functions, arrays, maps), and user-defined reference-counted objects. See C++ Guide.