Stable C ABI#
Note
All code used in this guide lives under examples/stable_c_abi.
Prerequisite
Python: 3.9 or newer (for the tvm_ffi.config/tvm-ffi-config helpers)
Compiler: C11-capable toolchain (GCC/Clang/MSVC)
TVM-FFI installed via
pip install --force-reinstall --upgrade apache-tvm-ffi
This guide introduces TVM-FFI’s stable C ABI: a single, minimal, and stable ABI that can represent any cross-language call, designed with DSL and ML compiler codegen in mind.
TVM-FFI builds on the following key idea:
Key Idea: A Single C ABI for all Functions
Every function call can be represented by a single stable C ABI:
int tvm_ffi_c_abi(          // returns 0 on success; non-zero on failure
    void* handle,           // library handle
    const TVMFFIAny* args,  // inputs: args[0 ... N - 1]
    int N,                  // number of inputs
    TVMFFIAny* result       // output: *result
);
where TVMFFIAny is a tagged union of all supported types (integers, floats, Tensors, strings, etc.) that can be further extended to arbitrary user-defined types.
Built on top of this stable C ABI, TVM-FFI defines a common C ABI protocol for all functions, and further provides an extensible, performant, and ecosystem-friendly open solution for all.
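To make the idea concrete, the following is a minimal, self-contained sketch that mimics the signature with a mock value type. It does not use the TVM-FFI headers: MockAny, MOCK_NONE, MOCK_INT, mock_add and call_mock_add are hypothetical names invented for illustration, whereas real code would use TVMFFIAny and the type indices from tvm/ffi/c_api.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; the real ones come from tvm/ffi/c_api.h. */
enum { MOCK_NONE = 0, MOCK_INT = 1 };

/* Hypothetical stand-in for TVMFFIAny: a type tag plus a 64-bit payload. */
typedef struct {
  int32_t type_index;
  uint32_t reserved;
  union {
    int64_t v_int64;
    double v_float64;
    void* v_ptr;
  };
} MockAny;

/* Callee in the universal form: *result = args[0] + args[1]. */
static int mock_add(void* handle, const MockAny* args, int num_args, MockAny* result) {
  (void)handle; /* unused: this function captures no context */
  assert(result != NULL);
  if (num_args != 2) return -1;
  if (args[0].type_index != MOCK_INT || args[1].type_index != MOCK_INT) return -1;
  result->type_index = MOCK_INT;
  result->v_int64 = args[0].v_int64 + args[1].v_int64;
  return 0;
}

/* Caller: pack native values, invoke the callee, then unpack the result. */
static int call_mock_add(int64_t a, int64_t b, int64_t* out) {
  MockAny args[2] = {
      {.type_index = MOCK_INT, .v_int64 = a},
      {.type_index = MOCK_INT, .v_int64 = b},
  };
  MockAny result = {.type_index = MOCK_NONE};
  int ret_code = mock_add(NULL, args, 2, &result);
  if (ret_code != 0) return ret_code;
  if (result.type_index != MOCK_INT) return -1;
  *out = result.v_int64;
  return 0;
}
```

Because every function shares this shape, a caller needs only one calling routine regardless of how many arguments the callee takes or which types they have.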
The rest of this guide covers:
The stable C layout and calling convention of tvm_ffi_c_abi;
C examples from both the callee and caller sides of this ABI.
Stable C Layout#
TVM-FFI’s C ABI uses a stable layout for all the input and output arguments.
Layout of TVMFFIAny#
TVMFFIAny is a fixed-size (128-bit) tagged union that represents all supported types.
First 32 bits: type index indicating which value is stored (supports up to 2^32 types).
Next 32 bits: reserved (used for flags in rare cases, e.g. small-string optimization).
Last 64 bits: payload that is either a 64-bit integer, a 64-bit floating-point number, or a pointer to a heap-allocated object.
The following conventions apply when representing values in TVMFFIAny:
Primitive types: the last 64 bits directly store the value, for example:
Integers
Floating-point numbers
Heap-allocated objects: the last 64 bits store a pointer to the actual object, for example:
Arbitrary objects: the type index identifies the concrete type, and the last 64 bits store a pointer to a reference-counted object in TVM-FFI’s object format, for example:
tvm_ffi.Function, representing all functions, such as Python/C++ functions/lambdas, etc.;
tvm_ffi.Array and tvm_ffi.Map (list/dict containers of TVMFFIAny values);
Extending to up to 2^32 types is supported.
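The 32/32/64-bit split described above can be written down and checked at compile time. The sketch below is an assumption-laden mirror of that description, not the real definition: SketchAny, its tags, and the helper functions are hypothetical names, and the authoritative struct is TVMFFIAny in tvm/ffi/c_api.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags used only by this sketch. */
enum { SKETCH_NONE = 0, SKETCH_INT = 1, SKETCH_FLOAT = 2 };

/* Hypothetical mirror of the layout described above; not the real TVMFFIAny. */
typedef struct {
  int32_t type_index; /* first 32 bits: type tag */
  uint32_t reserved;  /* next 32 bits: reserved / flags */
  union {             /* last 64 bits: payload */
    int64_t v_int64;
    double v_float64;
    void* v_ptr;
  };
} SketchAny;

/* Compile-time checks of the 128-bit layout. */
static_assert(sizeof(SketchAny) == 16, "value is 128 bits wide");
static_assert(offsetof(SketchAny, reserved) == 4, "reserved field follows the tag");
static_assert(offsetof(SketchAny, v_int64) == 8, "payload occupies the last 64 bits");

/* Primitive values are stored directly in the payload. */
static SketchAny sketch_from_int(int64_t v) {
  SketchAny any = {.type_index = SKETCH_INT, .reserved = 0, .v_int64 = v};
  return any;
}

static SketchAny sketch_from_float(double v) {
  SketchAny any = {.type_index = SKETCH_FLOAT, .reserved = 0, .v_float64 = v};
  return any;
}

/* Read back the raw 64-bit payload, whatever type it holds. */
static uint64_t sketch_payload_bits(const SketchAny* value) {
  uint64_t bits;
  memcpy(&bits, (const char*)value + 8, sizeof(bits));
  return bits;
}
```

A code generator that emits loads and stores at fixed offsets (0 for the tag, 8 for the payload) relies on exactly these invariants, which is why a fixed-size layout matters for the ABI.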
Function Calling Convention#
Function calls in TVM-FFI share the same calling convention, tvm_ffi_c_abi, as described above.
handle: void*: optional library/closure handle passed to the callee. For exported symbols this is typically NULL; closures may use it to capture context.
args: TVMFFIAny*: pointer to a contiguous array of input arguments.
num_args: int: number of input arguments.
result: TVMFFIAny*: out-parameter that receives the function result (use kTVMFFINone for “no return value”).
Figure 2. Layout and calling convention of tvm_ffi_c_abi, where Any in this figure refers to TVMFFIAny.#
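As an illustration of how the handle parameter can carry captured context, here is a small self-contained sketch of a closure-style callee. All names (MockAny, MockCall, AddBiasContext, add_bias, invoke_with_int) are hypothetical and the value type is a mock; the sketch only shows the pattern, not how TVM-FFI itself stores closures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags and value type standing in for TVMFFIAny. */
enum { MOCK_NONE = 0, MOCK_INT = 1 };

typedef struct {
  int32_t type_index;
  uint32_t reserved;
  union {
    int64_t v_int64;
    void* v_ptr;
  };
} MockAny;

/* Function pointer type matching the universal signature. */
typedef int (*MockCall)(void* handle, const MockAny* args, int num_args, MockAny* result);

/* Hypothetical captured state for a closure that adds a fixed bias. */
typedef struct {
  int64_t bias;
} AddBiasContext;

/* Callee: reads its captured state through `handle`. */
static int add_bias(void* handle, const MockAny* args, int num_args, MockAny* result) {
  const AddBiasContext* ctx = (const AddBiasContext*)handle;
  if (ctx == NULL) return -1;
  if (num_args != 1 || args[0].type_index != MOCK_INT) return -1;
  result->type_index = MOCK_INT;
  result->v_int64 = args[0].v_int64 + ctx->bias;
  return 0;
}

/* Invoke any (call, handle) pair on a single integer argument. */
static int invoke_with_int(MockCall call, void* handle, int64_t x, int64_t* out) {
  assert(call != NULL);
  MockAny arg = {.type_index = MOCK_INT, .v_int64 = x};
  MockAny result = {.type_index = MOCK_NONE};
  int ret_code = call(handle, &arg, 1, &result);
  if (ret_code != 0) return ret_code;
  if (result.type_index != MOCK_INT) return -1;
  *out = result.v_int64;
  return 0;
}
```

The caller never inspects the handle; it only forwards it, so the same invocation path works for plain exported symbols (NULL handle) and for closures.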
Stability and Interoperability#
Stability. The pure C layout and the calling convention are stable across compiler versions and independent of host languages or frameworks.
Cross-language. TVM-FFI implements this calling convention in multiple languages (C, C++, Python, Rust, …), enabling code written in one language—or generated by a DSL targeting the ABI—to be called from another language.
Cross-framework. TVM-FFI uses standard data structures such as DLPack tensors to represent arrays, so compiled functions can be used from any array framework that implements the DLPack protocol (NumPy, PyTorch, TensorFlow, CuPy, JAX, and others).
Stable ABI in C Code#
Hint
You can build and run the examples either with raw compiler commands or with CMake. Both approaches are demonstrated below.
TVM-FFI’s C ABI is designed with DSL and ML compilers in mind. DSL codegen usually relies on MLIR, LLVM, or low-level C as the compilation target, where C++ features are unavailable and a stable C ABI is preferred for simplicity and stability.
This section shows how to write C code that follows the stable C ABI. Specifically, we provide two examples:
Callee side: A CPU add_one_cpu kernel in C that is equivalent to the C++ example.
Caller side: A loader and runner in C that invokes the kernel, a direct C translation of the C++ example.
The C code is minimal and dependency-free, so it can serve as a direct reference for DSL compilers that want to expose or invoke kernels through the ABI.
Callee: add_one_cpu Kernel#
Below is a minimal add_one_cpu kernel in C that follows the stable C ABI. It has three steps:
Step 1. Extract input x and output y as DLPack tensors;
Step 2. Implement the kernel y = x + 1 on CPU with a simple for-loop;
Step 3. Return 0 to signal success; result is left unset because the output is written into y.
// File: src/add_one_cpu.c
#include <tvm/ffi/c_api.h>

TVM_FFI_DLL int __tvm_ffi_add_one_cpu(void* handle, const TVMFFIAny* args, int32_t num_args,
                                      TVMFFIAny* result) {
  // Step 1. Extract inputs from `Any`
  // Step 1.1. Extract `x := args[0]`
  DLTensor* x;
  if (args[0].type_index == kTVMFFIDLTensorPtr) x = (DLTensor*)(args[0].v_ptr);
  else if (args[0].type_index == kTVMFFITensor) x = (DLTensor*)(args[0].v_c_str + sizeof(TVMFFIObject));
  else { TVMFFIErrorSetRaisedFromCStr("ValueError", "Expects a Tensor input"); return -1; }
  // Step 1.2. Extract `y := args[1]`
  DLTensor* y;
  if (args[1].type_index == kTVMFFIDLTensorPtr) y = (DLTensor*)(args[1].v_ptr);
  else if (args[1].type_index == kTVMFFITensor) y = (DLTensor*)(args[1].v_c_str + sizeof(TVMFFIObject));
  else { TVMFFIErrorSetRaisedFromCStr("ValueError", "Expects a Tensor output"); return -1; }
  // Step 2. Perform add one: y = x + 1
  // (assumes 1-D, contiguous float32 tensors of equal length)
  for (int64_t i = 0; i < x->shape[0]; ++i) {
    ((float*)y->data)[i] = ((float*)x->data)[i] + 1.0f;
  }
  // Step 3. Return error code 0 (success)
  //
  // Note that `result` is not set, as the output is passed in via `y` argument,
  // which is functionally similar to a Python function with signature:
  //
  //   def add_one(x: Tensor, y: Tensor) -> None: ...
  return 0;
}
Build it with either approach:
Raw command:
gcc -shared -O3 -std=c11 src/add_one_cpu.c \
    -fPIC -fvisibility=hidden \
    $(tvm-ffi-config --cflags) \
    $(tvm-ffi-config --ldflags) \
    $(tvm-ffi-config --libs) \
    -o $BUILD_DIR/add_one_cpu.so
CMake:
cmake . -B build -DEXAMPLE_NAME="kernel" -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --config RelWithDebInfo
Compiler codegen. This C code serves as a direct reference for DSL compilers. To emit a function that follows the stable C ABI, ensure the following:
Symbol naming: define the exported symbol name as __tvm_ffi_{func_name};
Type checking: check input types via TVMFFIAny::type_index, then marshal inputs from TVMFFIAny to the desired types;
Error handling: return 0 on success, or a non-zero code on failure. When an error occurs, set an error message via TVMFFIErrorSetRaisedFromCStr() or TVMFFIErrorSetRaisedFromCStrParts().
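The type-checking and error-handling rules can be sketched without the TVM-FFI runtime. In the self-contained example below everything is a mock: MockAny, MockError, mock_error_set_raised (standing in for TVMFFIErrorSetRaisedFromCStr), mock_error_take_raised and mock_halve_even are hypothetical names, and a single static slot stands in for whatever storage the real runtime uses for the raised error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags and value type standing in for TVMFFIAny. */
enum { MOCK_NONE = 0, MOCK_INT = 1 };

typedef struct {
  int32_t type_index;
  uint32_t reserved;
  union {
    int64_t v_int64;
    void* v_ptr;
  };
} MockAny;

/* Hypothetical error record: an error kind plus a message. */
typedef struct {
  const char* kind;
  const char* message;
} MockError;

/* Single static slot for the raised error (an assumption of this sketch). */
static MockError g_raised = {NULL, NULL};

/* Stand-in for TVMFFIErrorSetRaisedFromCStr(kind, message). */
static void mock_error_set_raised(const char* kind, const char* message) {
  g_raised.kind = kind;
  g_raised.message = message;
}

/* Take the raised error out of the slot, leaving the slot empty. */
static MockError mock_error_take_raised(void) {
  MockError err = g_raised;
  g_raised.kind = NULL;
  g_raised.message = NULL;
  return err;
}

/* Returns non-zero if `err` holds an error of the given kind. */
static int mock_error_is_kind(const MockError* err, const char* kind) {
  return err->kind != NULL && strcmp(err->kind, kind) == 0;
}

/* Callee: *result = args[0] / 2, but only for even integers. */
static int mock_halve_even(void* handle, const MockAny* args, int num_args, MockAny* result) {
  (void)handle;
  if (num_args != 1 || args[0].type_index != MOCK_INT) {
    mock_error_set_raised("TypeError", "Expects one integer argument");
    return -1;
  }
  if (args[0].v_int64 % 2 != 0) {
    mock_error_set_raised("ValueError", "Expects an even integer");
    return -1;
  }
  result->type_index = MOCK_INT;
  result->v_int64 = args[0].v_int64 / 2;
  return 0;
}
```

The return code only says that something failed; the kind and message travel through the separate error slot, which the caller reads after seeing a non-zero code.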
C vs. C++. Compared to the C++ example, there are a few key differences:
The explicit marshalling in Step 1 is only needed in C. In C++, templates hide these details.
The C++ macro TVM_FFI_DLL_EXPORT_TYPED_FUNC (used to export add_one_cpu) is not needed in C, because this example directly defines the exported C symbol __tvm_ffi_add_one_cpu.
Hint
In TVM-FFI’s C++ APIs, many invocables (functions, lambdas, functors) are automatically converted into the universal C ABI form by tvm::ffi::Function and tvm::ffi::TypedFunction.
Rule of thumb: if an invocable’s arguments and result can be converted to/from tvm::ffi::Any (the C++ equivalent of TVMFFIAny), it can be wrapped as a universal C ABI function.
Caller: Kernel Loader#
Next, a minimal C loader invokes the add_one_cpu kernel. It is functionally identical to the C++ example and performs:
Step 1. Load the shared library build/add_one_cpu.so that contains the kernel;
Step 2. Get function add_one_cpu from the library;
Step 3. Invoke the function with two DLTensor inputs x and y;
// File: src/load.c
#include <stdio.h>
#include <tvm/ffi/c_api.h>
#include <tvm/ffi/extra/c_env_api.h>
// Global functions are looked up during `Initialize` and deallocated during `Finalize`
// - global function: "ffi.Module.load_from_file.so"
static TVMFFIObjectHandle fn_load_module = NULL;
// - global function: "ffi.ModuleGetFunction"
static TVMFFIObjectHandle fn_get_function = NULL;
int Run(DLTensor* x, DLTensor* y) {
  int ret_code = 0;
  TVMFFIAny call_args[3] = {0};  // `{0}` rather than `{}`: empty initializers are not valid C11
  TVMFFIAny mod = {.type_index = kTVMFFINone, .v_obj = NULL};
  TVMFFIAny func = {.type_index = kTVMFFINone, .v_obj = NULL};
  TVMFFIAny none = {.type_index = kTVMFFINone};  // ignore the return value
  // Step 1. Load module
  // Equivalent to:
  //    mod = tvm::ffi::Module::LoadFromFile("build/add_one_cpu.so")
  call_args[0] = (TVMFFIAny){.type_index = kTVMFFIRawStr, .v_c_str = "build/add_one_cpu.so"};
  call_args[1] = (TVMFFIAny){.type_index = kTVMFFISmallStr, .v_int64 = 0};
  if ((ret_code = TVMFFIFunctionCall(fn_load_module, call_args, 2, &mod))) goto _RAII;
  // Step 2. Get function `add_one_cpu` from module
  // Equivalent to:
  //    func = mod->GetFunction("add_one_cpu", /*query_imports=*/false).value()
  call_args[0] = (TVMFFIAny){.type_index = mod.type_index, .v_obj = mod.v_obj};
  call_args[1] = (TVMFFIAny){.type_index = kTVMFFIRawStr, .v_c_str = "add_one_cpu"};
  call_args[2] = (TVMFFIAny){.type_index = kTVMFFIBool, .v_int64 = 0};
  if ((ret_code = TVMFFIFunctionCall(fn_get_function, call_args, 3, &func))) goto _RAII;
  // Step 3. Call function `add_one_cpu(x, y)`
  // Equivalent to:
  //    func(x, y)
  call_args[0] = (TVMFFIAny){.type_index = kTVMFFIDLTensorPtr, .v_ptr = x};
  call_args[1] = (TVMFFIAny){.type_index = kTVMFFIDLTensorPtr, .v_ptr = y};
  if ((ret_code = TVMFFIFunctionCall(func.v_ptr, call_args, 2, &none))) goto _RAII;
_RAII:
  // Release any reference-counted objects acquired above
  if (mod.type_index >= kTVMFFIObject) TVMFFIObjectDecRef(mod.v_obj);
  if (func.type_index >= kTVMFFIObject) TVMFFIObjectDecRef(func.v_obj);
  if (none.type_index >= kTVMFFIObject) TVMFFIObjectDecRef(none.v_obj);
  return ret_code;
}
Auxiliary Logic
static inline int Initialize() {
  int ret_code = 0;
  TVMFFIByteArray name_load_module = {.data = "ffi.Module.load_from_file.so", .size = 28};
  TVMFFIByteArray name_get_function = {.data = "ffi.ModuleGetFunction", .size = 21};
  if ((ret_code = TVMFFIFunctionGetGlobal(&name_load_module, &fn_load_module))) return ret_code;
  if ((ret_code = TVMFFIFunctionGetGlobal(&name_get_function, &fn_get_function))) return ret_code;
  return 0;
}
static inline void Finalize(int ret_code) {
  TVMFFIObjectHandle err = NULL;
  TVMFFIErrorCell* cell = NULL;
  if (fn_load_module) TVMFFIObjectDecRef(fn_load_module);
  if (fn_get_function) TVMFFIObjectDecRef(fn_get_function);
  if (ret_code) {
    TVMFFIErrorMoveFromRaised(&err);
    cell = (TVMFFIErrorCell*)((char*)(err) + sizeof(TVMFFIObject));
    printf("%.*s: %.*s\n", (int)(cell->kind.size), cell->kind.data, (int)(cell->message.size),
           cell->message.data);
  }
}
int main() {
  int ret_code = 0;
  float x_data[5] = {1.0, 2.0, 3.0, 4.0, 5.0};
  float y_data[5] = {0.0, 0.0, 0.0, 0.0, 0.0};
  int64_t shape[1] = {5};
  int64_t strides[1] = {1};
  // DLDataType::code takes a DLPack type code (kDLFloat), not a TVM-FFI type index
  DLDataType f32 = {.code = kDLFloat, .bits = 32, .lanes = 1};
  DLDevice cpu = {.device_type = kDLCPU, .device_id = 0};
  DLTensor x = {//
                .data = x_data, .device = cpu,      .ndim = 1,       .dtype = f32,
                .shape = shape, .strides = strides, .byte_offset = 0};
  DLTensor y = {//
                .data = y_data, .device = cpu,      .ndim = 1,       .dtype = f32,
                .shape = shape, .strides = strides, .byte_offset = 0};
  if ((ret_code = Initialize())) goto _RAII;
  if ((ret_code = Run(&x, &y))) goto _RAII;
  printf("[ ");
  for (int i = 0; i < 5; ++i) printf("%f ", y_data[i]);
  printf("]\n");
_RAII:
  Finalize(ret_code);
  return ret_code;
}
Build and run the loader with either approach:
Raw command:
gcc -fvisibility=hidden -O3 -std=c11 \
    src/load.c \
    $(tvm-ffi-config --cflags) \
    $(tvm-ffi-config --ldflags) \
    $(tvm-ffi-config --libs) \
    -Wl,-rpath,$(tvm-ffi-config --libdir) \
    -o build/load
build/load
CMake:
cmake . -B build -DEXAMPLE_NAME="load" -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --config RelWithDebInfo
build/load
To call a function via the stable C ABI in C, idiomatically:
Convert input arguments to the TVMFFIAny type;
Call the target function (e.g., add_one_cpu) via TVMFFIFunctionCall();
Optionally convert the output TVMFFIAny back to the desired type, if the function returns a value.
What’s Next#
ABI specification. See the complete ABI specification in ABI Overview.
Convenient compiler target. The stable C ABI is a simple, portable codegen target for DSL compilers. Emit C that follows this ABI to integrate with TVM-FFI and call the result from multiple languages and frameworks. See Compiler Integration.
Rich and extensible type system. TVM-FFI supports a rich set of types in the stable C ABI: primitive types (integers, floats), DLPack tensors, strings, built-in reference-counted objects (functions, arrays, maps), and user-defined reference-counted objects. See C++ Guide.