Class CubinKernel#

Class Documentation#

class CubinKernel#

CUDA kernel handle for launching kernels.

This class represents a loaded CUDA kernel function and provides methods to launch it with specified grid/block dimensions, arguments, and stream configuration. Instances are obtained from a CubinModule by kernel name.

See also

CubinModule for loading CUBIN and getting kernels

See also

dim3 for grid/block dimension specification

Usage Pattern

// Get kernel from module
tvm::ffi::CubinKernel kernel = module["kernel_name"];

// Prepare arguments (must be pointers to actual values)
void* data_ptr = tensor.data_ptr();
int64_t size = tensor.size(0);
void* args[] = {&data_ptr, &size};

// Configure launch dimensions
tvm::ffi::dim3 grid(32);    // 32 blocks
tvm::ffi::dim3 block(256);  // 256 threads per block

// Launch on stream
cudaStream_t stream = ...;
cudaError_t result = kernel.Launch(args, grid, block, stream);
TVM_FFI_CHECK_CUDA_ERROR(result);

Note

This class is movable but not copyable.

Public Functions

inline CubinKernel(cudaLibrary_t library, const char *name)#

Construct a CubinKernel from a library and kernel name.

Parameters:
  • library – The cudaLibrary_t handle.

  • name – Name of the kernel function.

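A minimal sketch of constructing a kernel handle directly, assuming the CUDA 12+ runtime library-management API (cudaLibraryLoadFromFile); the file and kernel names are placeholders, and in typical use the handle comes from CubinModule instead:

#include <cuda_runtime.h>

// Load a CUBIN with the CUDA runtime library-management API (CUDA 12+).
// "my_kernels.cubin" and "vector_add" are placeholder names for this sketch.
cudaLibrary_t library = nullptr;
cudaError_t err = cudaLibraryLoadFromFile(&library, "my_kernels.cubin",
                                          nullptr, nullptr, 0,   // no JIT options
                                          nullptr, nullptr, 0);  // no library options
TVM_FFI_CHECK_CUDA_ERROR(err);

// Wrap one kernel from the library in a CubinKernel handle.
tvm::ffi::CubinKernel kernel(library, "vector_add");
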
~CubinKernel() = default#

Destructor (the kernel handle does not require explicit cleanup).

inline cudaError_t Launch(void **args, dim3 grid, dim3 block, cudaStream_t stream, uint32_t dyn_smem_bytes = 0)#

Launch the kernel with specified parameters.

This function launches the kernel on the current CUDA context/device using the CUDA Runtime API. The kernel executes asynchronously on the specified stream.

Argument Preparation

The args array must contain pointers to the actual argument values, not the values themselves. For example:

void* data_ptr = tensor.data_ptr();
int64_t size = 100;
void* args[] = {&data_ptr, &size};  // Note: addresses of the variables

Launch Configuration

Grid and block dimensions determine the kernel’s parallelism:

  • Grid: Number of thread blocks (can be 1D, 2D, or 3D)

  • Block: Number of threads per block (can be 1D, 2D, or 3D)

  • Total threads = grid.x * grid.y * grid.z * block.x * block.y * block.z
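
For example, a 2D launch configuration for a hypothetical H x W problem could be derived as follows (TILE, H, and W are illustrative values, and tvm::ffi::dim3 is assumed to accept x/y/z components like CUDA's dim3):

// Choose a 2D launch configuration for an H x W problem (illustrative sizes).
constexpr int TILE = 16;
const int H = 1024, W = 2048;

tvm::ffi::dim3 block(TILE, TILE);              // 16 * 16 = 256 threads per block
tvm::ffi::dim3 grid((W + TILE - 1) / TILE,     // ceil(W / TILE) blocks in x
                    (H + TILE - 1) / TILE);    // ceil(H / TILE) blocks in y
// Total threads = grid.x * grid.y * block.x * block.y = 128 * 64 * 16 * 16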

Error Checking

Always check the returned cudaError_t:

cudaError_t result = kernel.Launch(args, grid, block, stream);
TVM_FFI_CHECK_CUDA_ERROR(result);

Note

The kernel executes asynchronously. Use cudaStreamSynchronize() or cudaDeviceSynchronize() to wait for completion if needed.

Parameters:
  • args – Array of pointers to kernel arguments (must point to actual values).

  • grid – Grid dimensions (number of blocks in x, y, z).

  • block – Block dimensions (threads per block in x, y, z).

  • stream – CUDA stream to launch the kernel on (use 0 for default stream).

  • dyn_smem_bytes – Dynamic shared memory size in bytes (default: 0).

Returns:

cudaError_t error code from cudaLaunchKernel (cudaSuccess on success).
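
A short sketch of a launch that also passes dyn_smem_bytes and then waits for completion; the 4 KiB size is illustrative and assumes the kernel declares an extern __shared__ buffer:

// Launch with 4 KiB of dynamic shared memory (illustrative size); the kernel
// is assumed to declare `extern __shared__ float smem[];`.
uint32_t dyn_smem_bytes = 4 * 1024;
cudaError_t result = kernel.Launch(args, grid, block, stream, dyn_smem_bytes);
TVM_FFI_CHECK_CUDA_ERROR(result);

// Wait for completion on this stream if the results are needed immediately.
cudaError_t sync_err = cudaStreamSynchronize(stream);
TVM_FFI_CHECK_CUDA_ERROR(sync_err);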

inline cudaKernel_t GetHandle() const#

Get the underlying cudaKernel_t handle.
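
A hedged interop sketch: since Launch() returns the error code from cudaLaunchKernel, the raw handle can presumably be passed to that runtime call directly; casting the cudaKernel_t to const void* is an assumption here, reflecting how the CUDA 12 library-management API exposes kernel handles to the launch APIs:

// Use the raw handle with the CUDA runtime directly (assumes cudaLaunchKernel
// accepts a cudaKernel_t cast to const void*, per the CUDA 12 library API).
cudaKernel_t raw = kernel.GetHandle();
::dim3 grid(32), block(256);  // plain CUDA dim3 here, not tvm::ffi::dim3
cudaError_t err = cudaLaunchKernel(reinterpret_cast<const void *>(raw),
                                   grid, block, args,
                                   /*sharedMem=*/0, stream);
TVM_FFI_CHECK_CUDA_ERROR(err);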

CubinKernel(const CubinKernel&) = delete#

Copy constructor (deleted; CubinKernel is not copyable).

inline CubinKernel(CubinKernel &&other) noexcept#

Move constructor for CubinKernel.

Transfers ownership of the CUDA kernel handle from another CubinKernel instance.

Parameters:

other – The source CubinKernel to move from (will be left in an empty state).
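
A brief sketch of ownership transfer, since the class is movable but not copyable (the container and variable names are illustrative):

#include <utility>
#include <vector>

// Transfer ownership with std::move, e.g. to store kernels in a container.
tvm::ffi::CubinKernel kernel = module["kernel_name"];

std::vector<tvm::ffi::CubinKernel> kernels;
kernels.push_back(std::move(kernel));  // 'kernel' is left in an empty state

// tvm::ffi::CubinKernel copy = kernels.front();  // would not compile: copying is deleted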