Class CubinKernel#
Defined in File cubin_launcher.h
Class Documentation#
class CubinKernel#
CUDA kernel handle for launching kernels.
This class represents a loaded CUDA kernel function and provides methods to launch it with specified grid/block dimensions, arguments, and stream configuration. Obtained from CubinModule by kernel name.
See also
CubinModule for loading CUBIN and getting kernels
See also
dim3 for grid/block dimension specification
- Usage Pattern
// Get kernel from module
tvm::ffi::CubinKernel kernel = module["kernel_name"];

// Prepare arguments (must be pointers to actual values)
void* data_ptr = tensor.data_ptr();
int64_t size = tensor.size(0);
void* args[] = {&data_ptr, &size};

// Configure launch dimensions
tvm::ffi::dim3 grid(32);    // 32 blocks
tvm::ffi::dim3 block(256);  // 256 threads per block

// Launch on stream
cudaStream_t stream = ...;
cudaError_t result = kernel.Launch(args, grid, block, stream);
TVM_FFI_CHECK_CUDA_ERROR(result);
Note
This class is movable but not copyable.
Public Functions
inline CubinKernel(cudaLibrary_t library, const char *name)#
Construct a CubinKernel from a library and kernel name.
- Parameters:
library – The cudaLibrary_t handle.
name – Name of the kernel function.
~CubinKernel() = default#
Destructor (the kernel handle does not need explicit cleanup).
inline cudaError_t Launch(void **args, dim3 grid, dim3 block, cudaStream_t stream, uint32_t dyn_smem_bytes = 0)#
Launch the kernel with specified parameters.
This function launches the kernel on the current CUDA context/device using the CUDA Runtime API. The kernel executes asynchronously on the specified stream.
- Argument Preparation
The args array must contain pointers to the actual argument values, not the values themselves. For example:

void* data_ptr = tensor.data_ptr();
int64_t size = 100;
void* args[] = {&data_ptr, &size};  // Note: addresses of the variables
- Launch Configuration
Grid and block dimensions determine the kernel’s parallelism:
Grid: Number of thread blocks (can be 1D, 2D, or 3D)
Block: Number of threads per block (can be 1D, 2D, or 3D)
Total threads = grid.x * grid.y * grid.z * block.x * block.y * block.z
- Error Checking
Always check the returned cudaError_t:
cudaError_t result = kernel.Launch(args, grid, block, stream);
TVM_FFI_CHECK_CUDA_ERROR(result);
Note
The kernel executes asynchronously. Use cudaStreamSynchronize() or cudaDeviceSynchronize() to wait for completion if needed.
- Parameters:
args – Array of pointers to kernel arguments (must point to actual values).
grid – Grid dimensions (number of blocks in x, y, z).
block – Block dimensions (threads per block in x, y, z).
stream – CUDA stream to launch the kernel on (use 0 for default stream).
dyn_smem_bytes – Dynamic shared memory size in bytes (default: 0).
- Returns:
cudaError_t error code from cudaLaunchKernel (cudaSuccess on success).
inline cudaKernel_t GetHandle() const#
Get the underlying cudaKernel_t handle.
CubinKernel(const CubinKernel&) = delete#
Deleted copy constructor; CubinKernel is move-only.
inline CubinKernel(CubinKernel &&other) noexcept#
Move constructor for CubinKernel.
Transfers ownership of the CUDA kernel handle from another CubinKernel instance.
- Parameters:
other – The source CubinKernel to move from (will be left in an empty state).