tvm
|
TIR builtin intrinsics. More...
Go to the source code of this file.
Namespaces | |
tvm | |
runtime implementation for LibTorch/TorchScript. | |
tvm::tir | |
tvm::tir::builtin | |
Collection of builtin intrinsics as ops. | |
Enumerations | |
enum | tvm::tir::builtin::TVMStructFieldKind : int { tvm::tir::builtin::kArrAddr , tvm::tir::builtin::kArrData , tvm::tir::builtin::kArrShape , tvm::tir::builtin::kArrStrides , tvm::tir::builtin::kArrNDim , tvm::tir::builtin::kArrTypeCode , tvm::tir::builtin::kArrTypeBits , tvm::tir::builtin::kArrTypeLanes , tvm::tir::builtin::kArrByteOffset , tvm::tir::builtin::kArrDeviceId , tvm::tir::builtin::kArrDeviceType , tvm::tir::builtin::kArrKindBound_ , tvm::tir::builtin::kTVMValueContent , tvm::tir::builtin::kTVMValueKindBound_ } |
The kind of structure field info used in intrinsic. More... | |
Functions | |
const Op & | tvm::tir::builtin::ret () |
Return value. More... | |
const Op & | tvm::tir::builtin::reinterpret () |
Reinterpret the value using the target type. More... | |
const Op & | tvm::tir::builtin::likely () |
Marks a condition is likely going to happen. More... | |
const Op & | tvm::tir::builtin::bitwise_and () |
Bitwise and operator. More... | |
const Op & | tvm::tir::builtin::bitwise_or () |
Bitwise or operator. More... | |
const Op & | tvm::tir::builtin::bitwise_xor () |
Bitwise xor operator. More... | |
const Op & | tvm::tir::builtin::bitwise_not () |
Bitwise not operator. More... | |
const Op & | tvm::tir::builtin::shift_left () |
Left shift. More... | |
const Op & | tvm::tir::builtin::shift_right () |
Right shift. More... | |
const Op & | tvm::tir::builtin::large_uint_imm () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::q_multiply_shift () |
Execute a multiplication between two Q-numbers x and y followed by a right shift s The default rounding rule is to the nearest value, rounding half up (i.e., round(x.1) = x and round (x.5) = x+1) More... | |
const Op & | tvm::tir::builtin::address_of () |
Returns the address of an element in the buffer (see pseudocode below). More... | |
const Op & | tvm::tir::builtin::if_then_else () |
Same as select, used for unsafe memory access. More... | |
const Op & | tvm::tir::builtin::isnullptr () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::isnan () |
Check if value is nan. More... | |
const Op & | tvm::tir::builtin::popcount () |
Popcount. More... | |
const Op & | tvm::tir::builtin::fma () |
Fused multiply add. More... | |
const Op & | tvm::tir::builtin::call_extern () |
Call an extern C function with given name and signature from the types of args in the runtime environment. More... | |
const Op & | tvm::tir::builtin::call_pure_extern () |
Call an pure extern C function with given name and signature from the types of args in the runtime environment. More... | |
const Op & | tvm::tir::builtin::call_llvm_intrin () |
Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More... | |
const Op & | tvm::tir::builtin::call_llvm_pure_intrin () |
Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More... | |
const Op & | tvm::tir::builtin::call_spirv_pure_glsl450 () |
Call an SPIRV pure GLSL450 intrinsic. More... | |
const Op & | tvm::tir::builtin::prefetch () |
Prefetch a cacheline. More... | |
const Op & | tvm::tir::builtin::tvm_access_ptr () |
Get head access address with memory access pattern info. More... | |
const Op & | tvm::tir::builtin::tvm_static_handle () |
Create a function local static handle that iniitalizes to nullptr. can be used to cache function local static resources. More... | |
const Op & | tvm::tir::builtin::tvm_context_id () |
Return a unique context id, used for hint of workspace separation. Different context id ganrantees not having overlapping workspace. More... | |
const Op & | tvm::tir::builtin::tvm_tuple () |
tvm_tuple is not an actual function and cannot codegen. It is used to represent tuple structure in value field of AttrStmt, for the sake of giving hint to optimization. More... | |
const Op & | tvm::tir::builtin::tvm_struct_get () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_struct_set () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::lookup_param () |
See pseudo code Type lookup_param(String param_name) { return __tvm_param__param_name; }. More... | |
const Op & | tvm::tir::builtin::tvm_throw_last_error () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_stack_alloca () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_stack_make_shape () |
Allocate a shape tuple on stack, return the handle. More... | |
const Op & | tvm::tir::builtin::tvm_stack_make_array () |
Allocate a NDArray(DLTensor) on stack, return the handle. More... | |
const Op & | tvm::tir::builtin::tvm_call_packed () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_call_cpacked () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_call_trace_packed () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_check_return () |
Checks the return value of another call is correct or returns a given value. More... | |
const Op & | tvm::tir::builtin::tvm_thread_context () |
See pesudo code Mark the content as thread local context, can get optimized by only call the call once at thread start. More... | |
const Op & | tvm::tir::builtin::tvm_thread_invariant () |
Mark a condition to be thread invariant. This means the condition must be the same for all threads. More... | |
const Op & | tvm::tir::builtin::tvm_call_packed_lowered () |
Lowered version of call packed, the space of value and type codes are explicitly allocated. More... | |
const Op & | tvm::tir::builtin::tvm_call_cpacked_lowered () |
Lowered version of call c-packed, the space of value and type codes are explicitly allocated. More... | |
const Op & | tvm::tir::builtin::tvm_call_trace_packed_lowered () |
Lowered version of trace intrinsic, the space of value and type codes are explicitly allocated. The return value is the (end - 1) value on the stack. More... | |
const Op & | tvm::tir::builtin::tvm_storage_sync () |
See pseudo code. More... | |
const Op & | tvm::tir::builtin::tvm_warp_shuffle () |
See pseudo code. More... | |
const Op & | tvm::tir::builtin::tvm_warp_shuffle_up () |
const Op & | tvm::tir::builtin::tvm_warp_shuffle_down () |
const Op & | tvm::tir::builtin::tvm_warp_activemask () |
const Op & | tvm::tir::builtin::tvm_global_barrier_kinit () |
Initialize the global barrier. Call this at beginning of kernel that need global barrier. More... | |
const Op & | tvm::tir::builtin::tvm_thread_allreduce () |
See pesudo code. More... | |
const Op & | tvm::tir::builtin::tvm_load_matrix_sync () |
tvm intrinsic for tensor core load operators. More... | |
const Op & | tvm::tir::builtin::tvm_mma_sync () |
tvm intrinsic for tensor core mma_sync operators. More... | |
const Op & | tvm::tir::builtin::tvm_bmma_sync () |
tvm intrinsic for tensor core bmma_sync operators. More... | |
const Op & | tvm::tir::builtin::tvm_fill_fragment () |
tvm intrinsic for tensor core fill_fragment operators. More... | |
const Op & | tvm::tir::builtin::tvm_store_matrix_sync () |
tvm intrinsic for tensor core store operators. More... | |
const Op & | tvm::tir::builtin::ptx_mma () |
tvm intrinsic for ptx tensor core mma instructions. More... | |
const Op & | tvm::tir::builtin::ptx_ldg32 () |
tvm intrinsic for ptx predicate load with 32-bit data type. More... | |
const Op & | tvm::tir::builtin::ptx_mma_sp () |
tvm intrinsic for sparse tensor core ptx instructions. More... | |
const Op & | tvm::tir::builtin::ptx_ldmatrix () |
tvm intrinsic for ptx load matrix from shared memory. More... | |
const Op & | tvm::tir::builtin::ptx_cp_async () |
tvm intrinsics for ptx async copy from global to shared memory using cp.async More... | |
const Op & | tvm::tir::builtin::ptx_cp_async_bulk () |
tvm intrinsics for ptx async copy from global to shared memory using cp.async.bulk More... | |
const Op & | tvm::tir::builtin::ptx_commit_group () |
tvm intrinsics for ptx async copy commit and wait. More... | |
const Op & | tvm::tir::builtin::ptx_wait_group () |
const Op & | tvm::tir::builtin::ptx_cp_async_barrier () |
tvm intrinsics for ptx async copy barrier using cp.async.mbarrier.arrive More... | |
const Op & | tvm::tir::builtin::ptx_init_barrier_thread_count () |
tvm intrinsics for ptx barrier initialization of thread count using mbarrier.init More... | |
const Op & | tvm::tir::builtin::ptx_arrive_barrier () |
tvm intrinsics for ptx barrier arrival using mbarrier.arrive More... | |
const Op & | tvm::tir::builtin::ptx_arrive_barrier_expect_tx () |
tvm intrinsic for ptx barrier arrival with expect tx using mbarrier.arrive.expect_tx More... | |
const Op & | tvm::tir::builtin::ptx_wait_barrier () |
tvm intrinsics for ptx barrier wait using mbarrier.try_wait More... | |
const Op & | tvm::tir::builtin::create_barriers () |
tvm intrinsics to create N barriers More... | |
const Op & | tvm::tir::builtin::mma_store () |
tvm intrinsic for storing the result of PTX MMA into a destination pointer. For example, if each thread in a warp of size 32 has 4 elements from the result of m16xn8xk16 MMA in its registers, this intrinsic can be used to store the result in a 16x8 region in shared or global memory. More... | |
const Op & | tvm::tir::builtin::mma_fill () |
tvm intrinsic for zero-initializing an MMA accumulation register. For example, if each thread in a warp of size 32 has 8 elements from the A matrix in m16xn8xk16 MMA in its registers, this intrinsic can be used to zero-initialize its 4 accumulation registers. More... | |
const Op & | tvm::tir::builtin::make_filled_simdgroup_matrix () |
tvm intrinsic for initializing and simdgroup with given value. More... | |
const Op & | tvm::tir::builtin::simdgroup_load () |
tvm intrinsic for loading data from device memory or threadgroup memory to simdgroup. More... | |
const Op & | tvm::tir::builtin::simdgroup_store () |
tvm intrinsic for storing data from simdgroup to device memory or threadgroup memory. More... | |
const Op & | tvm::tir::builtin::simdgroup_multiply_accumulate () |
tvm intrinsic for multiply and accumulate two matrices in simdgroup More... | |
const Op & | tvm::tir::builtin::vectorhigh () |
Get the high level half of the vector. More... | |
const Op & | tvm::tir::builtin::vectorlow () |
Get the low-level half of the vector. More... | |
const Op & | tvm::tir::builtin::vectorcombine () |
Concat two vectors. More... | |
const Op & | tvm::tir::builtin::dp4a () |
Dot product of two int8x4 vectors and add an optional accumulator. More... | |
const Op & | tvm::tir::builtin::atomic_add () |
atomic add instruction, corresponding e.g. to atomicAdd in CUDA More... | |
const Op & | tvm::tir::builtin::nd_mem_alloc_with_scope () |
Create an Nd memory allocation with storage scope. More... | |
const Op & | tvm::tir::builtin::texture2d_store () |
Store to texture 2d memory. More... | |
const Op & | tvm::tir::builtin::texture2d_load () |
Load from texture 2d memory. More... | |
const Op & | tvm::tir::builtin::dma_copy () |
Initiate a non-blocking DMA copy from source to destination. More... | |
const Op & | tvm::tir::builtin::dma_wait () |
Wait until the number of DMA groups in flight is less than or equal to some maximum. More... | |
const Op & | tvm::tir::builtin::dma_start_group () |
Start a group of DMA copies. More... | |
const Op & | tvm::tir::builtin::dma_end_group () |
End a group of DMA copies. More... | |
const Op & | tvm::tir::builtin::assume () |
Provide a true statement that can be used for simplifications. More... | |
const Op & | tvm::tir::builtin::undef () |
Returns an initialized but arbitrary value. More... | |
const Op & | tvm::tir::builtin::start_profile_intrinsic () |
Profiling intrinsic. More... | |
const Op & | tvm::tir::builtin::end_profile_intrinsic () |
Profiling intrinsic. More... | |
const Op & | tvm::tir::builtin::anylist_getitem () |
Get a item from any list and return it. More... | |
const Op & | tvm::tir::builtin::anylist_resetitem () |
Reset and clear a item in any list. More... | |
const Op & | tvm::tir::builtin::anylist_setitem_call_packed () |
Set an item into any list by running packed function call. More... | |
const Op & | tvm::tir::builtin::anylist_setitem_call_cpacked () |
Same as anylist_setitem_call_packed but use C calling convention. More... | |
const Op & | tvm::tir::builtin::vscale () |
Get the target's vscale value. It will be lowered to llvm.vscale intrinsic (https://llvm.org/docs/LangRef.html#llvm-vscale-intrinsic) More... | |
const Op & | tvm::tir::builtin::get_active_lane_mask () |
Calculate a predicate mask given an upper bound (limit) and a current value (base). More... | |
TIR builtin intrinsics.
TIR builtin intrinsics are stored as tvm:Op. They are processed in the same way as we process Ops.
It is not necessary to create a function for every Op, as we can obtain them through Op::Get.
This file contains the most commonly used intrinsics or those that have special semantics and need compiler support.