|
tvm
|
TIR builtin intrinsics. More...
Go to the source code of this file.
Namespaces | |
| tvm | |
| Performance counters for profiling via the PAPI library. | |
| tvm::tir | |
| tvm::tir::builtin | |
| Collection of builtin intrinsics as ops. | |
Functions | |
| const Op & | tvm::tir::builtin::ret () |
| Return value. More... | |
| const Op & | tvm::tir::builtin::thread_return () |
| Return from a GPU thread. More... | |
| const Op & | tvm::tir::builtin::continue_loop () |
| Loop continue. More... | |
| const Op & | tvm::tir::builtin::break_loop () |
| Loop break. More... | |
| const Op & | tvm::tir::builtin::reinterpret () |
| Reinterpret the value using the target type. More... | |
| const Op & | tvm::tir::builtin::likely () |
| Marks a condition is likely going to happen. More... | |
| const Op & | tvm::tir::builtin::bitwise_and () |
| Bitwise and operator. More... | |
| const Op & | tvm::tir::builtin::bitwise_or () |
| Bitwise or operator. More... | |
| const Op & | tvm::tir::builtin::bitwise_xor () |
| Bitwise xor operator. More... | |
| const Op & | tvm::tir::builtin::bitwise_not () |
| Bitwise not operator. More... | |
| const Op & | tvm::tir::builtin::shift_left () |
| Left shift. More... | |
| const Op & | tvm::tir::builtin::shift_right () |
| Right shift. More... | |
| const Op & | tvm::tir::builtin::large_uint_imm () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::q_multiply_shift () |
| Execute a multiplication between two Q-numbers x and y followed by a right shift s The default rounding rule is to the nearest value, rounding half up (i.e., round(x.1) = x and round (x.5) = x+1) More... | |
| const Op & | tvm::tir::builtin::address_of () |
| Returns the address of an element in the buffer (see pseudocode below). More... | |
| const Op & | tvm::tir::builtin::if_then_else () |
| Same as select, used for unsafe memory access. More... | |
| const Op & | tvm::tir::builtin::isnullptr () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::isnan () |
| Check if value is nan. More... | |
| const Op & | tvm::tir::builtin::popcount () |
| Popcount. More... | |
| const Op & | tvm::tir::builtin::fma () |
| Fused multiply add. More... | |
| const Op & | tvm::tir::builtin::call_extern () |
| Call an extern C function with given name and signature from the types of args in the runtime environment. More... | |
| const Op & | tvm::tir::builtin::call_pure_extern () |
| Call an pure extern C function with given name and signature from the types of args in the runtime environment. More... | |
| const Op & | tvm::tir::builtin::call_llvm_intrin () |
| Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More... | |
| const Op & | tvm::tir::builtin::call_llvm_pure_intrin () |
| Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More... | |
| const Op & | tvm::tir::builtin::call_spirv_pure_glsl450 () |
| Call an SPIRV pure GLSL450 intrinsic. More... | |
| const Op & | tvm::tir::builtin::prefetch () |
| same signature as llvm.prefetch More... | |
| const Op & | tvm::tir::builtin::tvm_access_ptr () |
| Get head access address with memory access pattern info. More... | |
| const Op & | tvm::tir::builtin::tvm_static_handle () |
| Create a function local static handle that iniitalizes to nullptr. can be used to cache function local static resources. More... | |
| const Op & | tvm::tir::builtin::tvm_context_id () |
| Return a unique context id, used for hint of workspace separation. Different context id ganrantees not having overlapping workspace. More... | |
| const Op & | tvm::tir::builtin::tvm_tuple () |
| tvm_tuple is not an actual function and cannot codegen. It is used to represent tuple structure in value field of AttrStmt, for the sake of giving hint to optimization. More... | |
| const Op & | tvm::tir::builtin::handle_add_byte_offset () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_struct_get () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_struct_set () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::lookup_param () |
| See pseudo code Type lookup_param(ffi::String param_name) { return __tvm_param__param_name; }. More... | |
| const Op & | tvm::tir::builtin::tvm_throw_last_error () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_stack_alloca () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_stack_make_shape () |
| Allocate a shape tuple on stack, return the handle. More... | |
| const Op & | tvm::tir::builtin::tvm_stack_make_array () |
| Allocate a Tensor(DLTensor) on stack, return the handle. More... | |
| const Op & | tvm::tir::builtin::tvm_call_packed () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_call_cpacked () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_call_trace_packed () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_thread_context () |
| See pesudo code Mark the content as thread local context, can get optimized by only call the call once at thread start. More... | |
| const Op & | tvm::tir::builtin::tvm_thread_invariant () |
| Mark a condition to be thread invariant. This means the condition must be the same for all threads. More... | |
| const Op & | tvm::tir::builtin::tvm_call_packed_lowered () |
| Lowered version of call packed, the space of value and type codes are explicitly allocated. More... | |
| const Op & | tvm::tir::builtin::tvm_call_cpacked_lowered () |
| Lowered version of call c-packed, the space of value and type codes are explicitly allocated. More... | |
| const Op & | tvm::tir::builtin::tvm_call_trace_packed_lowered () |
| Lowered version of trace intrinsic, the space of value and type codes are explicitly allocated. The return value is the (end - 1) value on the stack. More... | |
| const Op & | tvm::tir::builtin::tvm_storage_sync () |
| See pseudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_warp_shuffle () |
| See pseudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_warp_shuffle_up () |
| const Op & | tvm::tir::builtin::tvm_warp_shuffle_down () |
| const Op & | tvm::tir::builtin::tvm_warp_activemask () |
| const Op & | tvm::tir::builtin::tvm_global_barrier_kinit () |
| Initialize the global barrier. Call this at beginning of kernel that need global barrier. More... | |
| const Op & | tvm::tir::builtin::tvm_thread_allreduce () |
| See pesudo code. More... | |
| const Op & | tvm::tir::builtin::tvm_load_matrix_sync () |
| tvm intrinsic for tensor core load operators. More... | |
| const Op & | tvm::tir::builtin::tvm_mma_sync () |
| tvm intrinsic for tensor core mma_sync operators. More... | |
| const Op & | tvm::tir::builtin::tvm_bmma_sync () |
| tvm intrinsic for tensor core bmma_sync operators. More... | |
| const Op & | tvm::tir::builtin::tvm_fill_fragment () |
| tvm intrinsic for tensor core fill_fragment operators. More... | |
| const Op & | tvm::tir::builtin::tvm_store_matrix_sync () |
| tvm intrinsic for tensor core store operators. More... | |
| const Op & | tvm::tir::builtin::ptx_mma () |
| tvm intrinsic for ptx tensor core mma instructions. More... | |
| const Op & | tvm::tir::builtin::ptx_ldg32 () |
| tvm intrinsic for ptx predicate load with 32-bit data type. More... | |
| const Op & | tvm::tir::builtin::ptx_mma_sp () |
| tvm intrinsic for sparse tensor core ptx instructions. More... | |
| const Op & | tvm::tir::builtin::ptx_ldmatrix () |
| tvm intrinsic for ptx load matrix from shared memory. More... | |
| const Op & | tvm::tir::builtin::ptx_cp_async () |
| tvm intrinsics for ptx async copy from global to shared memory using cp.async More... | |
| const Op & | tvm::tir::builtin::ptx_cp_async_bulk () |
| tvm intrinsics for ptx async copy from global to shared memory using cp.async.bulk More... | |
| const Op & | tvm::tir::builtin::ptx_commit_group () |
| tvm intrinsics for ptx async copy commit and wait. More... | |
| const Op & | tvm::tir::builtin::ptx_wait_group () |
| const Op & | tvm::tir::builtin::ptx_cp_async_barrier () |
| tvm intrinsics for ptx async copy barrier using cp.async.mbarrier.arrive More... | |
| const Op & | tvm::tir::builtin::ptx_init_barrier_thread_count () |
| tvm intrinsics for ptx barrier initialization of thread count using mbarrier.init More... | |
| const Op & | tvm::tir::builtin::ptx_arrive_barrier () |
| tvm intrinsics for ptx barrier arrival using mbarrier.arrive More... | |
| const Op & | tvm::tir::builtin::ptx_arrive_barrier_expect_tx () |
| tvm intrinsic for ptx barrier arrival with expect tx using mbarrier.arrive.expect_tx More... | |
| const Op & | tvm::tir::builtin::ptx_wait_barrier () |
| tvm intrinsics for ptx barrier wait using mbarrier.try_wait More... | |
| const Op & | tvm::tir::builtin::create_barriers () |
| tvm intrinsics to create N barriers More... | |
| const Op & | tvm::tir::builtin::mma_store () |
| tvm intrinsic for storing the result of PTX MMA into a destination pointer. For example, if each thread in a warp of size 32 has 4 elements from the result of m16xn8xk16 MMA in its registers, this intrinsic can be used to store the result in a 16x8 region in shared or global memory. More... | |
| const Op & | tvm::tir::builtin::mma_fill () |
| tvm intrinsic for zero-initializing an MMA accumulation register. For example, if each thread in a warp of size 32 has 8 elements from the A matrix in m16xn8xk16 MMA in its registers, this intrinsic can be used to zero-initialize its 4 accumulation registers. More... | |
| const Op & | tvm::tir::builtin::make_filled_simdgroup_matrix () |
| tvm intrinsic for initializing and simdgroup with given value. More... | |
| const Op & | tvm::tir::builtin::simdgroup_load () |
| tvm intrinsic for loading data from device memory or threadgroup memory to simdgroup. More... | |
| const Op & | tvm::tir::builtin::simdgroup_store () |
| tvm intrinsic for storing data from simdgroup to device memory or threadgroup memory. More... | |
| const Op & | tvm::tir::builtin::simdgroup_multiply_accumulate () |
| tvm intrinsic for multiply and accumulate two matrices in simdgroup More... | |
| const Op & | tvm::tir::builtin::vectorhigh () |
| Get the high level half of the vector. More... | |
| const Op & | tvm::tir::builtin::vectorlow () |
| Get the low-level half of the vector. More... | |
| const Op & | tvm::tir::builtin::vectorcombine () |
| Concat two vectors. More... | |
| const Op & | tvm::tir::builtin::dp4a () |
| Dot product of two int8x4 vectors and add an optional accumulator. More... | |
| const Op & | tvm::tir::builtin::atomic_add () |
| atomic add instruction, corresponding e.g. to atomicAdd in CUDA More... | |
| const Op & | tvm::tir::builtin::nd_mem_alloc_with_scope () |
| Create an Nd memory allocation with storage scope. More... | |
| const Op & | tvm::tir::builtin::texture2d_store () |
| Store to texture 2d memory. More... | |
| const Op & | tvm::tir::builtin::texture2d_load () |
| Load from texture 2d memory. More... | |
| const Op & | tvm::tir::builtin::dma_copy () |
| Initiate a non-blocking DMA copy from source to destination. More... | |
| const Op & | tvm::tir::builtin::dma_wait () |
| Wait until the number of DMA groups in flight is less than or equal to some maximum. More... | |
| const Op & | tvm::tir::builtin::dma_start_group () |
| Start a group of DMA copies. More... | |
| const Op & | tvm::tir::builtin::dma_end_group () |
| End a group of DMA copies. More... | |
| const Op & | tvm::tir::builtin::assume () |
| Provide a true statement that can be used for simplifications. More... | |
| const Op & | tvm::tir::builtin::undef () |
| Returns an initialized but arbitrary value. More... | |
| const Op & | tvm::tir::builtin::start_profile_intrinsic () |
| Profiling intrinsic. More... | |
| const Op & | tvm::tir::builtin::end_profile_intrinsic () |
| Profiling intrinsic. More... | |
| const Op & | tvm::tir::builtin::anylist_getitem () |
| Get a item from any list and return it. More... | |
| const Op & | tvm::tir::builtin::anylist_resetitem () |
| Reset and clear a item in any list. More... | |
| const Op & | tvm::tir::builtin::anylist_setitem_call_packed () |
| Set an item into any list by running packed function call. More... | |
| const Op & | tvm::tir::builtin::anylist_setitem_call_cpacked () |
| Same as anylist_setitem_call_packed but use C calling convention. More... | |
| const Op & | tvm::tir::builtin::vscale () |
| Get the target's vscale value. It will be lowered to llvm.vscale intrinsic (https://llvm.org/docs/LangRef.html#llvm-vscale-intrinsic) More... | |
| const Op & | tvm::tir::builtin::get_active_lane_mask () |
| Calculate a predicate mask given an upper bound (limit) and a current value (base). More... | |
| const Op & | tvm::tir::builtin::ignore_loop_partition () |
| Annotate a predicate not be considered as target condition of loop partition. More... | |
TIR builtin intrinsics.
TIR builtin intrinsics are stored as tvm:Op. They are processed in the same way as we process Ops.
It is not necessary to create a function for every Op, as we can obtain them through Op::Get.
This file contains the most commonly used intrinsics or those that have special semantics and need compiler support.