tvm
Namespaces | Enumerations | Functions
builtin.h File Reference

TIR builtin intrinsics. More...

#include <tvm/ir/op.h>
#include <tvm/tirx/expr.h>
Include dependency graph for builtin.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Namespaces

 tvm
 An object that builds and maintains block scope and StmtSref mapping for Dependence analysis.
 
 tvm::tirx
 
 tvm::tirx::builtin
 Collection of builtin intrinsics as ops.
 

Enumerations

enum  tvm::tirx::builtin::TVMStructFieldKind : int {
  tvm::tirx::builtin::kDLTensorAddr , tvm::tirx::builtin::kDLTensorData , tvm::tirx::builtin::kDLTensorShape , tvm::tirx::builtin::kDLTensorStrides ,
  tvm::tirx::builtin::kDLTensorNDim , tvm::tirx::builtin::kDLTensorTypeCode , tvm::tirx::builtin::kDLTensorTypeBits , tvm::tirx::builtin::kDLTensorTypeLanes ,
  tvm::tirx::builtin::kDLTensorByteOffset , tvm::tirx::builtin::kDLTensorDeviceId , tvm::tirx::builtin::kDLTensorDeviceType , tvm::tirx::builtin::kDLTensorKindBound_ ,
  tvm::tirx::builtin::kTVMValueContent , tvm::tirx::builtin::kTVMFFIAnyTypeIndex , tvm::tirx::builtin::kTVMFFIAnyZeroPadding , tvm::tirx::builtin::kTVMFFIAnyUnionValue ,
  tvm::tirx::builtin::kTVMValueKindBound_ , tvm::tirx::builtin::kInt64ArrayElem
}
 The kind of structure field info used in intrinsic. More...
 

Functions

const Op & tvm::tirx::builtin::ret ()
 Return value. More...
 
const Op & tvm::tirx::builtin::thread_return ()
 Return from a GPU thread. More...
 
const Op & tvm::tirx::builtin::continue_loop ()
 Loop continue. More...
 
const Op & tvm::tirx::builtin::break_loop ()
 Loop break. More...
 
const Op & tvm::tirx::builtin::reinterpret ()
 Reinterpret the value using the target type. More...
 
const Op & tvm::tirx::builtin::likely ()
 Marks a condition is likely going to happen. More...
 
const Op & tvm::tirx::builtin::filter ()
 Thread-set filter predicate. Used as the condition of an IfThenElse to narrow the active thread set A for the then-branch. Two forms: filter(var, lo, hi) – range form, true iff var in [lo, hi) filter(var, cond) – predicate form (e.g. var == k); true iff cond var must be a ScopeIdDef-declared Var at parse time (Verifier Rule 2). More...
 
const Op & tvm::tirx::builtin::selector ()
 Analysis-only active-thread selector. More...
 
const Op & tvm::tirx::builtin::bitwise_and ()
 Bitwise and operator. More...
 
const Op & tvm::tirx::builtin::bitwise_or ()
 Bitwise or operator. More...
 
const Op & tvm::tirx::builtin::bitwise_xor ()
 Bitwise xor operator. More...
 
const Op & tvm::tirx::builtin::bitwise_not ()
 Bitwise not operator. More...
 
const Op & tvm::tirx::builtin::shift_left ()
 Left shift. More...
 
const Op & tvm::tirx::builtin::shift_right ()
 Right shift. More...
 
const Op & tvm::tirx::builtin::large_uint_imm ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::q_multiply_shift ()
 Execute a multiplication between two Q-numbers x and y followed by a right shift s The default rounding rule is to the nearest value, rounding half up (i.e., round(x.1) = x and round (x.5) = x+1) More...
 
const Op & tvm::tirx::builtin::address_of ()
 Returns the address of an element in the buffer (see pseudocode below). More...
 
const Op & tvm::tirx::builtin::if_then_else ()
 Same as select, used for unsafe memory access. More...
 
const Op & tvm::tirx::builtin::isnullptr ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::isnan ()
 Check if value is nan. More...
 
const Op & tvm::tirx::builtin::popcount ()
 Popcount. More...
 
const Op & tvm::tirx::builtin::fma ()
 Fused multiply add. More...
 
const Op & tvm::tirx::builtin::call_extern ()
 Call an extern C function with given name and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tirx::builtin::call_pure_extern ()
 Call an pure extern C function with given name and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tirx::builtin::call_llvm_intrin ()
 Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tirx::builtin::call_llvm_pure_intrin ()
 Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tirx::builtin::call_spirv_pure_glsl450 ()
 Call an SPIRV pure GLSL450 intrinsic. More...
 
const Op & tvm::tirx::builtin::prefetch ()
 same signature as llvm.prefetch More...
 
const Op & tvm::tirx::builtin::tvm_access_ptr ()
 Get head access address with memory access pattern info. More...
 
const Op & tvm::tirx::builtin::tvm_static_handle ()
 Create a function local static handle that iniitalizes to nullptr. can be used to cache function local static resources. More...
 
const Op & tvm::tirx::builtin::tvm_context_id ()
 Return a unique context id, used for hint of workspace separation. Different context id ganrantees not having overlapping workspace. More...
 
const Op & tvm::tirx::builtin::tvm_tuple ()
 tvm_tuple is not an actual function and cannot codegen. It is used to represent tuple structure in value field of AttrStmt, for the sake of giving hint to optimization. More...
 
const Op & tvm::tirx::builtin::handle_add_byte_offset ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_struct_get ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_struct_set ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::lookup_param ()
 See pseudo code Type lookup_param(ffi::String param_name) { return __tvm_param__param_name; }. More...
 
const Op & tvm::tirx::builtin::tvm_throw_last_error ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_stack_alloca ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_stack_make_shape ()
 Allocate a shape tuple on stack, return the handle. More...
 
const Op & tvm::tirx::builtin::tvm_stack_make_array ()
 Allocate a Tensor(DLTensor) on stack, return the handle. More...
 
const Op & tvm::tirx::builtin::tvm_call_packed ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_call_cpacked ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_call_trace_packed ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::tvm_thread_invariant ()
 Mark a condition to be thread invariant. This means the condition must be the same for all threads. More...
 
const Op & tvm::tirx::builtin::tvm_call_packed_lowered ()
 Lowered version of call packed, the space of value and type codes are explicitly allocated. More...
 
const Op & tvm::tirx::builtin::tvm_call_cpacked_lowered ()
 Lowered version of call c-packed, the space of value and type codes are explicitly allocated. More...
 
const Op & tvm::tirx::builtin::tvm_call_trace_packed_lowered ()
 Lowered version of trace intrinsic, the space of value and type codes are explicitly allocated. The return value is the (end - 1) value on the stack. More...
 
const Op & tvm::tirx::builtin::tvm_storage_sync ()
 See pseudo code. More...
 
const Op & tvm::tirx::builtin::tvm_warp_shuffle ()
 See pseudo code. More...
 
const Op & tvm::tirx::builtin::tvm_warp_shuffle_up ()
 
const Op & tvm::tirx::builtin::tvm_warp_shuffle_down ()
 
const Op & tvm::tirx::builtin::tvm_warp_shuffle_xor ()
 
const Op & tvm::tirx::builtin::tvm_warp_activemask ()
 
const Op & tvm::tirx::builtin::tvm_global_barrier_kinit ()
 Initialize the global barrier. Call this at beginning of kernel that need global barrier. More...
 
const Op & tvm::tirx::builtin::tvm_thread_allreduce ()
 See pesudo code. More...
 
const Op & tvm::tirx::builtin::make_filled_simdgroup_matrix ()
 tvm intrinsic for initializing and simdgroup with given value. More...
 
const Op & tvm::tirx::builtin::simdgroup_load ()
 tvm intrinsic for loading data from device memory or threadgroup memory to simdgroup. More...
 
const Op & tvm::tirx::builtin::simdgroup_store ()
 tvm intrinsic for storing data from simdgroup to device memory or threadgroup memory. More...
 
const Op & tvm::tirx::builtin::simdgroup_multiply_accumulate ()
 tvm intrinsic for multiply and accumulate two matrices in simdgroup More...
 
const Op & tvm::tirx::builtin::cooperative_tensor_fill ()
 Fill a cooperative_tensor with a given value. More...
 
const Op & tvm::tirx::builtin::cooperative_tensor_load ()
 Load data from device or threadgroup memory into a cooperative_tensor. More...
 
const Op & tvm::tirx::builtin::cooperative_tensor_store ()
 Store data from a cooperative_tensor to device or threadgroup memory. More...
 
const Op & tvm::tirx::builtin::cooperative_tensor_multiply_accumulate ()
 Multiply and accumulate two matrices using cooperative_tensor (MetalPerformancePrimitives matmul2d). More...
 
const Op & tvm::tirx::builtin::vectorhigh ()
 Get the high level half of the vector. More...
 
const Op & tvm::tirx::builtin::vectorlow ()
 Get the low-level half of the vector. More...
 
const Op & tvm::tirx::builtin::vectorcombine ()
 Concat two vectors. More...
 
const Op & tvm::tirx::builtin::dp4a ()
 Dot product of two int8x4 vectors and add an optional accumulator. More...
 
const Op & tvm::tirx::builtin::atomic_add ()
 atomic add instruction, corresponding e.g. to atomicAdd in CUDA More...
 
const Op & tvm::tirx::builtin::nd_mem_alloc_with_scope ()
 Create an Nd memory allocation with storage scope. More...
 
const Op & tvm::tirx::builtin::texture2d_store ()
 Store to texture 2d memory. More...
 
const Op & tvm::tirx::builtin::texture2d_load ()
 Load from texture 2d memory. More...
 
const Op & tvm::tirx::builtin::dma_copy ()
 Initiate a non-blocking DMA copy from source to destination. More...
 
const Op & tvm::tirx::builtin::dma_wait ()
 Wait until the number of DMA groups in flight is less than or equal to some maximum. More...
 
const Op & tvm::tirx::builtin::dma_start_group ()
 Start a group of DMA copies. More...
 
const Op & tvm::tirx::builtin::dma_end_group ()
 End a group of DMA copies. More...
 
const Op & tvm::tirx::builtin::assume ()
 Provide a true statement that can be used for simplifications. More...
 
const Op & tvm::tirx::builtin::undef ()
 Returns an initialized but arbitrary value. More...
 
const Op & tvm::tirx::builtin::start_profile_intrinsic ()
 Profiling intrinsic. More...
 
const Op & tvm::tirx::builtin::end_profile_intrinsic ()
 Profiling intrinsic. More...
 
const Op & tvm::tirx::builtin::anylist_getitem ()
 Get a item from any list and return it. More...
 
const Op & tvm::tirx::builtin::anylist_resetitem ()
 Reset and clear a item in any list. More...
 
const Op & tvm::tirx::builtin::anylist_setitem_call_packed ()
 Set an item into any list by running packed function call. More...
 
const Op & tvm::tirx::builtin::anylist_setitem_call_cpacked ()
 Same as anylist_setitem_call_packed but use C calling convention. More...
 
const Op & tvm::tirx::builtin::vscale ()
 Get the target's vscale value. It will be lowered to llvm.vscale intrinsic (https://llvm.org/docs/LangRef.html#llvm-vscale-intrinsic) More...
 
const Op & tvm::tirx::builtin::get_active_lane_mask ()
 Calculate a predicate mask given an upper bound (limit) and a current value (base). More...
 
const Op & tvm::tirx::builtin::ignore_loop_partition ()
 Annotate a predicate not be considered as target condition of loop partition. More...
 
const Op & tvm::tirx::builtin::buffer_offset ()
 Get the element offset of a buffer given logical indices. More...
 
const Op & tvm::tirx::builtin::print_buffer ()
 Print the content of a buffer during runtime. More...
 
const Op & tvm::tirx::builtin::timer_init_cuda ()
 tvm intrinsic for initializing the CUDA profiler, and store profiling result in a buffer. More...
 
const Op & tvm::tirx::builtin::timer_start_cuda ()
 tvm intrinsic for starting the timer for profiling a specific event, and storing profiling result in a buffer. More...
 
const Op & tvm::tirx::builtin::timer_end_cuda ()
 tvm intrinsic for ending the timer for profiling a specific event, and storing profiling result in a buffer. More...
 
const Op & tvm::tirx::builtin::timer_finalize_cuda ()
 tvm intrinsic for finalize the timer for profiling, and storing profiling result in a buffer. More...
 
const Op & tvm::tirx::builtin::cuda_atomic_add ()
 tvm intrinsic for cuda atomic add instruction More...
 
const Op & tvm::tirx::builtin::cuda_thread_fence ()
 tvm intrinsic for cuda thread fence instruction More...
 
const Op & tvm::tirx::builtin::cuda_warp_reduce ()
 Warp-level butterfly shuffle-XOR reduction. More...
 
const Op & tvm::tirx::builtin::cuda_cta_reduce ()
 CTA-wide reduction via warp shuffle + shared memory. More...
 
const Op & tvm::tirx::builtin::cuda_copy_bytes ()
 Typed load/store copy of num_bytes bytes. More...
 
const Op & tvm::tirx::builtin::cuda_warp_sync ()
 tvm intrinsic for cuda warp sync instruction More...
 
const Op & tvm::tirx::builtin::cuda_cta_sync ()
 tvm intrinsic for cuda block-wide sync (syncthreads) More...
 
const Op & tvm::tirx::builtin::cuda_grid_sync ()
 tvm intrinsic for cuda grid-wide sync (cooperative groups) More...
 
const Op & tvm::tirx::builtin::cuda_thread_rank ()
 tvm intrinsic that returns cooperative_groups::thread_rank() for the enclosing CTA (linear thread index within the block). More...
 
const Op & tvm::tirx::builtin::cuda_half2float ()
 tvm intrinsic for cuda half to float conversion More...
 
const Op & tvm::tirx::builtin::cuda_bfloat162float ()
 tvm intrinsic for cuda bfloat16 to float conversion More...
 
const Op & tvm::tirx::builtin::cuda_float22half2 ()
 tvm intrinsic for a helper converting float2 to half2 with rounding More...
 
const Op & tvm::tirx::builtin::cuda_trap_when_assert_failed ()
 tvm intrinsic to trap when an assertion failed (cond == false) More...
 
const Op & tvm::tirx::builtin::cuda_runtime_instr_desc ()
 tvm intrinsic to modify runtime instruction descriptor More...
 
const Op & tvm::tirx::builtin::cuda_half8tofloat8 ()
 tvm intrinsic to convert 8 half2 lanes to 8 float2 lanes More...
 
const Op & tvm::tirx::builtin::cuda_float8tohalf8 ()
 tvm intrinsic to convert 8 float2 lanes to 8 half2 lanes with rounding More...
 
const Op & tvm::tirx::builtin::cuda_syncthreads_and ()
 tvm intrinsic for cuda syncthreads_and instruction More...
 
const Op & tvm::tirx::builtin::cuda_syncthreads_or ()
 tvm intrinsic for cuda syncthreads_or instruction More...
 
const Op & tvm::tirx::builtin::cuda_nano_sleep ()
 tvm intrinsic for cuda nano sleep instruction More...
 
const Op & tvm::tirx::builtin::cuda_atomic_cas ()
 tvm intrinsic for cuda atomic compare and swap instruction More...
 
const Op & tvm::tirx::builtin::cuda_printf ()
 tvm intrinsic for cuda printf instruction More...
 
const Op & tvm::tirx::builtin::cuda_ldg ()
 tvm intrinsic for cuda ldg instruction More...
 
const Op & tvm::tirx::builtin::cuda_get_tmem_addr ()
 tvm intrinsic for cuda tmem address calculation More...
 
const Op & tvm::tirx::builtin::ptx_exp2 ()
 tvm intrinsic for PTX fast exp2 approximation (ex2.approx.ftz.f32) More...
 
const Op & tvm::tirx::builtin::ptx_rcp ()
 tvm intrinsic for PTX fast reciprocal approximation (rcp.approx.ftz.f32) More...
 
const Op & tvm::tirx::builtin::ptx_any_sync ()
 tvm intrinsic for PTX warp-wide any predicate (__any_sync) More...
 
const Op & tvm::tirx::builtin::ptx_reduce3_max_f32 ()
 tvm intrinsic for PTX 3-input max instruction (sm_100a+) More...
 
const Op & tvm::tirx::builtin::ptx_reduce3_min_f32 ()
 tvm intrinsic for PTX 3-input min instruction (sm_100a+) More...
 
const Op & tvm::tirx::builtin::ptx_add_packed_f32x2 ()
 tvm intrinsic for PTX packed add instruction (sm_100a+) More...
 
const Op & tvm::tirx::builtin::ptx_sub_packed_f32x2 ()
 tvm intrinsic for PTX packed subtract instruction (sm_100a+) More...
 
const Op & tvm::tirx::builtin::ptx_mul_packed_f32x2 ()
 tvm intrinsic for PTX packed multiply instruction (sm_100a+) More...
 
const Op & tvm::tirx::builtin::ptx_fma_packed_f32x2 ()
 tvm intrinsic for PTX packed FMA instruction (sm_100a+) More...
 

Detailed Description

TIR builtin intrinsics.

TIR builtin intrinsics are stored as tvm:Op. They are processed in the same way as we process Ops.

It is not necessary to create a function for every Op, as we can obtain them through Op::Get.

This file contains the most commonly used intrinsics or those that have special semantics and need compiler support.