tvm
All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
Namespaces | Enumerations | Functions
builtin.h File Reference

TIR builtin intrinsics. More...

#include <tvm/ir/op.h>
#include <tvm/tir/expr.h>
Include dependency graph for builtin.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Namespaces

 tvm
 runtime implementation for LibTorch/TorchScript.
 
 tvm::tir
 
 tvm::tir::builtin
 Collection of builtin intrinsics as ops.
 

Enumerations

enum  tvm::tir::builtin::TVMStructFieldKind : int {
  tvm::tir::builtin::kArrAddr , tvm::tir::builtin::kArrData , tvm::tir::builtin::kArrShape , tvm::tir::builtin::kArrStrides ,
  tvm::tir::builtin::kArrNDim , tvm::tir::builtin::kArrTypeCode , tvm::tir::builtin::kArrTypeBits , tvm::tir::builtin::kArrTypeLanes ,
  tvm::tir::builtin::kArrByteOffset , tvm::tir::builtin::kArrDeviceId , tvm::tir::builtin::kArrDeviceType , tvm::tir::builtin::kArrKindBound_ ,
  tvm::tir::builtin::kTVMValueContent , tvm::tir::builtin::kTVMValueKindBound_
}
 The kind of structure field info used in intrinsic. More...
 

Functions

const Op & tvm::tir::builtin::ret ()
 Return value. More...
 
const Op & tvm::tir::builtin::reinterpret ()
 Reinterpret the value using the target type. More...
 
const Op & tvm::tir::builtin::likely ()
 Marks a condition is likely going to happen. More...
 
const Op & tvm::tir::builtin::bitwise_and ()
 Bitwise and operator. More...
 
const Op & tvm::tir::builtin::bitwise_or ()
 Bitwise or operator. More...
 
const Op & tvm::tir::builtin::bitwise_xor ()
 Bitwise xor operator. More...
 
const Op & tvm::tir::builtin::bitwise_not ()
 Bitwise not operator. More...
 
const Op & tvm::tir::builtin::shift_left ()
 Left shift. More...
 
const Op & tvm::tir::builtin::shift_right ()
 Right shift. More...
 
const Op & tvm::tir::builtin::large_uint_imm ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::q_multiply_shift ()
 Execute a multiplication between two Q-numbers x and y followed by a right shift s The default rounding rule is to the nearest value, rounding half up (i.e., round(x.1) = x and round (x.5) = x+1) More...
 
const Op & tvm::tir::builtin::address_of ()
 Returns the address of an element in the buffer (see pseudocode below). More...
 
const Op & tvm::tir::builtin::if_then_else ()
 Same as select, used for unsafe memory access. More...
 
const Op & tvm::tir::builtin::isnullptr ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::isnan ()
 Check if value is nan. More...
 
const Op & tvm::tir::builtin::popcount ()
 Popcount. More...
 
const Op & tvm::tir::builtin::fma ()
 Fused multiply add. More...
 
const Op & tvm::tir::builtin::call_extern ()
 Call an extern C function with given name and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tir::builtin::call_pure_extern ()
 Call an pure extern C function with given name and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tir::builtin::call_llvm_intrin ()
 Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tir::builtin::call_llvm_pure_intrin ()
 Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...
 
const Op & tvm::tir::builtin::call_spirv_pure_glsl450 ()
 Call an SPIRV pure GLSL450 intrinsic. More...
 
const Op & tvm::tir::builtin::prefetch ()
 Prefetch a cacheline. More...
 
const Op & tvm::tir::builtin::tvm_access_ptr ()
 Get head access address with memory access pattern info. More...
 
const Op & tvm::tir::builtin::tvm_static_handle ()
 Create a function local static handle that iniitalizes to nullptr. can be used to cache function local static resources. More...
 
const Op & tvm::tir::builtin::tvm_context_id ()
 Return a unique context id, used for hint of workspace separation. Different context id ganrantees not having overlapping workspace. More...
 
const Op & tvm::tir::builtin::tvm_tuple ()
 tvm_tuple is not an actual function and cannot codegen. It is used to represent tuple structure in value field of AttrStmt, for the sake of giving hint to optimization. More...
 
const Op & tvm::tir::builtin::tvm_struct_get ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_struct_set ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::lookup_param ()
 See pseudo code Type lookup_param(String param_name) { return __tvm_param__param_name; }. More...
 
const Op & tvm::tir::builtin::tvm_throw_last_error ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_stack_alloca ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_stack_make_shape ()
 Allocate a shape tuple on stack, return the handle. More...
 
const Op & tvm::tir::builtin::tvm_stack_make_array ()
 Allocate a NDArray(DLTensor) on stack, return the handle. More...
 
const Op & tvm::tir::builtin::tvm_call_packed ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_call_cpacked ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_call_trace_packed ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_check_return ()
 Checks the return value of another call is correct or returns a given value. More...
 
const Op & tvm::tir::builtin::tvm_thread_context ()
 See pesudo code Mark the content as thread local context, can get optimized by only call the call once at thread start. More...
 
const Op & tvm::tir::builtin::tvm_call_packed_lowered ()
 Lowered version of call packed, the space of value and type codes are explicitly allocated. More...
 
const Op & tvm::tir::builtin::tvm_call_cpacked_lowered ()
 Lowered version of call c-packed, the space of value and type codes are explicitly allocated. More...
 
const Op & tvm::tir::builtin::tvm_call_trace_packed_lowered ()
 Lowered version of trace intrinsic, the space of value and type codes are explicitly allocated. The return value is the (end - 1) value on the stack. More...
 
const Op & tvm::tir::builtin::tvm_storage_sync ()
 See pseudo code. More...
 
const Op & tvm::tir::builtin::tvm_warp_shuffle ()
 See pseudo code. More...
 
const Op & tvm::tir::builtin::tvm_warp_shuffle_up ()
 
const Op & tvm::tir::builtin::tvm_warp_shuffle_down ()
 
const Op & tvm::tir::builtin::tvm_warp_activemask ()
 
const Op & tvm::tir::builtin::tvm_global_barrier_kinit ()
 Initialize the global barrier. Call this at beginning of kernel that need global barrier. More...
 
const Op & tvm::tir::builtin::tvm_thread_allreduce ()
 See pesudo code. More...
 
const Op & tvm::tir::builtin::tvm_load_matrix_sync ()
 tvm intrinsic for tensor core load operators. More...
 
const Op & tvm::tir::builtin::tvm_mma_sync ()
 tvm intrinsic for tensor core mma_sync operators. More...
 
const Op & tvm::tir::builtin::tvm_bmma_sync ()
 tvm intrinsic for tensor core bmma_sync operators. More...
 
const Op & tvm::tir::builtin::tvm_fill_fragment ()
 tvm intrinsic for tensor core fill_fragment operators. More...
 
const Op & tvm::tir::builtin::tvm_store_matrix_sync ()
 tvm intrinsic for tensor core store operators. More...
 
const Op & tvm::tir::builtin::ptx_mma ()
 tvm intrinsic for ptx tensor core mma instructions. More...
 
const Op & tvm::tir::builtin::ptx_ldg32 ()
 tvm intrinsic for ptx predicate load with 32-bit data type. More...
 
const Op & tvm::tir::builtin::ptx_mma_sp ()
 tvm intrinsic for sparse tensor core ptx instructions. More...
 
const Op & tvm::tir::builtin::ptx_ldmatrix ()
 tvm intrinsic for ptx load matrix from shared memory. More...
 
const Op & tvm::tir::builtin::ptx_cp_async ()
 tvm intrinsics for ptx async copy from global to shared memory More...
 
const Op & tvm::tir::builtin::ptx_commit_group ()
 tvm intrinsics for ptx async copy commit and wait. More...
 
const Op & tvm::tir::builtin::ptx_wait_group ()
 
const Op & tvm::tir::builtin::mma_store ()
 tvm intrinsic for storing the result of PTX MMA into a destination pointer. For example, if each thread in a warp of size 32 has 4 elements from the result of m16xn8xk16 MMA in its registers, this intrinsic can be used to store the result in a 16x8 region in shared or global memory. More...
 
const Op & tvm::tir::builtin::mma_fill ()
 tvm intrinsic for zero-initalizing an MMA accumulation registor. For example, if each thread in a warp of size 32 has 8 elements from the A matrix in m16xn8xk16 MMA in its registers, this intrinsic can be used to zero-initialize its 4 accumulation registers. More...
 
const Op & tvm::tir::builtin::vectorhigh ()
 Get the high level half of the vector. More...
 
const Op & tvm::tir::builtin::vectorlow ()
 Get the low-level half of the vector. More...
 
const Op & tvm::tir::builtin::vectorcombine ()
 Concat two vectors. More...
 
const Op & tvm::tir::builtin::atomic_add ()
 atomic add instruction, corresponding e.g. to atomicAdd in CUDA More...
 
const Op & tvm::tir::builtin::nd_mem_alloc_with_scope ()
 Create an Nd memory allocation with storage scope. More...
 
const Op & tvm::tir::builtin::texture2d_store ()
 Store to texture 2d memory. More...
 
const Op & tvm::tir::builtin::texture2d_load ()
 Load from texture 2d memory. More...
 
const Op & tvm::tir::builtin::dma_copy ()
 Initiate a non-blocking DMA copy from source to destination. More...
 
const Op & tvm::tir::builtin::dma_wait ()
 Wait until the number of DMA groups in flight is less than or equal to some maximum. More...
 
const Op & tvm::tir::builtin::dma_start_group ()
 Start a group of DMA copies. More...
 
const Op & tvm::tir::builtin::dma_end_group ()
 End a group of DMA copies. More...
 
const Op & tvm::tir::builtin::assume ()
 Provide a true statement that can be used for simplifications. More...
 
const Op & tvm::tir::builtin::undef ()
 Returns an initialized but arbitrary value. More...
 
const Op & tvm::tir::builtin::start_profile_intrinsic ()
 Profiling intrinsic. More...
 
const Op & tvm::tir::builtin::end_profile_intrinsic ()
 Profiling intrinsic. More...
 

Detailed Description

TIR builtin intrinsics.

TIR builtin intrinsics are stored as tvm:Op. They are processed in the same way as we process Ops.

It is not necessary to create a function for every Op, as we can obtain them through Op::Get.

This file contains the most commonly used intrinsics or those that have special semantics and need compiler support.