TIR builtin intrinsics. More...

#include <tvm/ir/op.h>
#include <tvm/tir/expr.h>

Include dependency graph for builtin.h:

This graph shows which files directly or indirectly include this file:

Namespaces
	tvm
	runtime implementation for LibTorch/TorchScript.

	tvm::tir

	tvm::tir::builtin
	Collection of builtin intrinsics as ops.

Enumerations
enum	tvm::tir::builtin::TVMStructFieldKind : int { tvm::tir::builtin::kArrAddr , tvm::tir::builtin::kArrData , tvm::tir::builtin::kArrShape , tvm::tir::builtin::kArrStrides , tvm::tir::builtin::kArrNDim , tvm::tir::builtin::kArrTypeCode , tvm::tir::builtin::kArrTypeBits , tvm::tir::builtin::kArrTypeLanes , tvm::tir::builtin::kArrByteOffset , tvm::tir::builtin::kArrDeviceId , tvm::tir::builtin::kArrDeviceType , tvm::tir::builtin::kArrKindBound_ , tvm::tir::builtin::kTVMValueContent , tvm::tir::builtin::kTVMValueKindBound_ }
	The kind of structure field info used in intrinsic. More...

Functions
const Op &	tvm::tir::builtin::ret ()
	Return value. More...

const Op &	tvm::tir::builtin::reinterpret ()
	Reinterpret the value using the target type. More...

const Op &	tvm::tir::builtin::likely ()
	Marks a condition is likely going to happen. More...

const Op &	tvm::tir::builtin::bitwise_and ()
	Bitwise and operator. More...

const Op &	tvm::tir::builtin::bitwise_or ()
	Bitwise or operator. More...

const Op &	tvm::tir::builtin::bitwise_xor ()
	Bitwise xor operator. More...

const Op &	tvm::tir::builtin::bitwise_not ()
	Bitwise not operator. More...

const Op &	tvm::tir::builtin::shift_left ()
	Left shift. More...

const Op &	tvm::tir::builtin::shift_right ()
	Right shift. More...

const Op &	tvm::tir::builtin::large_uint_imm ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::q_multiply_shift ()
	Execute a multiplication between two Q-numbers x and y followed by a right shift s The default rounding rule is to the nearest value, rounding half up (i.e., round(x.1) = x and round (x.5) = x+1) More...

const Op &	tvm::tir::builtin::address_of ()
	Returns the address of an element in the buffer (see pseudocode below). More...

const Op &	tvm::tir::builtin::if_then_else ()
	Same as select, used for unsafe memory access. More...

const Op &	tvm::tir::builtin::isnullptr ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::isnan ()
	Check if value is nan. More...

const Op &	tvm::tir::builtin::popcount ()
	Popcount. More...

const Op &	tvm::tir::builtin::fma ()
	Fused multiply add. More...

const Op &	tvm::tir::builtin::call_extern ()
	Call an extern C function with given name and signature from the types of args in the runtime environment. More...

const Op &	tvm::tir::builtin::call_pure_extern ()
	Call an pure extern C function with given name and signature from the types of args in the runtime environment. More...

const Op &	tvm::tir::builtin::call_llvm_intrin ()
	Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...

const Op &	tvm::tir::builtin::call_llvm_pure_intrin ()
	Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...

const Op &	tvm::tir::builtin::call_spirv_pure_glsl450 ()
	Call an SPIRV pure GLSL450 intrinsic. More...

const Op &	tvm::tir::builtin::prefetch ()
	Prefetch a cacheline. More...

const Op &	tvm::tir::builtin::tvm_access_ptr ()
	Get head access address with memory access pattern info. More...

const Op &	tvm::tir::builtin::tvm_static_handle ()
	Create a function local static handle that iniitalizes to nullptr. can be used to cache function local static resources. More...

const Op &	tvm::tir::builtin::tvm_context_id ()
	Return a unique context id, used for hint of workspace separation. Different context id ganrantees not having overlapping workspace. More...

const Op &	tvm::tir::builtin::tvm_tuple ()
	tvm_tuple is not an actual function and cannot codegen. It is used to represent tuple structure in value field of AttrStmt, for the sake of giving hint to optimization. More...

const Op &	tvm::tir::builtin::tvm_struct_get ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_struct_set ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::lookup_param ()
	See pseudo code Type lookup_param(String param_name) { return __tvm_param__param_name; }. More...

const Op &	tvm::tir::builtin::tvm_throw_last_error ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_stack_alloca ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_stack_make_shape ()
	Allocate a shape tuple on stack, return the handle. More...

const Op &	tvm::tir::builtin::tvm_stack_make_array ()
	Allocate a NDArray(DLTensor) on stack, return the handle. More...

const Op &	tvm::tir::builtin::tvm_call_packed ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_call_cpacked ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_call_trace_packed ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_check_return ()
	Checks the return value of another call is correct or returns a given value. More...

const Op &	tvm::tir::builtin::tvm_thread_context ()
	See pesudo code Mark the content as thread local context, can get optimized by only call the call once at thread start. More...

const Op &	tvm::tir::builtin::tvm_thread_invariant ()
	Mark a condition to be thread invariant. This means the condition must be the same for all threads. More...

const Op &	tvm::tir::builtin::tvm_call_packed_lowered ()
	Lowered version of call packed, the space of value and type codes are explicitly allocated. More...

const Op &	tvm::tir::builtin::tvm_call_cpacked_lowered ()
	Lowered version of call c-packed, the space of value and type codes are explicitly allocated. More...

const Op &	tvm::tir::builtin::tvm_call_trace_packed_lowered ()
	Lowered version of trace intrinsic, the space of value and type codes are explicitly allocated. The return value is the (end - 1) value on the stack. More...

const Op &	tvm::tir::builtin::tvm_storage_sync ()
	See pseudo code. More...

const Op &	tvm::tir::builtin::tvm_warp_shuffle ()
	See pseudo code. More...

const Op &	tvm::tir::builtin::tvm_warp_shuffle_up ()

const Op &	tvm::tir::builtin::tvm_warp_shuffle_down ()

const Op &	tvm::tir::builtin::tvm_warp_activemask ()

const Op &	tvm::tir::builtin::tvm_global_barrier_kinit ()
	Initialize the global barrier. Call this at beginning of kernel that need global barrier. More...

const Op &	tvm::tir::builtin::tvm_thread_allreduce ()
	See pesudo code. More...

const Op &	tvm::tir::builtin::tvm_load_matrix_sync ()
	tvm intrinsic for tensor core load operators. More...

const Op &	tvm::tir::builtin::tvm_mma_sync ()
	tvm intrinsic for tensor core mma_sync operators. More...

const Op &	tvm::tir::builtin::tvm_bmma_sync ()
	tvm intrinsic for tensor core bmma_sync operators. More...

const Op &	tvm::tir::builtin::tvm_fill_fragment ()
	tvm intrinsic for tensor core fill_fragment operators. More...

const Op &	tvm::tir::builtin::tvm_store_matrix_sync ()
	tvm intrinsic for tensor core store operators. More...

const Op &	tvm::tir::builtin::ptx_mma ()
	tvm intrinsic for ptx tensor core mma instructions. More...

const Op &	tvm::tir::builtin::ptx_ldg32 ()
	tvm intrinsic for ptx predicate load with 32-bit data type. More...

const Op &	tvm::tir::builtin::ptx_mma_sp ()
	tvm intrinsic for sparse tensor core ptx instructions. More...

const Op &	tvm::tir::builtin::ptx_ldmatrix ()
	tvm intrinsic for ptx load matrix from shared memory. More...

const Op &	tvm::tir::builtin::ptx_cp_async ()
	tvm intrinsics for ptx async copy from global to shared memory using cp.async More...

const Op &	tvm::tir::builtin::ptx_cp_async_bulk ()
	tvm intrinsics for ptx async copy from global to shared memory using cp.async.bulk More...

const Op &	tvm::tir::builtin::ptx_commit_group ()
	tvm intrinsics for ptx async copy commit and wait. More...

const Op &	tvm::tir::builtin::ptx_wait_group ()

const Op &	tvm::tir::builtin::ptx_cp_async_barrier ()
	tvm intrinsics for ptx async copy barrier using cp.async.mbarrier.arrive More...

const Op &	tvm::tir::builtin::ptx_init_barrier_thread_count ()
	tvm intrinsics for ptx barrier initialization of thread count using mbarrier.init More...

const Op &	tvm::tir::builtin::ptx_arrive_barrier ()
	tvm intrinsics for ptx barrier arrival using mbarrier.arrive More...

const Op &	tvm::tir::builtin::ptx_arrive_barrier_expect_tx ()
	tvm intrinsic for ptx barrier arrival with expect tx using mbarrier.arrive.expect_tx More...

const Op &	tvm::tir::builtin::ptx_wait_barrier ()
	tvm intrinsics for ptx barrier wait using mbarrier.try_wait More...

const Op &	tvm::tir::builtin::create_barriers ()
	tvm intrinsics to create N barriers More...

const Op &	tvm::tir::builtin::mma_store ()
	tvm intrinsic for storing the result of PTX MMA into a destination pointer. For example, if each thread in a warp of size 32 has 4 elements from the result of m16xn8xk16 MMA in its registers, this intrinsic can be used to store the result in a 16x8 region in shared or global memory. More...

const Op &	tvm::tir::builtin::mma_fill ()
	tvm intrinsic for zero-initializing an MMA accumulation register. For example, if each thread in a warp of size 32 has 8 elements from the A matrix in m16xn8xk16 MMA in its registers, this intrinsic can be used to zero-initialize its 4 accumulation registers. More...

const Op &	tvm::tir::builtin::make_filled_simdgroup_matrix ()
	tvm intrinsic for initializing and simdgroup with given value. More...

const Op &	tvm::tir::builtin::simdgroup_load ()
	tvm intrinsic for loading data from device memory or threadgroup memory to simdgroup. More...

const Op &	tvm::tir::builtin::simdgroup_store ()
	tvm intrinsic for storing data from simdgroup to device memory or threadgroup memory. More...

const Op &	tvm::tir::builtin::simdgroup_multiply_accumulate ()
	tvm intrinsic for multiply and accumulate two matrices in simdgroup More...

const Op &	tvm::tir::builtin::vectorhigh ()
	Get the high level half of the vector. More...

const Op &	tvm::tir::builtin::vectorlow ()
	Get the low-level half of the vector. More...

const Op &	tvm::tir::builtin::vectorcombine ()
	Concat two vectors. More...

const Op &	tvm::tir::builtin::dp4a ()
	Dot product of two int8x4 vectors and add an optional accumulator. More...

const Op &	tvm::tir::builtin::atomic_add ()
	atomic add instruction, corresponding e.g. to atomicAdd in CUDA More...

const Op &	tvm::tir::builtin::nd_mem_alloc_with_scope ()
	Create an Nd memory allocation with storage scope. More...

const Op &	tvm::tir::builtin::texture2d_store ()
	Store to texture 2d memory. More...

const Op &	tvm::tir::builtin::texture2d_load ()
	Load from texture 2d memory. More...

const Op &	tvm::tir::builtin::dma_copy ()
	Initiate a non-blocking DMA copy from source to destination. More...

const Op &	tvm::tir::builtin::dma_wait ()
	Wait until the number of DMA groups in flight is less than or equal to some maximum. More...

const Op &	tvm::tir::builtin::dma_start_group ()
	Start a group of DMA copies. More...

const Op &	tvm::tir::builtin::dma_end_group ()
	End a group of DMA copies. More...

const Op &	tvm::tir::builtin::assume ()
	Provide a true statement that can be used for simplifications. More...

const Op &	tvm::tir::builtin::undef ()
	Returns an initialized but arbitrary value. More...

const Op &	tvm::tir::builtin::start_profile_intrinsic ()
	Profiling intrinsic. More...

const Op &	tvm::tir::builtin::end_profile_intrinsic ()
	Profiling intrinsic. More...

const Op &	tvm::tir::builtin::anylist_getitem ()
	Get a item from any list and return it. More...

const Op &	tvm::tir::builtin::anylist_resetitem ()
	Reset and clear a item in any list. More...

const Op &	tvm::tir::builtin::anylist_setitem_call_packed ()
	Set an item into any list by running packed function call. More...

const Op &	tvm::tir::builtin::anylist_setitem_call_cpacked ()
	Same as anylist_setitem_call_packed but use C calling convention. More...

const Op &	tvm::tir::builtin::vscale ()
	Get the target's vscale value. It will be lowered to llvm.vscale intrinsic (https://llvm.org/docs/LangRef.html#llvm-vscale-intrinsic) More...

const Op &	tvm::tir::builtin::get_active_lane_mask ()
	Calculate a predicate mask given an upper bound (limit) and a current value (base). More...

Detailed Description

TIR builtin intrinsics.

TIR builtin intrinsics are stored as tvm:Op. They are processed in the same way as we process Ops.

It is not necessary to create a function for every Op, as we can obtain them through Op::Get.

This file contains the most commonly used intrinsics or those that have special semantics and need compiler support.

Namespaces

Enumerations

Functions

Detailed Description