Collection of builtin intrinsics as ops. More...

Enumerations
enum	TVMStructFieldKind : int { kDLTensorAddr , kDLTensorData , kDLTensorShape , kDLTensorStrides , kDLTensorNDim , kDLTensorTypeCode , kDLTensorTypeBits , kDLTensorTypeLanes , kDLTensorByteOffset , kDLTensorDeviceId , kDLTensorDeviceType , kDLTensorKindBound_ , kTVMValueContent , kTVMFFIAnyTypeIndex , kTVMFFIAnyZeroPadding , kTVMFFIAnyUnionValue , kTVMValueKindBound_ , kInt64ArrayElem }
	The kind of structure field info used in intrinsic. More...

Functions
const Op &	ret ()
	Return value. More...

const Op &	thread_return ()
	Return from a GPU thread. More...

const Op &	continue_loop ()
	Loop continue. More...

const Op &	break_loop ()
	Loop break. More...

const Op &	reinterpret ()
	Reinterpret the value using the target type. More...

const Op &	likely ()
	Marks a condition as likely to hold. More...

const Op &	filter ()
	Thread-set filter predicate. Used as the condition of an IfThenElse to narrow the active thread set A for the then-branch. Two forms: filter(var, lo, hi), the range form, true iff var is in [lo, hi); and filter(var, cond), the predicate form (e.g. var == k), true iff cond holds. var must be a ScopeIdDef-declared Var at parse time (Verifier Rule 2). More...

const Op &	selector ()
	Analysis-only active-thread selector. More...

const Op &	bitwise_and ()
	Bitwise and operator. More...

const Op &	bitwise_or ()
	Bitwise or operator. More...

const Op &	bitwise_xor ()
	Bitwise xor operator. More...

const Op &	bitwise_not ()
	Bitwise not operator. More...

const Op &	shift_left ()
	Left shift. More...

const Op &	shift_right ()
	Right shift. More...

const Op &	large_uint_imm ()
	See pseudo code. More...

const Op &	q_multiply_shift ()
	Execute a multiplication between two Q-numbers x and y, followed by a right shift s. The default rounding rule is round to nearest, with halves rounded up (i.e., round(x.1) = x and round(x.5) = x+1). More...

const Op &	q_multiply_shift_per_axis ()

const Op &	address_of ()
	Returns the address of an element in the buffer (see pseudocode below). More...

const Op &	if_then_else ()
	Same as select, used for unsafe memory access. More...

const Op &	isnullptr ()
	See pseudo code. More...

const Op &	isnan ()
	Check if value is nan. More...

const Op &	popcount ()
	Popcount. More...

const Op &	fma ()
	Fused multiply add. More...

const Op &	call_extern ()
	Call an extern C function with given name and signature from the types of args in the runtime environment. More...

const Op &	call_pure_extern ()
	Call a pure extern C function with given name and signature from the types of args in the runtime environment. More...

const Op &	call_llvm_intrin ()
	Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...

const Op &	call_llvm_pure_intrin ()
	Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment. More...

const Op &	call_spirv_pure_glsl450 ()
	Call an SPIRV pure GLSL450 intrinsic. More...

const Op &	prefetch ()
	Same signature as llvm.prefetch. More...

const Op &	tvm_access_ptr ()
	Get head access address with memory access pattern info. More...

const Op &	ptr_byte_offset ()
	Cast a handle to a typed pointer after adding a byte offset. More...

const Op &	tvm_static_handle ()
	Create a function-local static handle that initializes to nullptr. It can be used to cache function-local static resources. More...

const Op &	tvm_context_id ()
	Return a unique context id, used as a hint for workspace separation. Different context ids guarantee non-overlapping workspaces. More...

const Op &	tvm_tuple ()
	tvm_tuple is not an actual function and cannot be code-generated. It represents a tuple structure in the value field of an AttrStmt, as a hint to optimization passes. More...

const Op &	handle_add_byte_offset ()
	See pseudo code. More...

const Op &	tvm_struct_get ()
	See pseudo code. More...

const Op &	tvm_struct_set ()
	See pseudo code. More...

const Op &	lookup_param ()
	See pseudo code: Type lookup_param(ffi::String param_name) { return __tvm_param__param_name; } More...

const Op &	tvm_throw_last_error ()
	See pseudo code. More...

const Op &	tvm_stack_alloca ()
	See pseudo code. More...

const Op &	tvm_stack_make_shape ()
	Allocate a shape tuple on stack, return the handle. More...

const Op &	tvm_stack_make_array ()
	Allocate a Tensor(DLTensor) on stack, return the handle. More...

const Op &	tvm_call_packed ()
	See pseudo code. More...

const Op &	tvm_call_cpacked ()
	See pseudo code. More...

const Op &	tvm_call_trace_packed ()
	See pseudo code. More...

const Op &	tvm_thread_invariant ()
	Mark a condition to be thread invariant. This means the condition must be the same for all threads. More...

const Op &	tvm_call_packed_lowered ()
	Lowered version of call packed; the space for value and type codes is explicitly allocated. More...

const Op &	tvm_call_cpacked_lowered ()
	Lowered version of call c-packed; the space for value and type codes is explicitly allocated. More...

const Op &	tvm_call_trace_packed_lowered ()
	Lowered version of the trace intrinsic; the space for value and type codes is explicitly allocated. The return value is the (end - 1) value on the stack. More...

const Op &	tvm_storage_sync ()
	See pseudo code. More...

const Op &	tvm_kernel_replace_point ()
	Marker where a transform should replace generated kernel initialization. More...

const Op &	tvm_warp_shuffle ()
	See pseudo code. More...

const Op &	tvm_warp_shuffle_up ()

const Op &	tvm_warp_shuffle_down ()

const Op &	tvm_warp_shuffle_xor ()

const Op &	tvm_warp_activemask ()

const Op &	tvm_global_barrier_kinit ()
	Initialize the global barrier. Call this at the beginning of a kernel that needs a global barrier. More...

const Op &	tvm_thread_allreduce ()
	See pseudo code. More...

const Op &	cooperative_tensor_fill ()
	Fill a cooperative_tensor with a given value. More...

const Op &	cooperative_tensor_load ()
	Load data from device or threadgroup memory into a cooperative_tensor. More...

const Op &	cooperative_tensor_store ()
	Store data from a cooperative_tensor to device or threadgroup memory. More...

const Op &	cooperative_tensor_multiply_accumulate ()
	Multiply and accumulate two matrices using cooperative_tensor (MetalPerformancePrimitives matmul2d). More...

const Op &	vectorhigh ()
	Get the high half of the vector. More...

const Op &	vectorlow ()
	Get the low half of the vector. More...

const Op &	vectorcombine ()
	Concat two vectors. More...

const Op &	dp4a ()
	Dot product of two int8x4 vectors and add an optional accumulator. More...

const Op &	atomic_add ()
	Atomic add instruction, corresponding e.g. to atomicAdd in CUDA. More...

const Op &	nd_mem_alloc_with_scope ()
	Create an Nd memory allocation with storage scope. More...

const Op &	texture2d_store ()
	Store to texture 2d memory. More...

const Op &	texture2d_load ()
	Load from texture 2d memory. More...

const Op &	dma_copy ()
	Initiate a non-blocking DMA copy from source to destination. More...

const Op &	dma_wait ()
	Wait until the number of DMA groups in flight is less than or equal to some maximum. More...

const Op &	dma_start_group ()
	Start a group of DMA copies. More...

const Op &	dma_end_group ()
	End a group of DMA copies. More...

const Op &	assume ()
	Provide a true statement that can be used for simplifications. More...

const Op &	undef ()
	Returns an initialized but arbitrary value. More...

const Op &	start_profile_intrinsic ()
	Profiling intrinsic. More...

const Op &	end_profile_intrinsic ()
	Profiling intrinsic. More...

const Op &	anylist_getitem ()
	Get an item from any list and return it. More...

const Op &	anylist_resetitem ()
	Reset and clear an item in any list. More...

const Op &	anylist_setitem_call_packed ()
	Set an item into any list by running packed function call. More...

const Op &	anylist_setitem_call_cpacked ()
	Same as anylist_setitem_call_packed but use C calling convention. More...

const Op &	vscale ()
	Get the target's vscale value. It is lowered to the llvm.vscale intrinsic (https://llvm.org/docs/LangRef.html#llvm-vscale-intrinsic). More...

const Op &	get_active_lane_mask ()
	Calculate a predicate mask given an upper bound (limit) and a current value (base). More...

const Op &	ignore_loop_partition ()
	Annotate a predicate so that it is not considered a target condition for loop partitioning. More...

const Op &	buffer_offset ()
	Get the element offset of a buffer given logical indices. More...

const Op &	print_buffer ()
	Print the content of a buffer during runtime. More...

Detailed Description

Collection of builtin intrinsics as ops.

Enumeration Type Documentation

◆ TVMStructFieldKind

enum tvm::tirx::builtin::TVMStructFieldKind : int

The kind of structure field info used in intrinsic.

Enumerator
kDLTensorAddr
kDLTensorData
kDLTensorShape
kDLTensorStrides
kDLTensorNDim
kDLTensorTypeCode
kDLTensorTypeBits
kDLTensorTypeLanes
kDLTensorByteOffset
kDLTensorDeviceId
kDLTensorDeviceType
kDLTensorKindBound_
kTVMValueContent
kTVMFFIAnyTypeIndex
kTVMFFIAnyZeroPadding
kTVMFFIAnyUnionValue
kTVMValueKindBound_
kInt64ArrayElem

Function Documentation

◆ address_of()

const Op& tvm::tirx::builtin::address_of ( )

Returns the address of an element in the buffer (see pseudocode below).

The number of indices should match the dimensionality of the buffer being accessed. If this operation occurs after buffer flattening, the number of indices must be supported by the target (i.e. N>1 only on targets that support non-flat memory buffers).

Handle address_of(BufferLoad *op) { return &op->buffer_var[op->indices[0], op->indices[1], ..., op->indices[N-1]]; }

◆ anylist_getitem()

const Op& tvm::tirx::builtin::anylist_getitem ( )

Get an item from any list and return it.

Any anylist_getitem(Handle anylist, int index) { return anylist[index]; }

Note: This intrinsic is only applicable when appearing in call_packed and anylist_setitem_call_packed.

◆ anylist_resetitem()

const Op& tvm::tirx::builtin::anylist_resetitem ( )

Reset and clear an item in any list.

void anylist_resetitem(Handle anylist, int index) { anylist[index] = nullptr; }

Note: This intrinsic is only applicable when appearing in call_packed and anylist_setitem_call_packed.

◆ anylist_setitem_call_cpacked()

const Op& tvm::tirx::builtin::anylist_setitem_call_cpacked ( )

Same as anylist_setitem_call_packed but use C calling convention.

◆ anylist_setitem_call_packed()

const Op& tvm::tirx::builtin::anylist_setitem_call_packed ( )

Set an item into any list by running packed function call.

void anylist_setitem_call_packed(Handle anylist, int index, name, *args) { anylist[index] = call_packed(name, *args); }

Note: This intrinsic can be used in combination with anylist_getitem.

◆ assume()

const Op& tvm::tirx::builtin::assume ( )

Provide a true statement that can be used for simplifications.

Compile-time representation of known constraints about function inputs. This assumption is removed when lowering, and does not occur in codegen.

◆ atomic_add()

const Op& tvm::tirx::builtin::atomic_add ( )

Atomic add instruction, corresponding e.g. to atomicAdd in CUDA.

◆ bitwise_and()

const Op& tvm::tirx::builtin::bitwise_and ( )

Bitwise and operator.

◆ bitwise_not()

const Op& tvm::tirx::builtin::bitwise_not ( )

Bitwise not operator.

◆ bitwise_or()

const Op& tvm::tirx::builtin::bitwise_or ( )

Bitwise or operator.

◆ bitwise_xor()

const Op& tvm::tirx::builtin::bitwise_xor ( )

Bitwise xor operator.

◆ break_loop()

const Op& tvm::tirx::builtin::break_loop ( )

Loop break.

◆ buffer_offset()

const Op& tvm::tirx::builtin::buffer_offset ( )

Get the element offset of a buffer given logical indices.

The offset is determined by the layout of the buffer.

◆ call_extern()

const Op& tvm::tirx::builtin::call_extern ( )

Call an extern C function with given name and signature from the types of args in the runtime environment.

Type call_extern(name, args...) { return dlsym(name)(args...); }

Note: This intrinsic does not provide any type checking, and is mainly used for backward compatibility reasons. Always consider using a pre-registered and typed tvm::Op first.

◆ call_llvm_intrin()

const Op& tvm::tirx::builtin::call_llvm_intrin ( )

Call an LLVM intrinsic with a given intrinsic id and signature from the types of args in the runtime environment.

Type call_llvm_intrin(intrin_id, args...) { return dlsym(name)(args...); }

Note: This op does not provide any type checking.

◆ call_llvm_pure_intrin()

const Op& tvm::tirx::builtin::call_llvm_pure_intrin ( )

Call an LLVM pure intrinsic with a given intrinsic id and signature from the types of args in the runtime environment.

Type call_llvm_pure_intrin(intrin_id, args...) { return dlsym(name)(args...); }

Note: This op does not provide any type checking.

◆ call_pure_extern()

const Op& tvm::tirx::builtin::call_pure_extern ( )

Call a pure extern C function with given name and signature from the types of args in the runtime environment.

Type call_pure_extern(name, args...) { return dlsym(name)(args...); }

Note: This intrinsic does not provide any type checking, and is mainly used for backward compatibility reasons. Always consider using a pre-registered and typed tvm::Op first.

◆ call_spirv_pure_glsl450()

const Op& tvm::tirx::builtin::call_spirv_pure_glsl450 ( )

Call an SPIRV pure GLSL450 intrinsic.

Type call_spirv_pure_glsl450(intrin_id, args...) { return dlsym(name)(args...); }

Note: This op does not provide any type checking.

◆ continue_loop()

const Op& tvm::tirx::builtin::continue_loop ( )

Loop continue.

◆ cooperative_tensor_fill()

const Op& tvm::tirx::builtin::cooperative_tensor_fill ( )

Fill a cooperative_tensor with a given value.

void cooperative_tensor_fill(Var d, PrimExpr index, PrimExpr value, int rows, int cols);

◆ cooperative_tensor_load()

const Op& tvm::tirx::builtin::cooperative_tensor_load ( )

Load data from device or threadgroup memory into a cooperative_tensor.

void cooperative_tensor_load(Var d, PrimExpr index, PrimExpr ptr, PrimExpr stride, int rows, int cols, bool transpose_matrix, int mma_M, int mma_N, int mma_K, int operand_role);

operand_role: 0 = left (A), 1 = right (B), 2 = destination (C)

◆ cooperative_tensor_multiply_accumulate()

const Op& tvm::tirx::builtin::cooperative_tensor_multiply_accumulate ( )

Multiply and accumulate two matrices using cooperative_tensor (MetalPerformancePrimitives matmul2d).

void cooperative_tensor_multiply_accumulate( Var d, PrimExpr index_d, Var a, PrimExpr index_a, Var b, PrimExpr index_b, Var c, PrimExpr index_c, int M, int N, int K, bool transpose_a, bool transpose_b);

◆ cooperative_tensor_store()

const Op& tvm::tirx::builtin::cooperative_tensor_store ( )

Store data from a cooperative_tensor to device or threadgroup memory.

void cooperative_tensor_store(Var d, PrimExpr index, PrimExpr ptr, PrimExpr stride, int rows, int cols, bool transpose_matrix, int mma_M, int mma_N, int mma_K, int operand_role);

operand_role: 0 = left (A), 1 = right (B), 2 = destination (C)

◆ dma_copy()

const Op& tvm::tirx::builtin::dma_copy ( )

Initiate a non-blocking DMA copy from source to destination.

The copy is launched immediately.

If a dma_start_group() call is active, the copy will be added to the current group for tracking of in-flight group counts.

If no dma_start_group() call is active, the copy will be tracked individually i.e. as a group with size 1.

◆ dma_end_group()

const Op& tvm::tirx::builtin::dma_end_group ( )

End a group of DMA copies.

Track all calls to dma_copy() that occurred since the preceding dma_start_group() as a single group in-flight.

Calling dma_end_group() without an active group is unsupported.

Note: A group of DMA calls may be empty, and will still contribute to the count of in-flight groups used by dma_wait().

◆ dma_start_group()

const Op& tvm::tirx::builtin::dma_start_group ( )

Start a group of DMA copies.

Any call to dma_copy() that occurs after dma_start_group() will be added to the current group for tracking of in-flight group counts.

Only one DMA group may be active at a given time. Calling dma_start_group() while a group is active is unsupported.

◆ dma_wait()

const Op& tvm::tirx::builtin::dma_wait ( )

Wait until the number of DMA groups in flight is less than or equal to some maximum.

Calling dma_wait() while a group is active is unsupported.
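
The group bookkeeping rules stated for dma_copy(), dma_start_group(), dma_end_group() and dma_wait() can be sketched as a small host-side counter. This is only an illustrative model, not TVM code: the class DmaGroupTracker and its methods are hypothetical names, no data is moved, and the completion order (oldest group retires first) is an assumption that the source does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical host-side model of the DMA group bookkeeping described above.
// It tracks only group counts; no data is copied.
class DmaGroupTracker {
 public:
  void start_group() {
    assert(!group_active_ && "only one DMA group may be active");
    group_active_ = true;
    pending_copies_ = 0;
  }
  void copy() {
    if (group_active_) {
      ++pending_copies_;        // joins the current group
    } else {
      in_flight_.push_back(1);  // tracked individually, as a group of size 1
    }
  }
  void end_group() {
    assert(group_active_ && "no active group to end");
    in_flight_.push_back(pending_copies_);  // an empty group still counts
    group_active_ = false;
  }
  // Assumption: groups complete in issue order, so waiting retires the oldest.
  void wait(std::size_t max_in_flight) {
    assert(!group_active_ && "waiting while a group is active is unsupported");
    while (in_flight_.size() > max_in_flight) in_flight_.pop_front();
  }
  std::size_t groups_in_flight() const { return in_flight_.size(); }

 private:
  bool group_active_ = false;
  std::size_t pending_copies_ = 0;
  std::deque<std::size_t> in_flight_;  // number of copies per in-flight group
};
```

In this model, one ungrouped copy, one group of two copies and one empty group give three groups in flight; wait(1) then leaves a single group outstanding.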

◆ dp4a()

const Op& tvm::tirx::builtin::dp4a ( )

Dot product of two int8x4 vectors and add an optional accumulator.
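
As a scalar reference for what the brief describes, the sketch below computes the dot product of two four-element int8 vectors and adds an accumulator. The name dp4a_ref and the use of std::array are choices of this sketch only; the default accumulator of 0 models the "optional" argument and is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for dp4a: dot product of two int8x4 vectors
// plus an accumulator (defaulting to 0 to model the optional argument).
std::int32_t dp4a_ref(const std::array<std::int8_t, 4>& a,
                      const std::array<std::int8_t, 4>& b,
                      std::int32_t acc = 0) {
  for (std::size_t i = 0; i < 4; ++i) {
    // Widen to 32 bits before multiplying so the products cannot overflow.
    acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return acc;
}
```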

◆ end_profile_intrinsic()

const Op& tvm::tirx::builtin::end_profile_intrinsic ( )

Profiling intrinsic.

◆ filter()

const Op& tvm::tirx::builtin::filter ( )

Thread-set filter predicate.

Used as the condition of an IfThenElse to narrow the active thread set A for the then-branch. Two forms:

filter(var, lo, hi): range form, true iff var is in [lo, hi).
filter(var, cond): predicate form (e.g. var == k), true iff cond holds.

var must be a ScopeIdDef-declared Var at parse time (Verifier Rule 2).
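
The range form is a half-open interval test on the thread index. The following host-side sketch shows which thread ids stay active under it; filter_range and active_threads are hypothetical helper names for illustration and are not part of TVM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the range form filter(var, lo, hi):
// true iff var lies in the half-open interval [lo, hi).
bool filter_range(std::int64_t var, std::int64_t lo, std::int64_t hi) {
  return lo <= var && var < hi;
}

// Collect the thread ids in [0, num_threads) that pass the range filter,
// i.e. the narrowed active set for the then-branch.
std::vector<std::int64_t> active_threads(std::int64_t num_threads,
                                         std::int64_t lo, std::int64_t hi) {
  assert(num_threads >= 0);
  std::vector<std::int64_t> active;
  for (std::int64_t tid = 0; tid < num_threads; ++tid) {
    if (filter_range(tid, lo, hi)) active.push_back(tid);
  }
  return active;
}
```

For example, with eight threads and the range [2, 4), only thread ids 2 and 3 remain active.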

◆ fma()

const Op& tvm::tirx::builtin::fma ( )

Fused multiply add.

Type fma(a, b, c) { return a * b + c; }
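
A runnable counterpart of the pseudocode can use the C++ standard library. Note the assumption: std::fma rounds once, whereas the pseudocode a * b + c written in plain C++ may round twice. Whether a particular TVM backend emits a truly fused instruction is not stated here; fma_ref is a name used only in this sketch.

```cpp
#include <cassert>
#include <cmath>

// Reference for the fma pseudocode using the C++ standard library.
// std::fma evaluates a * b + c with a single rounding step.
double fma_ref(double a, double b, double c) {
  return std::fma(a, b, c);
}
```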

◆ get_active_lane_mask()

const Op& tvm::tirx::builtin::get_active_lane_mask ( )

Calculate a predicate mask given an upper bound (limit) and a current value (base).

It will be lowered to the llvm.get.active.lane.mask intrinsic. (https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics)
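
Assuming the semantics given in the linked LLVM LangRef entry (lane i is active iff base + i < limit, compared as unsigned values without wrap-around), the mask can be modelled on the host as below. The function name and the explicit lanes parameter belong to this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of the predicate mask: lane i is active iff base + i < limit.
// `lanes` (the vector length) is a parameter of this sketch only.
std::vector<bool> active_lane_mask(std::uint64_t base, std::uint64_t limit,
                                   std::size_t lanes) {
  std::vector<bool> mask(lanes, false);
  for (std::size_t i = 0; i < lanes; ++i) {
    // Compare via subtraction so that base + i cannot wrap around.
    mask[i] = base < limit && i < limit - base;
  }
  return mask;
}
```

With base = 6, limit = 8 and four lanes, the first two lanes are active and the last two are masked off, which is the typical shape of a loop tail.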

◆ handle_add_byte_offset()

const Op& tvm::tirx::builtin::handle_add_byte_offset ( )

See pseudo code.

void* handle_add_byte_offset(void* handle, int offset) { return reinterpret_cast<void*>(reinterpret_cast<char*>(handle) + offset); }

◆ if_then_else()

const Op& tvm::tirx::builtin::if_then_else ( )

Same as select, used for unsafe memory access.

Type if_then_else(cond, a, b) { return cond ? a : b; }

◆ ignore_loop_partition()

const Op& tvm::tirx::builtin::ignore_loop_partition ( )

Annotate a predicate so that it is not considered a target condition for loop partitioning.

◆ isnan()

const Op& tvm::tirx::builtin::isnan ( )

Check if value is nan.

◆ isnullptr()

const Op& tvm::tirx::builtin::isnullptr ( )

See pseudo code.

bool isnullptr(void* handle) { return handle == nullptr; }

◆ large_uint_imm()

const Op& tvm::tirx::builtin::large_uint_imm ( )

See pseudo code.

Construct a big uint that may not be representable by int64.

Expr large_uint_imm(uint32_t v0, uint32_t v1) { return (v1 << 32) | v0; }
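
In real C++ the high word has to be widened before the shift, which the pseudocode leaves implicit. A minimal sketch, with large_uint_imm_ref as a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Combine the low word v0 and the high word v1 into one 64-bit unsigned value.
// The cast to uint64_t before shifting matters: shifting a 32-bit value by
// 32 bits is undefined behavior in C++.
std::uint64_t large_uint_imm_ref(std::uint32_t v0, std::uint32_t v1) {
  return (static_cast<std::uint64_t>(v1) << 32) | v0;
}
```

Passing 0xFFFFFFFF for both words yields the largest 64-bit unsigned value, which does not fit in int64.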

◆ likely()

const Op& tvm::tirx::builtin::likely ( )

Marks a condition as likely to hold.

◆ lookup_param()

const Op& tvm::tirx::builtin::lookup_param ( )

See pseudo code.

Type lookup_param(ffi::String param_name) { return __tvm_param__param_name; }

◆ nd_mem_alloc_with_scope()

const Op& tvm::tirx::builtin::nd_mem_alloc_with_scope ( )

Create an Nd memory allocation with storage scope.

◆ popcount()

const Op& tvm::tirx::builtin::popcount ( )

Popcount.

◆ prefetch()

const Op& tvm::tirx::builtin::prefetch ( )

Same signature as llvm.prefetch.

◆ print_buffer()

const Op& tvm::tirx::builtin::print_buffer ( )

Print the content of a buffer during runtime.

◆ ptr_byte_offset()

const Op& tvm::tirx::builtin::ptr_byte_offset ( )

Cast a handle to a typed pointer after adding a byte offset.

DType* ptr_byte_offset(void* data, int byte_offset, Expr dtype) { return reinterpret_cast<DType*>(reinterpret_cast<char*>(data) + byte_offset); }
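
The pseudocode passes the element type as an expression; in a plain C++ sketch it is more natural to make it a template parameter. The helper below is a hypothetical stand-in written under that assumption and is not the TVM lowering.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical typed version of the pseudocode: advance `data` by
// `byte_offset` bytes, then view the result as a pointer to DType.
// The caller must ensure the resulting address is suitably aligned for DType.
template <typename DType>
DType* ptr_byte_offset_ref(void* data, std::ptrdiff_t byte_offset) {
  return reinterpret_cast<DType*>(static_cast<char*>(data) + byte_offset);
}
```

For a float buffer, an offset of 2 * sizeof(float) bytes lands on the element at index 2.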

◆ q_multiply_shift()

const Op& tvm::tirx::builtin::q_multiply_shift ( )

Execute a multiplication between two Q-numbers x and y, followed by a right shift s. The default rounding rule is round to nearest, with halves rounded up (i.e., round(x.1) = x and round(x.5) = x+1).
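
The rounding rule can be made concrete with integer arithmetic: adding half of the divisor before a flooring shift rounds to nearest with ties going up. The sketch below is a simplified model, not the TVM lowering. It assumes 32-bit operands, a shift of 1 to 31 bits, and ignores any further fixed-point format parameters the real intrinsic may take; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Floor division by 2^s that does not rely on the behavior of >> for
// negative operands.
std::int64_t floor_shift(std::int64_t v, int s) {
  const std::int64_t mask = (std::int64_t{1} << s) - 1;
  return v >= 0 ? (v >> s) : -((-v + mask) >> s);
}

// Simplified model: multiply in 64 bits, add half of the divisor, floor-shift.
// Adding 2^(s-1) before flooring rounds to nearest with ties rounded up.
std::int64_t q_multiply_shift_ref(std::int32_t x, std::int32_t y, int s) {
  assert(s > 0 && s < 32);
  const std::int64_t prod = static_cast<std::int64_t>(x) * y;
  return floor_shift(prod + (std::int64_t{1} << (s - 1)), s);
}
```

With s = 1, a product of 5 (2.5 after the shift) rounds to 3, while a product of -5 (-2.5) rounds up to -2.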

◆ q_multiply_shift_per_axis()

const Op& tvm::tirx::builtin::q_multiply_shift_per_axis ( )

◆ reinterpret()

const Op& tvm::tirx::builtin::reinterpret ( )

Reinterpret the value using the target type.

◆ ret()

const Op& tvm::tirx::builtin::ret ( )

Return value.

◆ selector()

const Op& tvm::tirx::builtin::selector ( )

Analysis-only active-thread selector.

selector(var, pred) denotes the unique value of var in the current active domain for which pred is true. It is used only inside ExecContext/DispatchContext metadata, for predicates such as ptx.elect_sync() whose selected lane cannot be inferred structurally.
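
Read as a function, selector picks the one value in the active domain that satisfies the predicate. The sketch below models that reading on the host. The name select_unique, the explicit domain vector, and the choice to return an empty optional when no value or more than one value matches are all assumptions of this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical model: return the single value in `domain` satisfying `pred`,
// or std::nullopt when zero or several values satisfy it.
std::optional<std::int64_t> select_unique(
    const std::vector<std::int64_t>& domain,
    const std::function<bool(std::int64_t)>& pred) {
  std::optional<std::int64_t> found;
  for (std::int64_t v : domain) {
    if (!pred(v)) continue;
    if (found) return std::nullopt;  // a second match: not unique
    found = v;
  }
  return found;
}
```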

◆ shift_left()

const Op& tvm::tirx::builtin::shift_left ( )

Left shift.

◆ shift_right()

const Op& tvm::tirx::builtin::shift_right ( )

Right shift.

◆ start_profile_intrinsic()

const Op& tvm::tirx::builtin::start_profile_intrinsic ( )

Profiling intrinsic.

◆ texture2d_load()

const Op& tvm::tirx::builtin::texture2d_load ( )

Load from texture 2d memory.

◆ texture2d_store()

const Op& tvm::tirx::builtin::texture2d_store ( )

Store to texture 2d memory.

◆ thread_return()

const Op& tvm::tirx::builtin::thread_return ( )

Return from a GPU thread.

◆ tvm_access_ptr()

const Op& tvm::tirx::builtin::tvm_access_ptr ( )

Get head access address with memory access pattern info.

This operator also marks the range of the memory access. The offset and extent are in units of the DType (including the vectorization factor). rw_mask is a bit mask indicating whether the access is a read (1) or a write (2). The access is assumed to happen in the current expression.

PtrType tvm_access_ptr(Expr dtype, DType* data, int offset, int extent, int rw_mask) {
  // DType == dtype.type();
  return &data[offset];
}
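
Because rw_mask is a bit mask, a read-write access combines both bits (1 | 2 = 3). A minimal sketch of decoding it; the enumerator and helper names are hypothetical, and only the bit values come from the description above.

```cpp
#include <cassert>

// Bit values taken from the description: read = 1, write = 2.
// The enumerator and helper names are hypothetical.
enum AccessMask : int { kRead = 1, kWrite = 2 };

// Test the individual bits of an rw_mask value.
bool is_read(int rw_mask) { return (rw_mask & kRead) != 0; }
bool is_write(int rw_mask) { return (rw_mask & kWrite) != 0; }
```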

◆ tvm_call_cpacked()

const Op& tvm::tirx::builtin::tvm_call_cpacked ( )

See pseudo code.

return_type tvm_call_cpacked(fname, TVMFFIAny* args) { TVMFFIAny result; (*fname)(args, args, len(args), &result); return cast(return_type, result); }

◆ tvm_call_cpacked_lowered()

const Op& tvm::tirx::builtin::tvm_call_cpacked_lowered ( )

Lowered version of call c-packed; the space for value and type codes is explicitly allocated.

int tvm_call_cpacked_lowered(fname, TVMFFIAny* args_stack, int begin, int end, void* self) { fname(ffi::PackedArgs(value_stack[begin:end], tcode_stack[begin:end]), ffi::Any(value_stack + end, tcode_stack + end)); }

◆ tvm_call_packed()

const Op& tvm::tirx::builtin::tvm_call_packed ( )

See pseudo code.

return_type tvm_call_packed(name, TVMFFIAny* args) {
  TVMFFIAny result;
  ModuleNode* env = GetCurrentEnv();
  const ffi::Function* f = env->GetFuncFromEnv(name);
  (*f)(args, args, len(args), &result);
  // return type can be int, float, handle.
  return cast(return_type, result);
}

◆ tvm_call_packed_lowered()

const Op& tvm::tirx::builtin::tvm_call_packed_lowered ( )

Lowered version of call packed; the space for value and type codes is explicitly allocated.

return_type tvm_call_packed_lowered(name, TVMFFIAny* args_stack, int begin, int end) {
  ModuleNode* env = GetCurrentEnv();
  const ffi::Function* f = env->GetFuncFromEnv(name);
  f->CallPacked(ffi::PackedArgs(args_stack[begin:end]), ffi::Any(args_stack + end));
  // return type can be int, float, handle.
  return cast(return_type, load_return_from(args_stack + end));
}

◆ tvm_call_trace_packed()

const Op& tvm::tirx::builtin::tvm_call_trace_packed ( )

See pseudo code.

return_type tvm_call_trace_packed(name, TVMFFIAny* args) {
  ModuleNode* env = GetCurrentEnv();
  const ffi::Function* f = env->GetFuncFromEnv(name);
  (*f)(args, args, len(args));
  // return type can be int, float, handle.
  return cast(return_type, result);
}

◆ tvm_call_trace_packed_lowered()

const Op& tvm::tirx::builtin::tvm_call_trace_packed_lowered ( )

Lowered version of the trace intrinsic; the space for value and type codes is explicitly allocated. The return value is the (end - 1) value on the stack.

return_type tvm_call_trace_packed_lowered(name, TVMFFIAny* args_stack, int begin, int end) {
  ModuleNode* env = GetCurrentEnv();
  const ffi::Function* f = env->GetFuncFromEnv(name);
  f->CallPacked(ffi::PackedArgs(args_stack[begin:end]), ffi::Any(args_stack + end));
  // return type can be int, float, handle.
  return cast(return_type, load_return_from(args_stack + end));
}

◆ tvm_context_id()

const Op& tvm::tirx::builtin::tvm_context_id ( )

Return a unique context id, used as a hint for workspace separation. Different context ids guarantee non-overlapping workspaces.

◆ tvm_global_barrier_kinit()

const Op& tvm::tirx::builtin::tvm_global_barrier_kinit ( )

Initialize the global barrier. Call this at the beginning of a kernel that needs a global barrier.

◆ tvm_kernel_replace_point()

const Op& tvm::tirx::builtin::tvm_kernel_replace_point ( )

Marker where a transform should replace generated kernel initialization.

◆ tvm_stack_alloca()

const Op& tvm::tirx::builtin::tvm_stack_alloca ( )

See pseudo code.

dtype in {shape, array, arg_value, arg_tcode}

Handle tvm_stack_alloca(string dtype, int num) { return new on stack dtype[num]; }

◆ tvm_stack_make_array()

const Op& tvm::tirx::builtin::tvm_stack_make_array ( )

Allocate a Tensor(DLTensor) on stack, return the handle.

Type tvm_stack_make_array(Expr data, Expr shape, Expr strides, Expr ndim, Expr dtype, Expr elem_offset) { ret = alloca stack DLTensor(); ret->data = data; ret->shape = shape; ret->strides = strides != 0 ? strides : nullptr; ret->ndim = ndim; ret->dtype = dtype.type(); ret->byte_offset = elem_offset * sizeof(dtype); return ret; }

◆ tvm_stack_make_shape()

const Op& tvm::tirx::builtin::tvm_stack_make_shape ( )

Allocate a shape tuple on stack, return the handle.

Handle tvm_stack_make_shape(list args) {
  ret = alloca stack int64_t[len(args)];
  for i in range(len(args)):
    ret[i] = args[i]
  return &ret[0];
}

◆ tvm_static_handle()

const Op& tvm::tirx::builtin::tvm_static_handle ( )

Create a function-local static handle that initializes to nullptr. It can be used to cache function-local static resources.

◆ tvm_storage_sync()

const Op& tvm::tirx::builtin::tvm_storage_sync ( )

See pseudo code.

int tvm_storage_sync(std::string storage_scope) { __sync(storage_scope); return 0; }

◆ tvm_struct_get()

const Op& tvm::tirx::builtin::tvm_struct_get ( )

See pseudo code.

Type tvm_struct_get(StructType* arr, int index, int field_id) { return arr[index]->field; }

See also: TVMStructFieldKind

◆ tvm_struct_set()

const Op& tvm::tirx::builtin::tvm_struct_set ( )

See pseudo code.

Handle tvm_struct_set(StructType* arr, int index, int field_id, value) { arr[index]->field = value; }

See also: TVMStructFieldKind

◆ tvm_thread_allreduce()

const Op& tvm::tirx::builtin::tvm_thread_allreduce ( )

See pseudo code.

void tvm_thread_allreduce(UIntImm size, Expr source0, ..., Expr cond, Var reduce_temp0, ..., Var thread_idx1, ...) {
  // constrained by the other thread_idx remaining the same.
  // reduce_temp is used to save the intermediate result.
  reduce_temp0, ... = reduce(combiner, source0, ..., cond over [thread_idx1, thread_idx2] passed by any caller)
}

◆ tvm_thread_invariant()

const Op& tvm::tirx::builtin::tvm_thread_invariant ( )

Mark a condition to be thread invariant. This means the condition must be the same for all threads.

◆ tvm_throw_last_error()

const Op& tvm::tirx::builtin::tvm_throw_last_error ( )

See pseudo code.

void tvm_throw_last_error() { throw TVMGetLastError(); }

◆ tvm_tuple()

const Op& tvm::tirx::builtin::tvm_tuple ( )

tvm_tuple is not an actual function and cannot be code-generated. It represents a tuple structure in the value field of an AttrStmt, as a hint to optimization passes.

void tvm_tuple(value0, value1, ..., value_n);

◆ tvm_warp_activemask()

const Op& tvm::tirx::builtin::tvm_warp_activemask ( )

◆ tvm_warp_shuffle()

const Op& tvm::tirx::builtin::tvm_warp_shuffle ( )

See pseudo code.

Type tvm_warp_shuffle(mask, Type value, warp_id, width, warp_size) { return (value passed in by warp indicated by this_warp_id); }

Type tvm_warp_shuffle_up(mask, Type value, offset, width, warp_size) { return (value passed in by warp indicated by this_warp_id - offset); }

Type tvm_warp_shuffle_down(mask, Type value, offset, width, warp_size) { return (value passed in by warp indicated by this_warp_id + offset); }

unsigned tvm_warp_activemask() { return (32-bit mask of currently active threads in the calling warp); }

Parameter warp_id indicates the source thread ID in a warp.

Parameter offset indicates the relative distance to this_warp_id.

Parameter width indicates the number of threads involved in one shuffle. See the CUDA documentation for __shfl_sync, __shfl_up_sync, __shfl_down_sync, __shfl_xor_sync and __activemask.

Parameter warp_size is the size of a warp, which helps a backend to determine whether the width parameter is legal.
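
To illustrate how offset and width interact, the sketch below simulates a shuffle-down over one warp on the host. It follows the rule documented for CUDA's __shfl_down_sync, where a lane whose source falls outside its width-sized segment keeps its own value. The mask argument is ignored (all lanes are treated as active), and shuffle_down_ref is a hypothetical name, not a TVM or CUDA function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of shuffle-down over one warp. `lanes[i]` is the value held
// by lane i; `width` splits the warp into independent segments.
// A lane whose source would fall outside its segment keeps its own value.
std::vector<int> shuffle_down_ref(const std::vector<int>& lanes,
                                  std::size_t offset, std::size_t width) {
  assert(width > 0 && lanes.size() % width == 0);
  std::vector<int> out(lanes.size());
  for (std::size_t lane = 0; lane < lanes.size(); ++lane) {
    const std::size_t pos = lane % width;        // position inside the segment
    const bool in_range = offset < width - pos;  // i.e. pos + offset < width
    out[lane] = lanes[in_range ? lane + offset : lane];
  }
  return out;
}
```

With eight lanes holding 0..7, offset 1 and width 4, the result is 1, 2, 3, 3, 5, 6, 7, 7: the last lane of each segment has no in-segment source and keeps its value.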

◆ tvm_warp_shuffle_down()

const Op& tvm::tirx::builtin::tvm_warp_shuffle_down ( )

◆ tvm_warp_shuffle_up()

const Op& tvm::tirx::builtin::tvm_warp_shuffle_up ( )

◆ tvm_warp_shuffle_xor()

const Op& tvm::tirx::builtin::tvm_warp_shuffle_xor ( )

◆ undef()

const Op& tvm::tirx::builtin::undef ( )

Returns an initialized but arbitrary value.

Compile-time representation of memory locations whose values may be altered as a result of optimizations.

◆ vectorcombine()

const Op& tvm::tirx::builtin::vectorcombine ( )

Concat two vectors.

◆ vectorhigh()

const Op& tvm::tirx::builtin::vectorhigh ( )

Get the high half of the vector.

◆ vectorlow()

const Op& tvm::tirx::builtin::vectorlow ( )

Get the low half of the vector.
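
The three vector ops (vectorcombine, vectorhigh, vectorlow) can be pictured at the lane level as follows. This is a hypothetical host-side model: it assumes that "low" means the first half of the lanes and "high" the second half, which the source does not spell out, and the function names belong to this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane-level model. Assumption: "low" is the first half of the
// lanes and "high" is the second half.
std::vector<int> vector_low(const std::vector<int>& v) {
  assert(v.size() % 2 == 0);
  const auto half = static_cast<std::ptrdiff_t>(v.size() / 2);
  return std::vector<int>(v.begin(), v.begin() + half);
}

std::vector<int> vector_high(const std::vector<int>& v) {
  assert(v.size() % 2 == 0);
  const auto half = static_cast<std::ptrdiff_t>(v.size() / 2);
  return std::vector<int>(v.begin() + half, v.end());
}

// Concatenate two vectors: the lanes of `a` followed by the lanes of `b`.
std::vector<int> vector_combine(const std::vector<int>& a,
                                const std::vector<int>& b) {
  std::vector<int> out(a);
  out.insert(out.end(), b.begin(), b.end());
  return out;
}
```

Under this model, combining the low and high halves of a vector reproduces the original vector.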

◆ vscale()

const Op& tvm::tirx::builtin::vscale ( )

Get the target's vscale value. It is lowered to the llvm.vscale intrinsic (https://llvm.org/docs/LangRef.html#llvm-vscale-intrinsic).
