tvm::runtime Namespace Reference

Namespaces

 cuda_ipc
 
 memory
 
 symbol
 namespace for constant symbols
 
 vm
 

Classes

class  DataType
 Runtime primitive data type. More...
 
class  DeviceAPI
 TVM Runtime Device API, abstracts the device specific interface for memory management. More...
 
class  DiscoWorker
 A worker in Disco. It takes a channel to communicate with the controller. The worker can run in a separate thread or process as long as the channel supports bi-directional communication between them. More...
 
struct  ThreadLocalDiscoWorker
 A threadlocal wrapper of DiscoWorker. More...
 
class  DRefObj
 An object that exists on all workers. More...
 
class  DRef
 Managed reference to DRefObj. More...
 
class  SessionObj
 A Disco interactive session. It allows users to interact with the Disco command queue using various ffi::Function calling conventions. More...
 
class  Session
 Managed reference to SessionObj. More...
 
class  DiscoChannel
 A bi-directional channel for controller-worker communication. This channel is primarily used to transfer control messages rather than data. More...
 
class  WorkerZeroData
 A special communication channel between the controller and worker-0, assuming they are always co-located in the same process. More...
 
class  NVTXScopedRange
 A class to create an NVTX range. No-op if TVM is not built against NVTX. More...
 
class  Tensor
 Managed Tensor. The array is backed by reference counted blocks. More...
 
class  TimerNode
 Base class for all timer implementations. More...
 
class  Timer
 Timer for a specific device. More...
 

Enumerations

enum  TVMDeviceExtType { TVMDeviceExtType_End = 36 }
 Extension device types in TVM. More...
 
enum  DeviceAttrKind : int {
  kExist = 0 , kMaxThreadsPerBlock = 1 , kWarpSize = 2 , kMaxSharedMemoryPerBlock = 3 ,
  kComputeVersion = 4 , kDeviceName = 5 , kMaxClockRate = 6 , kMultiProcessorCount = 7 ,
  kMaxThreadDimensions = 8 , kMaxRegistersPerBlock = 9 , kGcnArch = 10 , kApiVersion = 11 ,
  kDriverVersion = 12 , kL2CacheSizeBytes = 13 , kTotalGlobalMemory = 14 , kAvailableGlobalMemory = 15 ,
  kImagePitchAlignment = 16
}
 The query type passed to GetAttr. More...
 
enum class  ReduceKind : int32_t {
  kSum = 0 , kProd = 1 , kMin = 2 , kMax = 3 ,
  kAvg = 4
}
 Possible kinds of reduction operations. More...
 
enum class  DiscoAction : int32_t {
  kShutDown = 0 , kKillReg = 1 , kGetGlobalFunc = 2 , kCallPacked = 3 ,
  kSyncWorker = 4 , kCopyFromWorker0 = 5 , kCopyToWorker0 = 6 , kDebugGetFromRemote = 7 ,
  kDebugSetRegister = 8
}
 All possible kinds of Disco commands. More...
 

Functions

int GetVectorBytes (DataType dtype)
 Get the number of bytes needed in a vector. More...
 
bool TypeMatch (DLDataType t, int code, int bits, int lanes=1)
 Check whether type matches the given spec. More...
 
bool TypeEqual (DLDataType lhs, DLDataType rhs)
 Check whether two types are equal. More...
 
std::ostream & operator<< (std::ostream &os, const DataType &dtype)
 
const char * DLDeviceType2Str (int type)
 The name of DLDeviceType. More...
 
bool IsRPCSessionDevice (Device dev)
 Return true if a Device is owned by an RPC session. More...
 
int GetRPCSessionIndex (Device dev)
 Return the RPCSessTable index of the RPC Session that owns this device. More...
 
Device RemoveRPCSessionMask (Device dev)
 Remove the RPC session mask from a Device. RPC clients typically do this when encoding a Device for transmission to an RPC remote. On the wire, RPC devices are expected to be valid on the server without interpretation. More...
 
std::ostream & operator<< (std::ostream &os, DLDevice dev)
 
Device AddRPCSessionMask (Device dev, int session_table_index)
 Add an RPC session mask to a Device. RPC clients typically do this when decoding a Device received from an RPC remote. More...
 
TVM_RUNTIME_DLL bool RuntimeEnabled (const ffi::String &target)
 Check whether the runtime module for a target is enabled. More...
 
std::string ReduceKind2String (ReduceKind kind)
 Converts ReduceKind to string. More...
 
TVM_RUNTIME_DLL ffi::Module LoadVMModule (std::string path, ffi::Optional< Device > device)
 Load a runtime Module, then create and initialize a RelaxVM. More...
 
TVM_RUNTIME_DLL Tensor DiscoEmptyTensor (ffi::Shape shape, DataType dtype, ffi::Optional< Device > device)
 Create an uninitialized empty Tensor. More...
 
TVM_RUNTIME_DLL void AllReduce (Tensor send, ReduceKind reduce_kind, bool in_group, Tensor recv)
 Perform an allreduce operation using the underlying communication library. More...
 
TVM_RUNTIME_DLL void AllGather (Tensor send, bool in_group, Tensor recv)
 Perform an allgather operation using the underlying communication library. More...
 
TVM_RUNTIME_DLL void BroadcastFromWorker0 (Tensor send, bool in_group, Tensor recv)
 Perform a broadcast operation from worker-0. More...
 
TVM_RUNTIME_DLL void ScatterFromWorker0 (ffi::Optional< Tensor > send, bool in_group, Tensor recv)
 Perform a scatter operation from worker-0, chunking the given buffer into equal parts. More...
 
TVM_RUNTIME_DLL void GatherToWorker0 (Tensor send, bool in_group, ffi::Optional< Tensor > recv)
 Perform a gather operation to worker-0. More...
 
TVM_RUNTIME_DLL void RecvFromWorker0 (Tensor buffer)
 Receive a buffer from worker-0. No-op if the current worker is worker-0. More...
 
TVM_RUNTIME_DLL void SendToNextGroup (Tensor buffer)
 Send a buffer to the corresponding worker in the next group. An error is thrown if the worker is already in the last group. More...
 
TVM_RUNTIME_DLL void RecvFromPrevGroup (Tensor buffer)
 Receive a buffer from the corresponding worker in the previous group. An error is thrown if the worker is already in the first group. More...
 
TVM_RUNTIME_DLL void SendToWorker (Tensor buffer, int receiver_id)
 Send a buffer to the target receiver worker (globally across all groups). More...
 
TVM_RUNTIME_DLL void RecvFromWorker (Tensor buffer, int sender_id)
 Receive a buffer from the target sender worker (globally across all groups). More...
 
TVM_RUNTIME_DLL int WorkerId ()
 Get the local worker id. More...
 
TVM_RUNTIME_DLL void SyncWorker ()
 Called by the worker thread. Waits until the worker completes all its tasks. For example, on a CUDA worker, it blocks until all kernels are launched and cudaStreamSynchronize completes. More...
 
std::string DiscoAction2String (DiscoAction action)
 Converts the enum class DiscoAction to string. More...
 
bool SaveDLTensor (support::Stream *strm, const DLTensor *tensor)
 Save a DLTensor to stream. More...
 
Device GetPreferredHostDevice (Device device)
 Get the preferred host device from the input device. More...
 
ffi::Function WrapTimeEvaluator (ffi::Function f, Device dev, int number, int repeat, int min_repeat_ms, int limit_zero_time_iterations, int cooldown_interval_ms, int repeats_to_cooldown, int cache_flush_bytes=0, ffi::Function f_preproc=nullptr)
 Wrap a given packed function into a timer function that measures its time cost. More...
 

Variables

constexpr int kAllocAlignment = 64
 Number of bytes each allocation must align to. More...
 
constexpr int kTempAllocaAlignment = 64
 Number of bytes each allocation must align to in temporary allocation. More...
 
constexpr int kMaxStackAlloca = 1024
 Maximum size that can be allocated on the stack. More...
 
constexpr int kDefaultWorkspaceAlignment = 1
 Number of bytes each allocation must align to by default in the workspace buffer to service intermediate tensors. More...
 
constexpr int kRPCSessMask = 128
 Device types greater than or equal to this value denote RPC devices. More...
 
constexpr int32_t kRuntimeDiscoDRef = TVMFFITypeIndex::kTVMFFIDynObjectBegin - 14
 Static FFI type index for runtime::disco::DRef. More...
 
constexpr uint64_t kTVMTensorMagic = 0xDD5E40F096B4A13F
 Magic number for Tensor file. More...
 

Enumeration Type Documentation

◆ DeviceAttrKind

The query type passed to GetAttr.

Enumerator
kExist 
kMaxThreadsPerBlock 
kWarpSize 
kMaxSharedMemoryPerBlock 
kComputeVersion 
kDeviceName 
kMaxClockRate 
kMultiProcessorCount 
kMaxThreadDimensions 
kMaxRegistersPerBlock 
kGcnArch 
kApiVersion 
kDriverVersion 
kL2CacheSizeBytes 
kTotalGlobalMemory 
kAvailableGlobalMemory 
kImagePitchAlignment 

◆ DiscoAction

enum tvm::runtime::DiscoAction : int32_t
strong

All possible kinds of Disco commands.

Enumerator
kShutDown 
kKillReg 
kGetGlobalFunc 
kCallPacked 
kSyncWorker 
kCopyFromWorker0 
kCopyToWorker0 
kDebugGetFromRemote 
kDebugSetRegister 

◆ ReduceKind

enum tvm::runtime::ReduceKind : int32_t
strong

Possible kinds of reduction operations.

Enumerator
kSum 
kProd 
kMin 
kMax 
kAvg 

◆ TVMDeviceExtType

Extension device types in TVM.

Additional enumerators to supplement those provided by DLPack's DLDeviceType enumeration.

MAINTAINERS NOTE #1: We need to ensure that the two devices are identified by the same integer. Currently this requires manual verification. Discussed here: https://github.com/dmlc/dlpack/issues/111 As of DLPack v0.7, the highest-valued enumerator in DLDeviceType is kDLHexagon = 16.

MAINTAINERS NOTE #2: As of DLPack v0.7, the definition for DLDeviceType specifies an underlying storage type of int32_t. That guarantees a variable of type DLDeviceType is capable of holding any integers provided by either of these enumerations.

However, the int32_t specification only applies when the header file is compiled as C++, and this header file is also meant to work as C code. So the unspecified storage type could be a latent bug when compiled as C.

Enumerator
TVMDeviceExtType_End 

Function Documentation

◆ AddRPCSessionMask()

Device tvm::runtime::AddRPCSessionMask ( Device  dev,
int  session_table_index 
)
inline

Add an RPC session mask to a Device. RPC clients typically do this when decoding a Device received from an RPC remote.

Parameters
dev  A Device without any RPC session mask, valid on the RPC server.
session_table_index  Numeric index of the RPC session in the session table.
Returns
A Device with RPC session mask added, valid on the RPC client.
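This page does not spell out how the mask is encoded. The sketch below assumes that session index `i` is stored by adding `(i + 1) * kRPCSessMask` to the base DLPack device type, which fits the documented `kRPCSessMask = 128` and the rule that device types at or above it denote RPC devices. `SketchDevice` and the `...Sketch` functions are hypothetical stand-ins, not TVM declarations.

```cpp
#include <cassert>

// Hypothetical stand-in for DLDevice: only the fields the mask touches.
struct SketchDevice {
  int device_type;  // DLPack device type, possibly carrying an RPC mask
  int device_id;
};

constexpr int kRPCSessMask = 128;  // value documented for tvm::runtime::kRPCSessMask

// Assumed rule: any device_type >= kRPCSessMask belongs to an RPC session.
bool IsRPCSessionDeviceSketch(SketchDevice dev) {
  return dev.device_type / kRPCSessMask > 0;
}

// Assumed encoding: session index i is stored as (i + 1) * kRPCSessMask.
int GetRPCSessionIndexSketch(SketchDevice dev) {
  assert(IsRPCSessionDeviceSketch(dev));
  return dev.device_type / kRPCSessMask - 1;
}

SketchDevice AddRPCSessionMaskSketch(SketchDevice dev, int session_table_index) {
  assert(!IsRPCSessionDeviceSketch(dev));  // the mask must not already be present
  dev.device_type += kRPCSessMask * (session_table_index + 1);
  return dev;
}

SketchDevice RemoveRPCSessionMaskSketch(SketchDevice dev) {
  dev.device_type %= kRPCSessMask;  // keep only the base DLPack device type
  return dev;
}
```

Under these assumptions, masking a CUDA device (type 2) with session index 1 yields device type 258, and removing the mask restores type 2.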

◆ AllGather()

TVM_RUNTIME_DLL void tvm::runtime::AllGather ( Tensor  send,
bool  in_group,
Tensor  recv 
)

Perform an allgather operation using the underlying communication library.

Parameters
send  The array to perform the allgather on.
in_group  Whether the allgather operation is performed within the group (the default) or globally.
recv  The array that receives the result of the allgather.
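To illustrate the data movement only (not how the underlying communication library implements it), the in-process sketch below assumes the conventional allgather semantics used by libraries such as NCCL: every worker's recv buffer ends up holding all workers' send buffers, concatenated in worker-id order. `SimulateAllGather` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// In-process simulation of allgather semantics: every worker receives the
// concatenation of all send buffers, ordered by worker id.
std::vector<std::vector<float>> SimulateAllGather(
    const std::vector<std::vector<float>>& send_per_worker) {
  std::vector<float> gathered;
  for (const auto& send : send_per_worker) {
    // Conventional allgather expects equally sized contributions.
    assert(send.size() == send_per_worker.front().size());
    gathered.insert(gathered.end(), send.begin(), send.end());
  }
  // Each worker's recv buffer gets an identical copy of the gathered result.
  return std::vector<std::vector<float>>(send_per_worker.size(), gathered);
}
```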

◆ AllReduce()

TVM_RUNTIME_DLL void tvm::runtime::AllReduce ( Tensor  send,
ReduceKind  reduce_kind,
bool  in_group,
Tensor  recv 
)

Perform an allreduce operation using the underlying communication library.

Parameters
send  The array to perform the allreduce on.
reduce_kind  The kind of reduction operation (e.g. sum, avg, min, max).
in_group  Whether the allreduce operation is performed within the group (the default) or globally.
recv  The array that receives the result of the allreduce.
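The element-wise meaning of each ReduceKind can be shown with an in-process simulation. This is only a sketch: `SimulateAllReduce` is hypothetical, the enum merely mirrors the documented enumerators, and treating kAvg as the sum divided by the number of workers is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the documented enumerators of tvm::runtime::ReduceKind.
enum class ReduceKind : int32_t { kSum = 0, kProd = 1, kMin = 2, kMax = 3, kAvg = 4 };

// Element i of every worker's recv buffer becomes the reduction of element i
// across all workers' send buffers.
std::vector<std::vector<double>> SimulateAllReduce(
    const std::vector<std::vector<double>>& send_per_worker, ReduceKind kind) {
  if (send_per_worker.empty()) return {};
  const std::size_t n = send_per_worker.front().size();
  std::vector<double> result = send_per_worker.front();
  for (std::size_t w = 1; w < send_per_worker.size(); ++w) {
    assert(send_per_worker[w].size() == n);
    for (std::size_t i = 0; i < n; ++i) {
      const double x = send_per_worker[w][i];
      switch (kind) {
        case ReduceKind::kSum:
        case ReduceKind::kAvg: result[i] += x; break;  // kAvg is divided below
        case ReduceKind::kProd: result[i] *= x; break;
        case ReduceKind::kMin: result[i] = std::min(result[i], x); break;
        case ReduceKind::kMax: result[i] = std::max(result[i], x); break;
      }
    }
  }
  if (kind == ReduceKind::kAvg) {
    for (double& v : result) v /= static_cast<double>(send_per_worker.size());
  }
  // Every worker receives the same reduced buffer.
  return std::vector<std::vector<double>>(send_per_worker.size(), result);
}
```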

◆ BroadcastFromWorker0()

TVM_RUNTIME_DLL void tvm::runtime::BroadcastFromWorker0 ( Tensor  send,
bool  in_group,
Tensor  recv 
)

Perform a broadcast operation from worker-0.

Parameters
send  The buffer to be broadcast.
in_group  Whether the broadcast operation is performed within the group (the default) or globally.
recv  The buffer that receives the broadcast array.

◆ DiscoAction2String()

std::string tvm::runtime::DiscoAction2String ( DiscoAction  action)
inline

Converts the enum class DiscoAction to string.

◆ DiscoEmptyTensor()

TVM_RUNTIME_DLL Tensor tvm::runtime::DiscoEmptyTensor ( ffi::Shape  shape,
DataType  dtype,
ffi::Optional< Device >  device 
)

Create an uninitialized empty Tensor.

Parameters
shape  The shape of the Tensor.
dtype  The dtype of the Tensor.
device  The device on which the Tensor is created. If None, the thread-local default device is used.
Returns
The Tensor created
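The fallback rule for the device argument can be sketched with `std::optional` standing in for `ffi::Optional<Device>`. The thread-local variable `g_default_device`, its initial value, and `ResolveDevice` are hypothetical names used only to illustrate "explicit device if given, otherwise the thread-local default"; they are not TVM symbols.

```cpp
#include <cassert>
#include <optional>

struct SketchDevice {
  int device_type;
  int device_id;
};

// Hypothetical per-thread default device (CPU, id 0 here); each worker thread
// would hold its own copy because the variable is thread_local.
thread_local SketchDevice g_default_device{1, 0};

// An explicitly provided device wins; otherwise the thread-local default is used.
SketchDevice ResolveDevice(std::optional<SketchDevice> device) {
  return device.value_or(g_default_device);
}
```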

◆ DLDeviceType2Str()

const char* tvm::runtime::DLDeviceType2Str ( int  type)
inline

The name of DLDeviceType.

Parameters
type  The device type.
Returns
the device name.

◆ GatherToWorker0()

TVM_RUNTIME_DLL void tvm::runtime::GatherToWorker0 ( Tensor  send,
bool  in_group,
ffi::Optional< Tensor >  recv 
)

Perform a gather operation to worker-0.

Parameters
send  The sending buffer, which must not be None.
in_group  Whether the gather operation is performed within the group (the default) or globally.
recv  Must be provided on worker-0 and must be None on all other workers. The receiving buffer is divided into equal parts, and each part receives the data sent by the corresponding worker.
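The buffer rules above can be checked with an in-process sketch in which `std::optional` stands in for `ffi::Optional<Tensor>` and plain vectors stand in for tensors. `SimulateGatherToWorker0` is hypothetical; it assumes part k of worker-0's recv buffer holds worker k's data, in worker-id order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Worker-0's recv buffer is split into equal parts; part k is filled from
// worker k's send buffer. Other workers must pass no recv buffer.
void SimulateGatherToWorker0(const std::vector<std::vector<int>>& send_per_worker,
                             std::vector<std::optional<std::vector<int>>>& recv_per_worker) {
  const std::size_t num_workers = send_per_worker.size();
  assert(num_workers > 0 && recv_per_worker.size() == num_workers);
  assert(recv_per_worker[0].has_value());  // worker-0 must provide recv
  for (std::size_t w = 1; w < num_workers; ++w) {
    assert(!recv_per_worker[w].has_value());  // all other workers pass None
  }
  std::vector<int>& recv = *recv_per_worker[0];
  assert(recv.size() % num_workers == 0);
  const std::size_t chunk = recv.size() / num_workers;
  for (std::size_t w = 0; w < num_workers; ++w) {
    assert(send_per_worker[w].size() == chunk);  // one equal part per worker
    std::copy(send_per_worker[w].begin(), send_per_worker[w].end(),
              recv.begin() + static_cast<std::ptrdiff_t>(w * chunk));
  }
}
```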

◆ GetPreferredHostDevice()

Device tvm::runtime::GetPreferredHostDevice ( Device  device)
inline

Get the preferred host device from the input device.

  • For CUDA and ROCm, CUDAHost and ROCMHost will be returned for pinned memory, since pinned memory reduces copy overhead.
  • For other devices, CPU is returned as a fallback.
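The documented mapping can be sketched as a switch over DLPack device types. The enumerator values below are copied from DLPack's `DLDeviceType`; `SketchDevice`, `GetPreferredHostDeviceSketch`, and the choice of device id 0 for the returned host device are assumptions of this sketch.

```cpp
#include <cassert>

// Subset of DLPack's DLDeviceType values used by this sketch.
enum SketchDeviceType : int {
  kDLCPU = 1,
  kDLCUDA = 2,
  kDLCUDAHost = 3,
  kDLROCM = 10,
  kDLROCMHost = 11,
};

struct SketchDevice {
  int device_type;
  int device_id;
};

// CUDA and ROCm prefer their pinned-host device types; everything else falls
// back to CPU. Device id 0 for the host device is assumed here.
SketchDevice GetPreferredHostDeviceSketch(SketchDevice device) {
  switch (device.device_type) {
    case kDLCUDA: return {kDLCUDAHost, 0};
    case kDLROCM: return {kDLROCMHost, 0};
    default: return {kDLCPU, 0};
  }
}
```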

◆ GetRPCSessionIndex()

int tvm::runtime::GetRPCSessionIndex ( Device  dev)
inline

Return the RPCSessTable index of the RPC Session that owns this device.

Returns
the table index.

◆ GetVectorBytes()

int tvm::runtime::GetVectorBytes ( DataType  dtype)
inline

Get the number of bytes needed in a vector.

Parameters
dtype  The data type.
Returns
Number of bytes needed.
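A minimal sketch of the byte count, assuming it is `bits * lanes / 8` and that the total must be a whole number of bytes. `SketchDataType` and `GetVectorBytesSketch` are hypothetical, and any special cases the real function may handle (for example sub-byte types) are not modeled.

```cpp
#include <cassert>
#include <stdexcept>

// Minimal stand-in for the bit width and lane count carried by a DataType.
struct SketchDataType {
  int bits;   // bits per lane, e.g. 32 for float32
  int lanes;  // number of vector lanes, 1 for scalars
};

// Assumed rule: a vector occupies bits * lanes bits, which must be a whole
// number of bytes.
int GetVectorBytesSketch(SketchDataType dtype) {
  const int data_bits = dtype.bits * dtype.lanes;
  if (data_bits % 8 != 0) {
    throw std::invalid_argument("vector size is not a multiple of 8 bits");
  }
  return data_bits / 8;
}
```

For example, a float32x4 vector (32 bits, 4 lanes) needs 16 bytes.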

◆ IsRPCSessionDevice()

bool tvm::runtime::IsRPCSessionDevice ( Device  dev)
inline

Return true if a Device is owned by an RPC session.

◆ LoadVMModule()

TVM_RUNTIME_DLL ffi::Module tvm::runtime::LoadVMModule ( std::string  path,
ffi::Optional< Device >  device 
)

Load a runtime Module, then create and initialize a RelaxVM.

Parameters
path  The path to the runtime Module (a DSO file) to be loaded.
device  The default device used to initialize the RelaxVM.
Returns
The RelaxVM as a runtime Module

◆ operator<<() [1/2]

std::ostream& tvm::runtime::operator<< ( std::ostream &  os,
const DataType &  dtype 
)
inline

◆ operator<<() [2/2]

std::ostream& tvm::runtime::operator<< ( std::ostream &  os,
DLDevice  dev 
)
inline

◆ RecvFromPrevGroup()

TVM_RUNTIME_DLL void tvm::runtime::RecvFromPrevGroup ( Tensor  buffer)

Receive a buffer from the corresponding worker in the previous group. An error is thrown if the worker is already in the first group.

Parameters
buffer  The receiving buffer.

◆ RecvFromWorker()

TVM_RUNTIME_DLL void tvm::runtime::RecvFromWorker ( Tensor  buffer,
int  sender_id 
)

Receive a buffer from the target sender worker (globally across all groups).

Parameters
buffer  The receiving buffer.
sender_id  The global sender worker id.

◆ RecvFromWorker0()

TVM_RUNTIME_DLL void tvm::runtime::RecvFromWorker0 ( Tensor  buffer)

Receive a buffer from worker-0. No-op if the current worker is worker-0.

Parameters
buffer  The buffer to be received.

◆ ReduceKind2String()

std::string tvm::runtime::ReduceKind2String ( ReduceKind  kind)
inline

Converts ReduceKind to string.

◆ RemoveRPCSessionMask()

Device tvm::runtime::RemoveRPCSessionMask ( Device  dev)
inline

Remove the RPC session mask from a Device. RPC clients typically do this when encoding a Device for transmission to an RPC remote. On the wire, RPC devices are expected to be valid on the server without interpretation.

Parameters
dev  A Device with a non-zero RPC session mask, valid on the RPC client.
Returns
A Device without any RPC Session mask, valid on the RPC server.

◆ RuntimeEnabled()

TVM_RUNTIME_DLL bool tvm::runtime::RuntimeEnabled ( const ffi::String &  target)

Check whether the runtime module for a target is enabled.

Parameters
target  The target module name.
Returns
Whether runtime is enabled.

◆ SaveDLTensor()

bool tvm::runtime::SaveDLTensor ( support::Stream *  strm,
const DLTensor *  tensor 
)
inline

Save a DLTensor to a stream.

Parameters
strm  The output stream.
tensor  The tensor to be saved.

◆ ScatterFromWorker0()

TVM_RUNTIME_DLL void tvm::runtime::ScatterFromWorker0 ( ffi::Optional< Tensor >  send,
bool  in_group,
Tensor  recv 
)

Perform a scatter operation from worker-0, chunking the given buffer into equal parts.

Parameters
send  Must be provided on worker-0 and must be None on all other workers. The buffer is divided into equal parts, and each part is sent to the corresponding worker.
in_group  Whether the scatter operation is performed within the group (the default) or globally.
recv  The receiving buffer, which must not be None.
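The chunking rule can be sketched in-process with `std::optional` standing in for `ffi::Optional<Tensor>`. `SimulateScatterFromWorker0` is hypothetical and assumes part k of worker-0's send buffer goes to worker k, in worker-id order.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Worker-0's send buffer is chunked into equal parts; part k becomes worker k's
// recv buffer. Other workers must pass no send buffer.
std::vector<std::vector<int>> SimulateScatterFromWorker0(
    const std::vector<std::optional<std::vector<int>>>& send_per_worker) {
  const std::size_t num_workers = send_per_worker.size();
  assert(num_workers > 0 && send_per_worker[0].has_value());  // worker-0 provides send
  for (std::size_t w = 1; w < num_workers; ++w) {
    assert(!send_per_worker[w].has_value());  // all other workers pass None
  }
  const std::vector<int>& send = *send_per_worker[0];
  assert(send.size() % num_workers == 0);  // buffer must split into equal parts
  const std::size_t chunk = send.size() / num_workers;
  std::vector<std::vector<int>> recv_per_worker;
  for (std::size_t w = 0; w < num_workers; ++w) {
    auto first = send.begin() + static_cast<std::ptrdiff_t>(w * chunk);
    recv_per_worker.emplace_back(first, first + static_cast<std::ptrdiff_t>(chunk));
  }
  return recv_per_worker;
}
```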

◆ SendToNextGroup()

TVM_RUNTIME_DLL void tvm::runtime::SendToNextGroup ( Tensor  buffer)

Send a buffer to the corresponding worker in the next group. An error is thrown if the worker is already in the last group.

Parameters
buffer  The sending buffer.
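This reference does not say how workers are laid out across groups. Assuming global worker ids are assigned group by group (worker `w` belongs to group `w / workers_per_group`), the peer used by SendToNextGroup and RecvFromPrevGroup, and the documented errors at the first and last groups, can be sketched as follows. `NextGroupPeer` and `PrevGroupPeer` are hypothetical helpers.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical layout: worker w belongs to group w / workers_per_group.
int NextGroupPeer(int worker_id, int num_workers, int num_groups) {
  assert(num_groups > 0 && num_workers % num_groups == 0);
  const int workers_per_group = num_workers / num_groups;
  if (worker_id / workers_per_group == num_groups - 1) {
    throw std::out_of_range("worker is already in the last group");
  }
  return worker_id + workers_per_group;  // same position in the next group
}

int PrevGroupPeer(int worker_id, int num_workers, int num_groups) {
  assert(num_groups > 0 && num_workers % num_groups == 0);
  const int workers_per_group = num_workers / num_groups;
  if (worker_id / workers_per_group == 0) {
    throw std::out_of_range("worker is already in the first group");
  }
  return worker_id - workers_per_group;  // same position in the previous group
}
```

With 8 workers in 2 groups, worker 1 sends to worker 5, and worker 5 receives from worker 1.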

◆ SendToWorker()

TVM_RUNTIME_DLL void tvm::runtime::SendToWorker ( Tensor  buffer,
int  receiver_id 
)

Send a buffer to the target receiver worker (globally across all groups).

Parameters
buffer  The sending buffer.
receiver_id  The global receiver worker id.

◆ SyncWorker()

TVM_RUNTIME_DLL void tvm::runtime::SyncWorker ( )

Called by the worker thread. Waits until the worker completes all its tasks. For example, on a CUDA worker, it blocks until all kernels are launched and cudaStreamSynchronize completes.

◆ TypeEqual()

bool tvm::runtime::TypeEqual ( DLDataType  lhs,
DLDataType  rhs 
)
inline

Check whether two types are equal.

Parameters
lhs  The left operand.
rhs  The right operand.

◆ TypeMatch()

bool tvm::runtime::TypeMatch ( DLDataType  t,
int  code,
int  bits,
int  lanes = 1 
)
inline

Check whether type matches the given spec.

Parameters
t  The type.
code  The type code.
bits  The number of bits to be matched.
lanes  The number of lanes in the type.
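Both checks amount to field-by-field comparisons of the three DLDataType fields. The struct below copies DLPack's `DLDataType` layout (`code`, `bits`, `lanes`) and its type codes; `TypeMatchSketch` and `TypeEqualSketch` are a hedged reading of the documented behavior, not the TVM source.

```cpp
#include <cassert>
#include <cstdint>

// Same field layout as DLPack's DLDataType.
struct SketchDLDataType {
  uint8_t code;    // type code: kDLInt = 0, kDLUInt = 1, kDLFloat = 2
  uint8_t bits;    // bits per lane
  uint16_t lanes;  // number of lanes
};

// Matches when code, bits and lanes all equal the requested spec.
bool TypeMatchSketch(SketchDLDataType t, int code, int bits, int lanes = 1) {
  return t.code == code && t.bits == bits && t.lanes == lanes;
}

// Two types are equal when all three fields agree.
bool TypeEqualSketch(SketchDLDataType lhs, SketchDLDataType rhs) {
  return lhs.code == rhs.code && lhs.bits == rhs.bits && lhs.lanes == rhs.lanes;
}
```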

◆ WorkerId()

TVM_RUNTIME_DLL int tvm::runtime::WorkerId ( )

Get the local worker id.

◆ WrapTimeEvaluator()

ffi::Function tvm::runtime::WrapTimeEvaluator ( ffi::Function  f,
Device  dev,
int  number,
int  repeat,
int  min_repeat_ms,
int  limit_zero_time_iterations,
int  cooldown_interval_ms,
int  repeats_to_cooldown,
int  cache_flush_bytes = 0,
ffi::Function  f_preproc = nullptr 
)

Wrap a given packed function into a timer function that measures its time cost.

Approximate implementation:

f()  # warmup
for i in range(repeat):
    f_preproc()
    while True:
        start = time()
        for j in range(number):
            f()
        duration_ms = time() - start
        if duration_ms >= min_repeat_ms:
            break
        else:
            number = min_repeat_ms / (duration_ms / number) + 1
    if cooldown_interval_ms and i % repeats_to_cooldown == 0:
        sleep(cooldown_interval_ms)

Parameters
f  The function to be timed.
dev  The device.
number  The number of times to run the function when taking the average. These runs together form one repeat of the measurement.
repeat  The number of times to repeat the measurement. In total, the function is invoked (1 + number x repeat) times, where the first invocation is a warmup and is discarded. The returned result contains repeat costs, each of which is an average of number costs.
min_repeat_ms  The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, number is adjusted dynamically to meet the minimum duration of one repeat; that is, when the run time of one repeat falls below this value, number is increased automatically.
limit_zero_time_iterations  The maximum number of repeats allowed when the measured time is 0. It helps avoid hanging during measurements.
cooldown_interval_ms  The cooldown interval in milliseconds, applied once every repeats_to_cooldown repeats.
repeats_to_cooldown  The number of repeats before the cooldown is activated.
cache_flush_bytes  The number of bytes to flush from cache before
f_preproc  The function to be executed before each repeat of the time evaluator.
Returns
f_timer A timer function.
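The approximate implementation above can be rendered in standard C++ with `std::chrono`. This is a hypothetical sketch, not the TVM implementation: `TimeEvaluateSketch` is an invented name, it times a plain `std::function<void()>` on the host, and it omits device synchronization, cache flushing, and limit_zero_time_iterations. On a zero reading it simply doubles number, since a zero duration gives no rate estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Returns one entry per repeat: the average cost of a single run, in ms.
std::vector<double> TimeEvaluateSketch(const std::function<void()>& f, int number, int repeat,
                                       int min_repeat_ms, int cooldown_interval_ms,
                                       int repeats_to_cooldown,
                                       const std::function<void()>& f_preproc = nullptr) {
  assert(number > 0);
  using Clock = std::chrono::steady_clock;
  std::vector<double> costs_ms;
  f();  // warmup run, not measured
  for (int i = 0; i < repeat; ++i) {
    if (f_preproc) f_preproc();
    double duration_ms = 0.0;
    while (true) {
      const auto start = Clock::now();
      for (int j = 0; j < number; ++j) f();
      duration_ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
      if (duration_ms >= min_repeat_ms) break;
      if (duration_ms > 0.0) {
        // Scale number so one repeat lasts about min_repeat_ms (capped to fit in int).
        number = static_cast<int>(std::min(min_repeat_ms / (duration_ms / number), 1e9)) + 1;
      } else {
        number *= 2;  // zero reading: no rate estimate, so just try more runs
      }
    }
    costs_ms.push_back(duration_ms / number);
    if (cooldown_interval_ms > 0 && repeats_to_cooldown > 0 && i % repeats_to_cooldown == 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(cooldown_interval_ms));
    }
  }
  return costs_ms;
}
```

With min_repeat_ms = 0, number never changes, so the function is called exactly 1 + number x repeat times, matching the count given for the repeat parameter.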

Variable Documentation

◆ kAllocAlignment

constexpr int tvm::runtime::kAllocAlignment = 64
constexpr

Number of bytes each allocation must align to.
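To make the constant concrete, the sketch below rounds a request up to a multiple of 64 bytes and allocates it with C++17 `std::aligned_alloc`, which requires the size to be a multiple of the alignment. It only illustrates what a 64-byte alignment guarantee means; `RoundUpToAlignment` and `AllocAligned` are hypothetical helpers, not TVM's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kAllocAlignment = 64;  // mirrors tvm::runtime::kAllocAlignment

// Round a byte count up to the next multiple of the alignment.
constexpr std::size_t RoundUpToAlignment(std::size_t nbytes, std::size_t align = kAllocAlignment) {
  return (nbytes + align - 1) / align * align;
}

// std::aligned_alloc needs size % alignment == 0, hence the rounding.
// Zero-byte requests are bumped to one alignment unit. Release with std::free.
void* AllocAligned(std::size_t nbytes) {
  const std::size_t size = RoundUpToAlignment(nbytes == 0 ? 1 : nbytes);
  return std::aligned_alloc(kAllocAlignment, size);
}
```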

◆ kDefaultWorkspaceAlignment

constexpr int tvm::runtime::kDefaultWorkspaceAlignment = 1
constexpr

Number of bytes each allocation must align to by default in the workspace buffer to service intermediate tensors.

◆ kMaxStackAlloca

constexpr int tvm::runtime::kMaxStackAlloca = 1024
constexpr

Maximum size that can be allocated on the stack.

◆ kRPCSessMask

constexpr int tvm::runtime::kRPCSessMask = 128
constexpr

Device types greater than or equal to this value denote RPC devices.

◆ kRuntimeDiscoDRef

constexpr int32_t tvm::runtime::kRuntimeDiscoDRef = TVMFFITypeIndex::kTVMFFIDynObjectBegin - 14
constexpr

Static FFI type index for runtime::disco::DRef.

Allocated within the [kTVMFFIDynObjectBegin - 16, kTVMFFIDynObjectBegin) custom-static slot range. The sibling constant kRuntimeRPCObjectRef lives in src/runtime/rpc/rpc_session.h and uses ... - 13; values must remain disjoint across this small reserved block.
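The disjointness requirement can be enforced at compile time. In this sketch, the real value of kTVMFFIDynObjectBegin is not given on this page, so `kDynObjectBeginPlaceholder` is an arbitrary hypothetical stand-in; only the documented offsets (-14 for kRuntimeDiscoDRef, -13 for kRuntimeRPCObjectRef) and the 16-slot range come from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical placeholder; the real kTVMFFIDynObjectBegin value is not shown here.
constexpr int32_t kDynObjectBeginPlaceholder = 128;

constexpr std::array<int32_t, 2> kReservedIndices = {
    kDynObjectBeginPlaceholder - 14,  // kRuntimeDiscoDRef
    kDynObjectBeginPlaceholder - 13,  // kRuntimeRPCObjectRef
};

// True when every index lies in [begin - 16, begin) and no two indices collide.
constexpr bool ReservedIndicesValid() {
  for (std::size_t i = 0; i < kReservedIndices.size(); ++i) {
    const int32_t v = kReservedIndices[i];
    if (v < kDynObjectBeginPlaceholder - 16 || v >= kDynObjectBeginPlaceholder) return false;
    for (std::size_t j = i + 1; j < kReservedIndices.size(); ++j) {
      if (kReservedIndices[j] == v) return false;
    }
  }
  return true;
}

static_assert(ReservedIndicesValid(), "reserved FFI type indices overlap or leave the slot range");
```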

◆ kTempAllocaAlignment

constexpr int tvm::runtime::kTempAllocaAlignment = 64
constexpr

Number of bytes each allocation must align to in temporary allocation.

◆ kTVMTensorMagic

constexpr uint64_t tvm::runtime::kTVMTensorMagic = 0xDD5E40F096B4A13F
constexpr

Magic number for Tensor file.
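A file-format check against this constant might look like the sketch below. The layout is an assumption made for illustration: it supposes the magic number occupies the first 8 bytes of the file as a raw `uint64_t` in host byte order. `HasTensorMagic` and `WriteTensorMagic` are hypothetical helpers, not TVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>

constexpr uint64_t kTVMTensorMagic = 0xDD5E40F096B4A13F;  // value documented above

// Hypothetical header check: reads 8 bytes and compares them with the magic.
bool HasTensorMagic(std::istream& in) {
  uint64_t header = 0;
  if (!in.read(reinterpret_cast<char*>(&header), sizeof(header))) return false;
  return header == kTVMTensorMagic;
}

// Writes the magic number in the same assumed layout, for round-trip testing.
void WriteTensorMagic(std::ostream& out) {
  out.write(reinterpret_cast<const char*>(&kTVMTensorMagic), sizeof(kTVMTensorMagic));
}
```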