tvm::runtime Namespace Reference

Namespaces

 cuda_ipc
 
 details
 
 memory
 
 profiling
 
 symbol
 namespace for constant symbols
 
 threading
 
 vm
 

Classes

class  DataType
 Runtime primitive data type.
 
class  DeviceAPI
 TVM Runtime Device API, abstracts the device specific interface for memory management.
 
class  DiscoWorker
 A worker in Disco. It takes a channel to communicate with the controller. The worker can be run in a separate thread or process as long as the channel supports bi-directional communication between the two.
 
struct  ThreadLocalDiscoWorker
 A thread-local wrapper of DiscoWorker.
 
class  DRefObj
 An object that exists on all workers.
 
class  DRef
 Managed reference to DRefObj.
 
class  SessionObj
 A Disco interactive session. It allows users to interact with the Disco command queue with various ffi::Function calling conventions.
 
class  Session
 Managed reference to SessionObj.
 
class  DiscoChannel
 A bi-directional channel for controller-worker communication. This channel is primarily used to transfer control messages but not data.
 
class  WorkerZeroData
 A special communication channel between controller and worker-0, assuming they are always collocated in the same process.
 
class  NDArray
 Managed NDArray. The array is backed by reference counted blocks.
 
class  NVTXScopedRange
 A class to create an NVTX range. No-op if TVM is not built against NVTX.
 
class  TimerNode
 Base class for all implementations.
 
class  Timer
 Timer for a specific device.
 

Typedefs

using tvm_index_t = ffi::Shape::index_type
 
using IntTuple = ffi::Shape
 
using IntTupleObj = ffi::ShapeObj
 

Enumerations

enum  TVMDeviceExtType { TVMDeviceExtType_End = 36 }
 Extension device types in TVM.
 
enum  DeviceAttrKind : int {
  kExist = 0 , kMaxThreadsPerBlock = 1 , kWarpSize = 2 , kMaxSharedMemoryPerBlock = 3 ,
  kComputeVersion = 4 , kDeviceName = 5 , kMaxClockRate = 6 , kMultiProcessorCount = 7 ,
  kMaxThreadDimensions = 8 , kMaxRegistersPerBlock = 9 , kGcnArch = 10 , kApiVersion = 11 ,
  kDriverVersion = 12 , kL2CacheSizeBytes = 13 , kTotalGlobalMemory = 14 , kAvailableGlobalMemory = 15 ,
  kImagePitchAlignment = 16
}
 The query type passed to GetAttr.
 
enum class  ReduceKind : int32_t {
  kSum = 0 , kProd = 1 , kMin = 2 , kMax = 3 ,
  kAvg = 4
}
 Possible kinds of reduction operations.
 
enum class  DiscoAction : int32_t {
  kShutDown = 0 , kKillReg = 1 , kGetGlobalFunc = 2 , kCallPacked = 3 ,
  kSyncWorker = 4 , kCopyFromWorker0 = 5 , kCopyToWorker0 = 6 , kDebugGetFromRemote = 7 ,
  kDebugSetRegister = 8
}
 All possible kinds of Disco commands.
 
enum  TypeIndex : int32_t {
  kRuntimeModule = TVMFFITypeIndex::kTVMFFIModule , kRuntimeNDArray = TVMFFITypeIndex::kTVMFFINDArray , kRuntimeShape = TVMFFITypeIndex::kTVMFFIShape , kCustomStaticIndex = TVMFFITypeIndex::kTVMFFIDynObjectBegin - 16 ,
  kRuntimePackedFunc = kCustomStaticIndex + 1 , kRuntimeDiscoDRef = kCustomStaticIndex + 2 , kRuntimeRPCObjectRef = kCustomStaticIndex + 3 , kRuntimeString ,
  kRuntimeMap , kRuntimeArray , kStaticIndexEnd
}
 Namespace for the list of type indices.
 

Functions

int GetVectorBytes (DataType dtype)
 Get the number of bytes needed in a vector.
 
bool TypeMatch (DLDataType t, int code, int bits, int lanes=1)
 Check whether type matches the given spec.
 
bool TypeEqual (DLDataType lhs, DLDataType rhs)
 Check whether two types are equal.
 
std::ostream & operator<< (std::ostream &os, const DataType &dtype)
 
const char * DLDeviceType2Str (int type)
 The name of DLDeviceType.
 
bool IsRPCSessionDevice (Device dev)
 Return true if a Device is owned by an RPC session.
 
int GetRPCSessionIndex (Device dev)
 Return the RPCSessTable index of the RPC Session that owns this device.
 
Device RemoveRPCSessionMask (Device dev)
 Remove the RPC session mask from a Device. RPC clients typically do this when encoding a Device for transmission to an RPC remote. On the wire, RPC devices are expected to be valid on the server without interpretation.
 
std::ostream & operator<< (std::ostream &os, DLDevice dev)
 
Device AddRPCSessionMask (Device dev, int session_table_index)
 Add an RPC session mask to a Device. RPC clients typically do this when decoding a Device received from an RPC remote.
 
std::string ReduceKind2String (ReduceKind kind)
 Converts ReduceKind to string.
 
ffi::Module LoadVMModule (std::string path, Optional< Device > device)
 Load a runtime Module, then create and initialize a RelaxVM.
 
NDArray DiscoEmptyNDArray (ffi::Shape shape, DataType dtype, Optional< Device > device)
 Create an uninitialized empty NDArray.
 
void AllReduce (NDArray send, ReduceKind reduce_kind, bool in_group, NDArray recv)
 Perform an allreduce operation using the underlying communication library.
 
void AllGather (NDArray send, bool in_group, NDArray recv)
 Perform an allgather operation using the underlying communication library.
 
void BroadcastFromWorker0 (NDArray send, bool in_group, NDArray recv)
 Perform a broadcast operation from worker-0.
 
void ScatterFromWorker0 (Optional< NDArray > send, bool in_group, NDArray recv)
 Perform a scatter operation from worker-0, chunking the given buffer into equal parts.
 
void GatherToWorker0 (NDArray send, bool in_group, Optional< NDArray > recv)
 Perform a gather operation to worker-0.
 
void RecvFromWorker0 (NDArray buffer)
 Receive a buffer from worker-0. No-op if the current worker is worker-0.
 
void SendToNextGroup (NDArray buffer)
 Send a buffer to the corresponding worker in the next group. An error is thrown if the worker is already in the last group.
 
void RecvFromPrevGroup (NDArray buffer)
 Receive a buffer from the corresponding worker in the previous group. An error is thrown if the worker is already in the first group.
 
void SendToWorker (NDArray buffer, int receiver_id)
 Send a buffer to the target receiver worker (globally across all groups).
 
void RecvFromWorker (NDArray buffer, int sender_id)
 Receive a buffer from the target sender worker (globally across all groups).
 
int WorkerId ()
 Get the local worker id.
 
void SyncWorker ()
 Called by the worker thread. Waits until the worker completes all its tasks. For example, on a CUDA worker, it blocks until all kernels are launched and cudaStreamSynchronize is complete.
 
std::string DiscoAction2String (DiscoAction action)
 Converts the enum class DiscoAction to string.
 
bool RuntimeEnabled (const String &target)
 Check if runtime module is enabled for target.
 
bool SaveDLTensor (dmlc::Stream *strm, const DLTensor *tensor)
 Save a DLTensor to stream.
 
Device GetPreferredHostDevice (Device device)
 Get the preferred host device from the input device.
 
Timer DefaultTimer (Device dev)
 Default timer if one does not exist for the device.
 
template<typename T >
void parallel_for_with_threading_backend (T flambda, int64_t begin, int64_t end)
 

Variables

constexpr int kAllocAlignment = 64
 Number of bytes each allocation must align to.
 
constexpr int kTempAllocaAlignment = 64
 Number of bytes each allocation must align to in temporary allocation.
 
constexpr int kMaxStackAlloca = 1024
 Maximum size that can be allocated on stack.
 
constexpr int kDefaultWorkspaceAlignment = 1
 Number of bytes each allocation must align to by default in the workspace buffer to service intermediate tensors.
 
constexpr int kRPCSessMask = 128
 Device types larger than this value are RPC devices.
 
constexpr uint64_t kTVMNDArrayMagic = 0xDD5E40F096B4A13F
 Magic number for NDArray file.
 

Typedef Documentation

◆ IntTuple

using tvm::runtime::IntTuple = typedef ffi::Shape

◆ IntTupleObj

using tvm::runtime::IntTupleObj = typedef ffi::ShapeObj

◆ tvm_index_t

using tvm::runtime::tvm_index_t = typedef ffi::Shape::index_type

Enumeration Type Documentation

◆ DeviceAttrKind

The query type passed to GetAttr.

Enumerator
kExist 
kMaxThreadsPerBlock 
kWarpSize 
kMaxSharedMemoryPerBlock 
kComputeVersion 
kDeviceName 
kMaxClockRate 
kMultiProcessorCount 
kMaxThreadDimensions 
kMaxRegistersPerBlock 
kGcnArch 
kApiVersion 
kDriverVersion 
kL2CacheSizeBytes 
kTotalGlobalMemory 
kAvailableGlobalMemory 
kImagePitchAlignment 

◆ DiscoAction

enum class tvm::runtime::DiscoAction : int32_t

All possible kinds of Disco commands.

Enumerator
kShutDown 
kKillReg 
kGetGlobalFunc 
kCallPacked 
kSyncWorker 
kCopyFromWorker0 
kCopyToWorker0 
kDebugGetFromRemote 
kDebugSetRegister 

◆ ReduceKind

enum class tvm::runtime::ReduceKind : int32_t

Possible kinds of reduction operations.

Enumerator
kSum 
kProd 
kMin 
kMax 
kAvg 

◆ TVMDeviceExtType

Extension device types in TVM.

Additional enumerators to supplement those provided by DLPack's DLDeviceType enumeration.

MAINTAINERS NOTE #1: We need to ensure that the two devices are identified by the same integer. Currently this requires manual verification. Discussed here: https://github.com/dmlc/dlpack/issues/111 As of DLPack v0.7, the highest-valued enumerator in DLDeviceType is kDLHexagon = 16.

MAINTAINERS NOTE #2: As of DLPack v0.7, the definition for DLDeviceType specifies an underlying storage type of int32_t. That guarantees a variable of type DLDeviceType is capable of holding any integers provided by either of these enumerations.

However, the int32_t specification only applies when the header file is compiled as C++, and this header file is also meant to work as C code. So the unspecified storage type could be a latent bug when compiled as C.

Enumerator
TVMDeviceExtType_End 

◆ TypeIndex

enum tvm::runtime::TypeIndex : int32_t

Namespace for the list of type indices.

Note
A struct is used so that each constant must be referred to as TypeIndex::EnumName while still behaving as an enum.
Enumerator
kRuntimeModule 

runtime::Module.

kRuntimeNDArray 

runtime::NDArray.

kRuntimeShape 

runtime::Shape.

kCustomStaticIndex 
kRuntimePackedFunc 

ffi::Function.

kRuntimeDiscoDRef 

runtime::DRef for disco distributed runtime

kRuntimeRPCObjectRef 

runtime::RPCObjectRef

kRuntimeString 
kRuntimeMap 
kRuntimeArray 
kStaticIndexEnd 

Function Documentation

◆ AddRPCSessionMask()

Device tvm::runtime::AddRPCSessionMask ( Device  dev,
int  session_table_index 
)
inline

Add an RPC session mask to a Device. RPC clients typically do this when decoding a Device received from an RPC remote.

Parameters
dev  A Device without any RPC Session mask, valid on the RPC server.
session_table_index  Numeric index of the RPC session in the session table.
Returns
A Device with RPC session mask added, valid on the RPC client.
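
The relationship between AddRPCSessionMask, RemoveRPCSessionMask, IsRPCSessionDevice and GetRPCSessionIndex can be illustrated with a standalone sketch. It does not include any TVM header: the `sketch` namespace and the `Device` struct are hypothetical stand-ins, and the encoding shown (session index `i` stored as `(i + 1) * kRPCSessMask` on top of the plain device type) is an assumption consistent with the descriptions on this page, not a copy of the library's implementation.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in for the runtime's Device (a DLDevice-like pair).
struct Device {
  int device_type;
  int device_id;
};

constexpr int kRPCSessMask = 128;  // value documented on this page

// A device counts as RPC-owned once its type carries a multiple of the mask.
inline bool IsRPCSessionDevice(Device dev) {
  return dev.device_type / kRPCSessMask > 0;
}

// Assumed encoding: session index i contributes (i + 1) * kRPCSessMask.
inline Device AddRPCSessionMask(Device dev, int session_table_index) {
  assert(!IsRPCSessionDevice(dev));  // the input must be a plain, server-side device
  dev.device_type |= kRPCSessMask * (session_table_index + 1);
  return dev;
}

// Recover the session table index from a masked device.
inline int GetRPCSessionIndex(Device dev) {
  assert(IsRPCSessionDevice(dev));
  return dev.device_type / kRPCSessMask - 1;
}

// Strip the mask so the device is valid on the server again.
inline Device RemoveRPCSessionMask(Device dev) {
  dev.device_type %= kRPCSessMask;
  return dev;
}

}  // namespace sketch
```

Under these assumptions, a device of type 2 owned by session 1 is encoded as type 258 on the client, and removing the mask yields type 2 again.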

◆ AllGather()

void tvm::runtime::AllGather ( NDArray  send,
bool  in_group,
NDArray  recv 
)

Perform an allgather operation using the underlying communication library.

Parameters
send  The array to send for the allgather.
in_group  Whether the allgather is performed within the worker's group rather than globally across all workers.
recv  The array that receives the result of the allgather.

◆ AllReduce()

void tvm::runtime::AllReduce ( NDArray  send,
ReduceKind  reduce_kind,
bool  in_group,
NDArray  recv 
)

Perform an allreduce operation using the underlying communication library.

Parameters
send  The array to send for the allreduce.
reduce_kind  The kind of reduction operation (e.g. sum, avg, min, max).
in_group  Whether the allreduce is performed within the worker's group rather than globally across all workers.
recv  The array that receives the result of the allreduce.
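
As a reference for what each worker is expected to observe after an allreduce, the following sketch computes the result in a single process. It uses plain `std::vector<double>` buffers instead of NDArray and does not call any communication library; the function name `AllReduceReference` and the `sketch` namespace are hypothetical, and the enumerator values are copied from ReduceKind as listed on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

enum class ReduceKind : int32_t { kSum = 0, kProd = 1, kMin = 2, kMax = 3, kAvg = 4 };

// Combine element i of every worker's send buffer into element i of the result.
// In an allreduce, every participating worker would receive this same result.
inline std::vector<double> AllReduceReference(
    const std::vector<std::vector<double>>& sends, ReduceKind kind) {
  assert(!sends.empty());
  std::vector<double> recv = sends[0];
  for (std::size_t w = 1; w < sends.size(); ++w) {
    assert(sends[w].size() == recv.size());  // all buffers must have the same length
    for (std::size_t i = 0; i < recv.size(); ++i) {
      switch (kind) {
        case ReduceKind::kSum:
        case ReduceKind::kAvg:  // accumulate now, divide after the loop
          recv[i] += sends[w][i];
          break;
        case ReduceKind::kProd:
          recv[i] *= sends[w][i];
          break;
        case ReduceKind::kMin:
          recv[i] = std::min(recv[i], sends[w][i]);
          break;
        case ReduceKind::kMax:
          recv[i] = std::max(recv[i], sends[w][i]);
          break;
      }
    }
  }
  if (kind == ReduceKind::kAvg) {
    for (double& v : recv) v /= static_cast<double>(sends.size());
  }
  return recv;
}

}  // namespace sketch
```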

◆ BroadcastFromWorker0()

void tvm::runtime::BroadcastFromWorker0 ( NDArray  send,
bool  in_group,
NDArray  recv 
)

Perform a broadcast operation from worker-0.

Parameters
send  The buffer to be broadcast.
in_group  Whether the broadcast is performed within the worker's group rather than globally across all workers.
recv  The buffer that receives the broadcast array.

◆ DefaultTimer()

Timer tvm::runtime::DefaultTimer ( Device  dev)

Default timer if one does not exist for the device.

Parameters
dev  The device to time on.

Note that this timer performs synchronization between the device and CPU, which can lead to overhead in the reported results.

◆ DiscoAction2String()

std::string tvm::runtime::DiscoAction2String ( DiscoAction  action)
inline

Converts the enum class DiscoAction to string.

◆ DiscoEmptyNDArray()

NDArray tvm::runtime::DiscoEmptyNDArray ( ffi::Shape  shape,
DataType  dtype,
Optional< Device >  device 
)

Create an uninitialized empty NDArray.

Parameters
shape  The shape of the NDArray.
dtype  The dtype of the NDArray.
device  The device the NDArray is created on. If None, the thread-local default device is used.
Returns
The NDArray created

◆ DLDeviceType2Str()

const char* tvm::runtime::DLDeviceType2Str ( int  type)
inline

The name of DLDeviceType.

Parameters
type  The device type.
Returns
the device name.

◆ GatherToWorker0()

void tvm::runtime::GatherToWorker0 ( NDArray  send,
bool  in_group,
Optional< NDArray >  recv 
)

Perform a gather operation to worker-0.

Parameters
send  The sending buffer, which must not be None.
in_group  Whether the gather is performed within the worker's group rather than globally across all workers.
recv  The receiving buffer. It must be provided on worker-0 and must be None on every other worker. The receiving buffer is divided into equal parts, each of which receives from the corresponding worker.

◆ GetPreferredHostDevice()

Device tvm::runtime::GetPreferredHostDevice ( Device  device)
inline

Get the preferred host device from the input device.

  • For CUDA and ROCm, CUDAHost and ROCMHost will be returned for pinned memory, since pinned memory reduces copy overhead.
  • For other devices, CPU is returned as a fallback.
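
The two rules above amount to a small mapping from device type to host device type. The sketch below expresses that mapping without TVM headers. The `sketch` namespace, the `Device` struct and the shortened enumerator names are hypothetical; the numeric values follow DLPack's DLDeviceType, and returning device id 0 for the host device is an assumption.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical subset of DLPack's DLDeviceType, with the same numeric values.
enum DeviceType {
  kCPU = 1,
  kCUDA = 2,
  kCUDAHost = 3,
  kVulkan = 7,
  kROCM = 10,
  kROCMHost = 11
};

struct Device {
  DeviceType device_type;
  int device_id;
};

// Pinned host memory for CUDA and ROCm, plain CPU for everything else.
inline Device PreferredHostDevice(Device device) {
  assert(device.device_id >= 0);
  switch (device.device_type) {
    case kCUDA:
      return Device{kCUDAHost, 0};  // assumption: host device id is 0
    case kROCM:
      return Device{kROCMHost, 0};
    default:
      return Device{kCPU, 0};  // fallback
  }
}

}  // namespace sketch
```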

◆ GetRPCSessionIndex()

int tvm::runtime::GetRPCSessionIndex ( Device  dev)
inline

Return the RPCSessTable index of the RPC Session that owns this device.

Returns
the table index.

◆ GetVectorBytes()

int tvm::runtime::GetVectorBytes ( DataType  dtype)
inline

Get the number of bytes needed in a vector.

Parameters
dtype  The data type.
Returns
Number of bytes needed.
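
For byte-aligned types, the number of bytes in a vector is the element width in bits times the number of lanes, divided by 8. The sketch below shows only that arithmetic. The `sketch::DataType` struct and `VectorBytes` are hypothetical stand-ins, and sub-byte types (for which the real function may apply special handling) are deliberately left out.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in for a data type: type code, bits per element, lanes.
struct DataType {
  int code;
  int bits;
  int lanes;
};

// bits * lanes must be a whole number of bytes in this sketch.
inline int VectorBytes(DataType dtype) {
  const int data_bits = dtype.bits * dtype.lanes;
  assert(data_bits % 8 == 0);  // sub-byte vectors are outside the scope of the sketch
  return data_bits / 8;
}

}  // namespace sketch
```

For example, a float32 vector with 4 lanes needs 32 * 4 / 8 = 16 bytes.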

◆ IsRPCSessionDevice()

bool tvm::runtime::IsRPCSessionDevice ( Device  dev)
inline

Return true if a Device is owned by an RPC session.

◆ LoadVMModule()

ffi::Module tvm::runtime::LoadVMModule ( std::string  path,
Optional< Device >  device 
)

Load a runtime Module, then create and initialize a RelaxVM.

Parameters
path  The path to the runtime Module (a DSO file) to be loaded.
device  The default device used to initialize the RelaxVM.
Returns
The RelaxVM as a runtime Module

◆ operator<<() [1/2]

std::ostream& tvm::runtime::operator<< ( std::ostream &  os,
const DataType &  dtype 
)
inline

◆ operator<<() [2/2]

std::ostream& tvm::runtime::operator<< ( std::ostream &  os,
DLDevice  dev 
)
inline

◆ parallel_for_with_threading_backend()

template<typename T >
void tvm::runtime::parallel_for_with_threading_backend ( T  flambda,
int64_t  begin,
int64_t  end 
)
inline

◆ RecvFromPrevGroup()

void tvm::runtime::RecvFromPrevGroup ( NDArray  buffer)

Receive a buffer from the corresponding worker in the previous group. An error is thrown if the worker is already in the first group.

Parameters
buffer  The receiving buffer.

◆ RecvFromWorker()

void tvm::runtime::RecvFromWorker ( NDArray  buffer,
int  sender_id 
)

Receive a buffer from the target sender worker (globally across all groups).

Parameters
buffer  The receiving buffer.
sender_id  The global sender worker id.

◆ RecvFromWorker0()

void tvm::runtime::RecvFromWorker0 ( NDArray  buffer)

Receive a buffer from worker-0. No-op if the current worker is worker-0.

Parameters
buffer  The buffer to be received.

◆ ReduceKind2String()

std::string tvm::runtime::ReduceKind2String ( ReduceKind  kind)
inline

Converts ReduceKind to string.

◆ RemoveRPCSessionMask()

Device tvm::runtime::RemoveRPCSessionMask ( Device  dev)
inline

Remove the RPC session mask from a Device. RPC clients typically do this when encoding a Device for transmission to an RPC remote. On the wire, RPC devices are expected to be valid on the server without interpretation.

Parameters
dev  A Device with non-zero RPC Session mask, valid on the RPC client.
Returns
A Device without any RPC Session mask, valid on the RPC server.

◆ RuntimeEnabled()

bool tvm::runtime::RuntimeEnabled ( const String &  target)

Check if runtime module is enabled for target.

Parameters
target  The target module name.
Returns
Whether runtime is enabled.

◆ SaveDLTensor()

bool tvm::runtime::SaveDLTensor ( dmlc::Stream *  strm,
const DLTensor *  tensor 
)
inline

Save a DLTensor to stream.

Parameters
strm  The output stream.
tensor  The tensor to be saved.

◆ ScatterFromWorker0()

void tvm::runtime::ScatterFromWorker0 ( Optional< NDArray >  send,
bool  in_group,
NDArray  recv 
)

Perform a scatter operation from worker-0, chunking the given buffer into equal parts.

Parameters
send  The sending buffer. It must be provided on worker-0 and must be None on every other worker. The buffer is divided into equal parts, one of which is sent to each worker.
in_group  Whether the scatter is performed within the worker's group rather than globally across all workers.
recv  The receiving buffer, which must not be None.
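
The equal-parts chunking described for `send` can be sketched in a single process as follows. This is not the Disco implementation: `ScatterEqual` and the `sketch` namespace are hypothetical, the buffers are plain `std::vector<int>`, and the assumption that worker `w` receives the `w`-th chunk in order is made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Split worker-0's send buffer into num_workers equal chunks.
// recv[w] is the chunk that worker w would receive (assumed rank order).
inline std::vector<std::vector<int>> ScatterEqual(const std::vector<int>& send,
                                                  std::size_t num_workers) {
  assert(num_workers > 0);
  assert(send.size() % num_workers == 0);  // the buffer must divide evenly
  const std::size_t chunk = send.size() / num_workers;
  std::vector<std::vector<int>> recv(num_workers);
  for (std::size_t w = 0; w < num_workers; ++w) {
    const auto first = send.begin() + static_cast<std::ptrdiff_t>(w * chunk);
    recv[w].assign(first, first + static_cast<std::ptrdiff_t>(chunk));
  }
  return recv;
}

}  // namespace sketch
```

A gather to worker-0 is the inverse under the same assumption: the receiving buffer is split into equal parts and part `w` is filled from worker `w`.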

◆ SendToNextGroup()

void tvm::runtime::SendToNextGroup ( NDArray  buffer)

Send a buffer to the corresponding worker in the next group. An error is thrown if the worker is already in the last group.

Parameters
buffer  The sending buffer.

◆ SendToWorker()

void tvm::runtime::SendToWorker ( NDArray  buffer,
int  receiver_id 
)

Send a buffer to the target receiver worker (globally across all groups).

Parameters
buffer  The sending buffer.
receiver_id  The global receiver worker id.

◆ SyncWorker()

void tvm::runtime::SyncWorker ( )

Called by the worker thread. Waits until the worker completes all its tasks. For example, on a CUDA worker, it blocks until all kernels are launched and cudaStreamSynchronize is complete.

◆ TypeEqual()

bool tvm::runtime::TypeEqual ( DLDataType  lhs,
DLDataType  rhs 
)
inline

Check whether two types are equal.

Parameters
lhs  The left operand.
rhs  The right operand.

◆ TypeMatch()

bool tvm::runtime::TypeMatch ( DLDataType  t,
int  code,
int  bits,
int  lanes = 1 
)
inline

Check whether type matches the given spec.

Parameters
t  The type.
code  The type code.
bits  The number of bits to be matched.
lanes  The number of lanes in the type.
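
Both TypeMatch and TypeEqual reduce to comparing the three fields of a DLDataType: code, bits and lanes. The sketch below mirrors that comparison on a hypothetical `sketch::DataTypeLike` struct with the same field layout as DLPack's DLDataType, so it compiles without DLPack or TVM headers. The constant `kFloatCode = 2` follows DLPack's `kDLFloat`.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical mirror of DLPack's DLDataType layout.
struct DataTypeLike {
  uint8_t code;
  uint8_t bits;
  uint16_t lanes;
};

constexpr int kFloatCode = 2;  // same value as DLPack's kDLFloat

// True when all three fields match the given spec; lanes defaults to a scalar.
inline bool TypeMatch(DataTypeLike t, int code, int bits, int lanes = 1) {
  return t.code == code && t.bits == bits && t.lanes == lanes;
}

// True when both types agree on code, bits and lanes.
inline bool TypeEqual(DataTypeLike lhs, DataTypeLike rhs) {
  return lhs.code == rhs.code && lhs.bits == rhs.bits && lhs.lanes == rhs.lanes;
}

}  // namespace sketch
```

With these definitions, a scalar float32 matches `(kFloatCode, 32)` but not `(kFloatCode, 32, 4)`, because the lane count differs.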

◆ WorkerId()

int tvm::runtime::WorkerId ( )

Get the local worker id.

Variable Documentation

◆ kAllocAlignment

constexpr int tvm::runtime::kAllocAlignment = 64
constexpr

Number of bytes each allocation must align to.
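
To make the alignment requirement concrete, the sketch below checks whether an address satisfies a 64-byte alignment and rounds a size up to the next multiple of it. How the runtime applies the constant internally is not stated on this page; `AlignUp`, `IsAligned` and the `sketch` namespace are hypothetical helpers, and rounding sizes up is shown only as a common companion to aligned allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

constexpr std::size_t kAllocAlignment = 64;  // value documented on this page

// Round nbytes up to the next multiple of alignment.
constexpr std::size_t AlignUp(std::size_t nbytes,
                              std::size_t alignment = kAllocAlignment) {
  return (nbytes + alignment - 1) / alignment * alignment;
}

// True when the address of p is a multiple of alignment.
inline bool IsAligned(const void* p, std::size_t alignment = kAllocAlignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

static_assert(AlignUp(65) == 128, "65 bytes round up to two 64-byte units");

}  // namespace sketch
```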

◆ kDefaultWorkspaceAlignment

constexpr int tvm::runtime::kDefaultWorkspaceAlignment = 1
constexpr

Number of bytes each allocation must align to by default in the workspace buffer to service intermediate tensors.

◆ kMaxStackAlloca

constexpr int tvm::runtime::kMaxStackAlloca = 1024
constexpr

Maximum size that can be allocated on stack.

◆ kRPCSessMask

constexpr int tvm::runtime::kRPCSessMask = 128
constexpr

Device types larger than this value are RPC devices.

◆ kTempAllocaAlignment

constexpr int tvm::runtime::kTempAllocaAlignment = 64
constexpr

Number of bytes each allocation must align to in temporary allocation.

◆ kTVMNDArrayMagic

constexpr uint64_t tvm::runtime::kTVMNDArrayMagic = 0xDD5E40F096B4A13F
constexpr

Magic number for NDArray file.
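
A magic number is typically written at the start of a file so that a reader can reject input that is not in the expected format. The sketch below shows that pattern with standard streams. It is not the serialization used by SaveDLTensor (which takes a dmlc::Stream): `WriteMagic`, `CheckMagic` and the `sketch` namespace are hypothetical, the bytes are written in host byte order, and the rest of the file layout is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

namespace sketch {

constexpr uint64_t kTVMNDArrayMagic = 0xDD5E40F096B4A13F;  // value from this page

// Write the 8-byte magic header in host byte order.
inline void WriteMagic(std::ostream& os) {
  const uint64_t header = kTVMNDArrayMagic;
  os.write(reinterpret_cast<const char*>(&header), sizeof(header));
}

// Read 8 bytes and report whether they equal the magic number.
inline bool CheckMagic(std::istream& is) {
  uint64_t header = 0;
  is.read(reinterpret_cast<char*>(&header), sizeof(header));
  const bool complete = static_cast<std::size_t>(is.gcount()) == sizeof(header);
  return complete && header == kTVMNDArrayMagic;
}

}  // namespace sketch
```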