Namespaces
| Namespace | Description |
|---|---|
| cuda_ipc | |
| memory | |
| symbol | Namespace for constant symbols. |
| vm | |
Classes
| Kind | Name | Description |
|---|---|---|
| class | DataType | Runtime primitive data type. |
| class | DeviceAPI | TVM Runtime Device API; abstracts the device-specific interface for memory management. |
| class | DiscoWorker | A worker in Disco. It takes a channel to communicate with the controller. The worker can run in a separate thread or process as long as the channel supports bi-directional communication between them. |
| struct | ThreadLocalDiscoWorker | A thread-local wrapper of DiscoWorker. |
| class | DRefObj | An object that exists on all workers. |
| class | DRef | Managed reference to DRefObj. |
| class | SessionObj | A Disco interactive session. It allows users to interact with the Disco command queue using various ffi::Function calling conventions. |
| class | Session | Managed reference to SessionObj. |
| class | DiscoChannel | A bi-directional channel for controller-worker communication. This channel is primarily used to transfer control messages, not data. |
| class | WorkerZeroData | A special communication channel between the controller and worker-0, assuming they are always collocated in the same process. |
| class | NVTXScopedRange | A class to create an NVTX range. No-op if TVM is not built against NVTX. |
| class | Tensor | Managed Tensor. The array is backed by reference-counted blocks. |
| class | TimerNode | Base class for all timer implementations. |
| class | Timer | Timer for a specific device. |
Enumerations
| Kind | Definition | Description |
|---|---|---|
| enum | TVMDeviceExtType { TVMDeviceExtType_End = 36 } | Extension device types in TVM. |
| enum | DeviceAttrKind : int { kExist = 0, kMaxThreadsPerBlock = 1, kWarpSize = 2, kMaxSharedMemoryPerBlock = 3, kComputeVersion = 4, kDeviceName = 5, kMaxClockRate = 6, kMultiProcessorCount = 7, kMaxThreadDimensions = 8, kMaxRegistersPerBlock = 9, kGcnArch = 10, kApiVersion = 11, kDriverVersion = 12, kL2CacheSizeBytes = 13, kTotalGlobalMemory = 14, kAvailableGlobalMemory = 15, kImagePitchAlignment = 16 } | The query type passed to GetAttr. |
| enum class | ReduceKind : int32_t { kSum = 0, kProd = 1, kMin = 2, kMax = 3, kAvg = 4 } | Possible kinds of reduction operations. |
| enum class | DiscoAction : int32_t { kShutDown = 0, kKillReg = 1, kGetGlobalFunc = 2, kCallPacked = 3, kSyncWorker = 4, kCopyFromWorker0 = 5, kCopyToWorker0 = 6, kDebugGetFromRemote = 7, kDebugSetRegister = 8 } | All possible kinds of Disco commands. |
Functions
| Return type | Signature | Description |
|---|---|---|
| int | GetVectorBytes(DataType dtype) | Get the number of bytes needed in a vector. |
| bool | TypeMatch(DLDataType t, int code, int bits, int lanes = 1) | Check whether a type matches the given spec. |
| bool | TypeEqual(DLDataType lhs, DLDataType rhs) | Check whether two types are equal. |
| std::ostream & | operator<<(std::ostream &os, const DataType &dtype) | |
| const char * | DLDeviceType2Str(int type) | The name of a DLDeviceType. |
| bool | IsRPCSessionDevice(Device dev) | Return true if a Device is owned by an RPC session. |
| int | GetRPCSessionIndex(Device dev) | Return the RPCSessTable index of the RPC session that owns this device. |
| Device | RemoveRPCSessionMask(Device dev) | Remove the RPC session mask from a Device. RPC clients typically do this when encoding a Device for transmission to an RPC remote. On the wire, RPC devices are expected to be valid on the server without interpretation. |
| std::ostream & | operator<<(std::ostream &os, DLDevice dev) | |
| Device | AddRPCSessionMask(Device dev, int session_table_index) | Add an RPC session mask to a Device. RPC clients typically do this when decoding a Device received from an RPC remote. |
| TVM_RUNTIME_DLL bool | RuntimeEnabled(const ffi::String &target) | Check whether the runtime module is enabled for the target. |
| std::string | ReduceKind2String(ReduceKind kind) | Converts a ReduceKind to a string. |
| TVM_RUNTIME_DLL ffi::Module | LoadVMModule(std::string path, ffi::Optional<Device> device) | Load a runtime Module, then create and initialize a RelaxVM. |
| TVM_RUNTIME_DLL Tensor | DiscoEmptyTensor(ffi::Shape shape, DataType dtype, ffi::Optional<Device> device) | Create an uninitialized empty Tensor. |
| TVM_RUNTIME_DLL void | AllReduce(Tensor send, ReduceKind reduce_kind, bool in_group, Tensor recv) | Perform an allreduce operation using the underlying communication library. |
| TVM_RUNTIME_DLL void | AllGather(Tensor send, bool in_group, Tensor recv) | Perform an allgather operation using the underlying communication library. |
| TVM_RUNTIME_DLL void | BroadcastFromWorker0(Tensor send, bool in_group, Tensor recv) | Perform a broadcast operation from worker-0. |
| TVM_RUNTIME_DLL void | ScatterFromWorker0(ffi::Optional<Tensor> send, bool in_group, Tensor recv) | Perform a scatter operation from worker-0, chunking the given buffer into equal parts. |
| TVM_RUNTIME_DLL void | GatherToWorker0(Tensor send, bool in_group, ffi::Optional<Tensor> recv) | Perform a gather operation to worker-0. |
| TVM_RUNTIME_DLL void | RecvFromWorker0(Tensor buffer) | Receive a buffer from worker-0. No-op if the current worker is worker-0. |
| TVM_RUNTIME_DLL void | SendToNextGroup(Tensor buffer) | Send a buffer to the corresponding worker in the next group. An error is thrown if the worker is already in the last group. |
| TVM_RUNTIME_DLL void | RecvFromPrevGroup(Tensor buffer) | Receive a buffer from the corresponding worker in the previous group. An error is thrown if the worker is already in the first group. |
| TVM_RUNTIME_DLL void | SendToWorker(Tensor buffer, int receiver_id) | Send a buffer to the target receiver worker (globally across all groups). |
| TVM_RUNTIME_DLL void | RecvFromWorker(Tensor buffer, int sender_id) | Receive a buffer from the target sender worker (globally across all groups). |
| TVM_RUNTIME_DLL int | WorkerId() | Get the local worker id. |
| TVM_RUNTIME_DLL void | SyncWorker() | Called by the worker thread. Waits until the worker has completed all its tasks. For example, on a CUDA worker, it blocks until all kernels are launched and cudaStreamSynchronize has completed. |
| std::string | DiscoAction2String(DiscoAction action) | Converts the enum class DiscoAction to a string. |
| bool | SaveDLTensor(support::Stream *strm, const DLTensor *tensor) | Save a DLTensor to a stream. |
| Device | GetPreferredHostDevice(Device device) | Get the preferred host device for the input device. |
| ffi::Function | WrapTimeEvaluator(ffi::Function f, Device dev, int number, int repeat, int min_repeat_ms, int limit_zero_time_iterations, int cooldown_interval_ms, int repeats_to_cooldown, int cache_flush_bytes = 0, ffi::Function f_preproc = nullptr) | Wrap a timer function to measure the time cost of a given packed function. |
Variables
| Type | Name | Description |
|---|---|---|
| constexpr int | kAllocAlignment = 64 | Number of bytes each allocation must align to. |
| constexpr int | kTempAllocaAlignment = 64 | Number of bytes each allocation must align to in temporary allocation. |
| constexpr int | kMaxStackAlloca = 1024 | Maximum size that can be allocated on the stack. |
| constexpr int | kDefaultWorkspaceAlignment = 1 | Number of bytes each allocation must align to by default in the workspace buffer to service intermediate tensors. |
| constexpr int | kRPCSessMask = 128 | Device types greater than this value are RPC devices. |
| constexpr int32_t | kRuntimeDiscoDRef = TVMFFITypeIndex::kTVMFFIDynObjectBegin - 14 | Static FFI type index for runtime::disco::DRef. |
| constexpr uint64_t | kTVMTensorMagic = 0xDD5E40F096B4A13F | Magic number for Tensor files. |
enum tvm::runtime::DeviceAttrKind : int
The query type passed to GetAttr.
enum class tvm::runtime::DiscoAction : int32_t
All possible kinds of Disco commands.
enum class tvm::runtime::ReduceKind : int32_t
Possible kinds of reduction operations.
enum tvm::runtime::TVMDeviceExtType
Extension device types in TVM.
Additional enumerators to supplement those provided by DLPack's DLDeviceType enumeration.
MAINTAINERS NOTE #1: We need to ensure that the two devices are identified by the same integer. Currently this requires manual verification. Discussed here: https://github.com/dmlc/dlpack/issues/111
As of DLPack v0.7, the highest-valued enumerator in DLDeviceType is kDLHexagon = 16.
MAINTAINERS NOTE #2: As of DLPack v0.7, the definition for DLDeviceType specifies an underlying storage type of int32_t. That guarantees a variable of type DLDeviceType is capable of holding any integers provided by either of these enumerations.
However, the int32_t specification only applies when the header file is compiled as C++, and this header file is also meant to work as C code. So the unspecified storage type could be a latent bug when compiled as C.
| Enumerator | |
|---|---|
| TVMDeviceExtType_End | |
Device tvm::runtime::AddRPCSessionMask(Device dev, int session_table_index)
Add an RPC session mask to a Device. RPC clients typically do this when decoding a Device received from an RPC remote.
| dev | A Device without any RPC session mask, valid on the RPC server. |
| session_table_index | Numeric index of the RPC session in the session table. |
TVM_RUNTIME_DLL void tvm::runtime::AllGather(Tensor send, bool in_group, Tensor recv)
Perform an allgather operation using the underlying communication library.
| send | The array to contribute to the allgather. |
| in_group | Whether to perform the allgather within the worker's group (true) or globally across all workers (false). |
| recv | The array that receives the result of the allgather. |
TVM_RUNTIME_DLL void tvm::runtime::AllReduce(Tensor send, ReduceKind reduce_kind, bool in_group, Tensor recv)
Perform an allreduce operation using the underlying communication library.
| send | The array to contribute to the allreduce. |
| reduce_kind | The kind of reduction operation (e.g., sum, avg, min, max). |
| in_group | Whether to perform the allreduce within the worker's group (true) or globally across all workers (false). |
| recv | The array that receives the result of the allreduce. |
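To make the reduce_kind options concrete, here is a single-process sketch. It simulates what every worker's recv buffer holds after an allreduce, with per-worker buffers modeled as std::vector<float>. Disco does not perform the operation this way; it delegates to the underlying communication library. The SimulateAllReduce helper is hypothetical, and the enum mirrors the documented ReduceKind values. kAvg is assumed to mean the element-wise sum divided by the number of participating workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the enumerator values documented for tvm::runtime::ReduceKind.
enum class ReduceKind : int32_t { kSum = 0, kProd = 1, kMin = 2, kMax = 3, kAvg = 4 };

// Hypothetical helper: sends[w] is worker w's send buffer. The returned vector is
// the element-wise reduction that every worker's recv buffer would hold.
std::vector<float> SimulateAllReduce(const std::vector<std::vector<float>>& sends,
                                     ReduceKind kind) {
  assert(!sends.empty());
  std::vector<float> recv = sends[0];
  for (std::size_t w = 1; w < sends.size(); ++w) {
    assert(sends[w].size() == recv.size());  // all buffers share one shape
    for (std::size_t i = 0; i < recv.size(); ++i) {
      const float x = sends[w][i];
      switch (kind) {
        case ReduceKind::kSum:
        case ReduceKind::kAvg: recv[i] += x; break;  // kAvg divides after the loop
        case ReduceKind::kProd: recv[i] *= x; break;
        case ReduceKind::kMin: recv[i] = std::min(recv[i], x); break;
        case ReduceKind::kMax: recv[i] = std::max(recv[i], x); break;
      }
    }
  }
  if (kind == ReduceKind::kAvg) {
    for (float& v : recv) v /= static_cast<float>(sends.size());
  }
  return recv;
}
```

With two workers holding {1, 2} and {3, 4}, kSum yields {4, 6} and kAvg yields {2, 3} on every worker.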
TVM_RUNTIME_DLL void tvm::runtime::BroadcastFromWorker0(Tensor send, bool in_group, Tensor recv)
Perform a broadcast operation from worker-0.
| send | The buffer to broadcast. |
| in_group | Whether to perform the broadcast within the worker's group (true) or globally across all workers (false). |
| recv | The buffer that receives the broadcast array. |
inline std::string tvm::runtime::DiscoAction2String(DiscoAction action)
Converts the enum class DiscoAction to a string.
TVM_RUNTIME_DLL Tensor tvm::runtime::DiscoEmptyTensor(ffi::Shape shape, DataType dtype, ffi::Optional<Device> device)
Create an uninitialized empty Tensor.
inline const char* tvm::runtime::DLDeviceType2Str(int type)
Returns the name of a DLDeviceType.
| type | The device type. |
TVM_RUNTIME_DLL void tvm::runtime::GatherToWorker0(Tensor send, bool in_group, ffi::Optional<Tensor> recv)
Perform a gather operation to worker-0.
| send | The sending buffer, which must not be None. |
| in_group | Whether to perform the gather within the worker's group (true) or globally across all workers (false). |
| recv | Must be provided on worker-0 and must be None on all other workers. The receiving buffer is divided into equal parts, and each part receives the data from one worker. |
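The recv description implies a simple layout on worker-0. The single-process sketch below assumes the equal parts are ordered by worker id, so part w of worker-0's receiving buffer holds worker w's send buffer. SimulateGatherToWorker0 is a hypothetical name. A real gather runs through the communication library rather than inside one process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: concatenate each worker's send buffer into the receive
// buffer that only worker-0 owns. Part w of the result comes from worker w.
std::vector<float> SimulateGatherToWorker0(const std::vector<std::vector<float>>& sends) {
  assert(!sends.empty());
  const std::size_t part = sends[0].size();
  std::vector<float> recv;
  recv.reserve(part * sends.size());
  for (const auto& s : sends) {
    assert(s.size() == part);  // equal parts, one per worker
    recv.insert(recv.end(), s.begin(), s.end());
  }
  return recv;
}
```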
Device tvm::runtime::GetPreferredHostDevice(Device device)
Get the preferred host device for the input device.
inline int tvm::runtime::GetRPCSessionIndex(Device dev)
Return the RPCSessTable index of the RPC session that owns this device.
inline int tvm::runtime::GetVectorBytes(DataType dtype)
Get the number of bytes occupied by a vector of the given data type.
| dtype | The data type. |
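As a rough illustration, the byte count is the per-lane bit width multiplied by the number of lanes, divided by 8. This sketch mirrors DLPack's DLDataType fields (code, bits, lanes) in a local struct and handles only types whose total width is a whole number of bytes. Treating this as the full behavior of GetVectorBytes is an assumption, because the real function may special-case types such as bool or sub-byte integers. VectorBytesSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Local stand-in for DLPack's DLDataType (code, bits, lanes).
struct DLDataTypeLike {
  uint8_t code;
  uint8_t bits;
  uint16_t lanes;
};

// Hypothetical sketch: bytes occupied by one vector of `dtype`.
int VectorBytesSketch(DLDataTypeLike dtype) {
  const int data_bits = dtype.bits * dtype.lanes;
  if (data_bits % 8 != 0) {
    throw std::invalid_argument("total bit width is not a multiple of 8");
  }
  return data_bits / 8;
}
```

For example, a float32x4 type (code 2, 32 bits, 4 lanes) occupies 16 bytes.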
inline bool tvm::runtime::IsRPCSessionDevice(Device dev)
Return true if a Device is owned by an RPC session.
TVM_RUNTIME_DLL ffi::Module tvm::runtime::LoadVMModule(std::string path, ffi::Optional<Device> device)
Load a runtime Module, then create and initialize a RelaxVM.
| path | The path to the runtime Module (a DSO file) to be loaded. |
| device | The default device used to initialize the RelaxVM. |
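inline std::ostream& tvm::runtime::operator<<(std::ostream& os, const DataType& dtype)

inline std::ostream& tvm::runtime::operator<<(std::ostream& os, DLDevice dev)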
TVM_RUNTIME_DLL void tvm::runtime::RecvFromPrevGroup(Tensor buffer)
Receive a buffer from the corresponding worker in the previous group. An error is thrown if the worker is already in the first group.
| buffer | The receiving buffer. |
TVM_RUNTIME_DLL void tvm::runtime::RecvFromWorker(Tensor buffer, int sender_id)
Receive a buffer from the target sender worker (globally across all groups).
| buffer | The receiving buffer. |
| sender_id | The global sender worker id. |
TVM_RUNTIME_DLL void tvm::runtime::RecvFromWorker0(Tensor buffer)
Receive a buffer from worker-0. No-op if the current worker is worker-0.
| buffer | The receiving buffer. |
inline std::string tvm::runtime::ReduceKind2String(ReduceKind kind)
Converts a ReduceKind to a string.
Device tvm::runtime::RemoveRPCSessionMask(Device dev)
Remove the RPC session mask from a Device. RPC clients typically do this when encoding a Device for transmission to an RPC remote. On the wire, RPC devices are expected to be valid on the server without interpretation.
| dev | A Device with a non-zero RPC session mask, valid on the RPC client. |
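The add/remove pair can be pictured with plain integers. The sketch below assumes the mask is applied as device_type + kRPCSessMask × (session_table_index + 1) and removed with a modulo. That assumption is consistent with the documented kRPCSessMask = 128 and with RPC device types being larger than that value. The DeviceLike struct and the helpers ending in Sketch are hypothetical stand-ins, not TVM's declarations.

```cpp
#include <cassert>

constexpr int kRPCSessMask = 128;  // value documented for tvm::runtime::kRPCSessMask

// Minimal stand-in for a Device: only the fields relevant here.
struct DeviceLike {
  int device_type;
  int device_id;
};

bool IsRPCSessionDeviceSketch(DeviceLike dev) { return dev.device_type / kRPCSessMask > 0; }

// Client-side decoding: tag a server-valid device with its session index.
DeviceLike AddRPCSessionMaskSketch(DeviceLike dev, int session_table_index) {
  assert(!IsRPCSessionDeviceSketch(dev));  // must not already carry a mask
  dev.device_type += kRPCSessMask * (session_table_index + 1);
  return dev;
}

int GetRPCSessionIndexSketch(DeviceLike dev) {
  assert(IsRPCSessionDeviceSketch(dev));
  return dev.device_type / kRPCSessMask - 1;
}

// Client-side encoding: strip the tag before sending the device to the server.
DeviceLike RemoveRPCSessionMaskSketch(DeviceLike dev) {
  dev.device_type %= kRPCSessMask;
  return dev;
}
```

Under this assumption, a device of type 2 tagged with session index 1 becomes type 258, and removing the mask restores type 2.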
TVM_RUNTIME_DLL bool tvm::runtime::RuntimeEnabled(const ffi::String& target)
Check whether the runtime module is enabled for the target.
| target | The target module name. |
inline bool tvm::runtime::SaveDLTensor(support::Stream* strm, const DLTensor* tensor)
Save a DLTensor to a stream.
| strm | The output stream. |
| tensor | The tensor to be saved. |
TVM_RUNTIME_DLL void tvm::runtime::ScatterFromWorker0(ffi::Optional<Tensor> send, bool in_group, Tensor recv)
Perform a scatter operation from worker-0, chunking the given buffer into equal parts.
| send | Must be provided on worker-0 and must be None on all other workers. The buffer is divided into equal parts, and each part is sent to one worker. |
| in_group | Whether to perform the scatter within the worker's group (true) or globally across all workers (false). |
| recv | The receiving buffer, which must not be None. |
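Here is a single-process sketch of the chunking described for send. It assumes the equal parts are contiguous and assigned in worker-id order. The helper name SimulateScatterFromWorker0 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: worker-0's send buffer is cut into num_workers equal,
// contiguous chunks, and chunk w becomes worker w's recv buffer.
std::vector<std::vector<float>> SimulateScatterFromWorker0(const std::vector<float>& send,
                                                           std::size_t num_workers) {
  assert(num_workers > 0 && send.size() % num_workers == 0);
  const std::size_t chunk = send.size() / num_workers;
  std::vector<std::vector<float>> recvs;
  recvs.reserve(num_workers);
  for (std::size_t w = 0; w < num_workers; ++w) {
    auto first = send.begin() + static_cast<std::ptrdiff_t>(w * chunk);
    recvs.emplace_back(first, first + static_cast<std::ptrdiff_t>(chunk));
  }
  return recvs;
}
```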
TVM_RUNTIME_DLL void tvm::runtime::SendToNextGroup(Tensor buffer)
Send a buffer to the corresponding worker in the next group. An error is thrown if the worker is already in the last group.
| buffer | The sending buffer. |
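Which worker is the "corresponding worker in the next group"? Assume workers are numbered contiguously by group, that is, worker_id = group_id × group_size + local_id. This is an assumption about Disco's layout; this page does not state it. Under that assumption, the peer is worker_id + group_size, and the last-group error follows directly. The helper below is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch: the peer that SendToNextGroup would target, under the
// assumed contiguous layout worker_id = group_id * group_size + local_id.
int NextGroupPeer(int worker_id, int num_workers, int num_groups) {
  assert(num_groups > 0 && num_workers % num_groups == 0);
  const int group_size = num_workers / num_groups;
  const int group_id = worker_id / group_size;
  if (group_id == num_groups - 1) {
    throw std::out_of_range("worker is already in the last group");
  }
  return worker_id + group_size;
}
```

With 8 workers in 2 groups, worker 1 sends to worker 5, and worker 5 (in the last group) raises an error.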
TVM_RUNTIME_DLL void tvm::runtime::SendToWorker(Tensor buffer, int receiver_id)
Send a buffer to the target receiver worker (globally across all groups).
| buffer | The sending buffer. |
| receiver_id | The global receiver worker id. |
TVM_RUNTIME_DLL void tvm::runtime::SyncWorker()
Called by the worker thread. Waits until the worker has completed all its tasks. For example, on a CUDA worker, it blocks until all kernels are launched and cudaStreamSynchronize has completed.
inline bool tvm::runtime::TypeEqual(DLDataType lhs, DLDataType rhs)
Check whether two types are equal.
| lhs | The left operand. |
| rhs | The right operand. |
inline bool tvm::runtime::TypeMatch(DLDataType t, int code, int bits, int lanes = 1)
Check whether a type matches the given spec.
| t | The type. |
| code | The type code. |
| bits | The number of bits to be matched. |
| lanes | The number of lanes in the type. |
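The parameter lists of TypeMatch and TypeEqual suggest plain field-by-field comparisons. The sketch below assumes exactly that, using a local struct with the same three fields as DLPack's DLDataType. Because lanes defaults to 1, a scalar spec does not match a vector type. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for DLPack's DLDataType.
struct DLDataTypeLike {
  uint8_t code;
  uint8_t bits;
  uint16_t lanes;
};

// Sketch of TypeMatch: compare a type against a (code, bits, lanes) spec.
bool TypeMatchSketch(DLDataTypeLike t, int code, int bits, int lanes = 1) {
  return t.code == code && t.bits == bits && t.lanes == lanes;
}

// Sketch of TypeEqual: two types are equal when all three fields agree.
bool TypeEqualSketch(DLDataTypeLike lhs, DLDataTypeLike rhs) {
  return lhs.code == rhs.code && lhs.bits == rhs.bits && lhs.lanes == rhs.lanes;
}
```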
TVM_RUNTIME_DLL int tvm::runtime::WorkerId()
Get the local worker id.
ffi::Function tvm::runtime::WrapTimeEvaluator(ffi::Function f, Device dev, int number, int repeat, int min_repeat_ms, int limit_zero_time_iterations, int cooldown_interval_ms, int repeats_to_cooldown, int cache_flush_bytes = 0, ffi::Function f_preproc = nullptr)
Wrap a timer function to measure the time cost of a given packed function.
Approximate implementation:
| f | The function to be timed. |
| dev | The device. |
| number | The number of times to run the function when taking an average. These runs together form one repeat of the measurement. |
| repeat | The number of times to repeat the measurement. In total, the function is invoked (1 + number × repeat) times, where the first call is a warm-up and is discarded. The returned result contains repeat costs, each of which is the average of number costs. |
| min_repeat_ms | The minimum duration of one repeat in milliseconds. By default, one repeat contains number runs. If this parameter is set, number is adjusted dynamically to meet the minimum duration of one repeat; that is, when the run time of one repeat falls below this value, number is increased automatically. |
| limit_zero_time_iterations | The maximum number of repeats when the measured time is 0. This helps avoid hanging during measurements. |
| cooldown_interval_ms | The cooldown interval, in milliseconds, inserted after every repeats_to_cooldown repeats. |
| repeats_to_cooldown | The number of repeats before the cooldown is activated. |
| cache_flush_bytes | The number of bytes to flush from cache before |
| f_preproc | The function to be executed before the time evaluator runs. |
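The interplay of number, repeat, min_repeat_ms, limit_zero_time_iterations and the cooldown parameters is easiest to see as a loop. The following is a simplified, host-only sketch under stated assumptions; it is not TVM's implementation. It times with std::chrono::steady_clock and makes one warm-up call. It rescales number when a repeat is shorter than min_repeat_ms, and it stops retrying after limit_zero_time_iterations zero-duration measurements. It also sleeps for cooldown_interval_ms after every repeats_to_cooldown repeats. The sketch ignores dev, cache_flush_bytes and f_preproc, and TimeEvaluatorSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch of the measurement loop. Returns `repeat` per-call
// averages in milliseconds. Device sync, cache flushing and f_preproc are omitted.
std::vector<double> TimeEvaluatorSketch(const std::function<void()>& f, int number, int repeat,
                                        int min_repeat_ms, int limit_zero_time_iterations,
                                        int cooldown_interval_ms, int repeats_to_cooldown) {
  using Clock = std::chrono::steady_clock;
  assert(number > 0 && repeat > 0);
  f();  // warm-up call; its time is not recorded
  std::vector<double> costs;
  for (int i = 0; i < repeat; ++i) {
    double duration_ms = 0.0;
    int zero_time_runs = 0;
    while (true) {
      const auto start = Clock::now();
      for (int j = 0; j < number; ++j) f();
      duration_ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
      if (duration_ms >= min_repeat_ms) break;
      if (duration_ms == 0.0) {
        // The clock could not resolve the run; retry, but never forever.
        if (++zero_time_runs >= limit_zero_time_iterations) break;
        continue;
      }
      // Grow number so the next attempt should last at least min_repeat_ms (capped).
      const double scaled = min_repeat_ms / (duration_ms / number) + 1;
      number = static_cast<int>(std::min(scaled, 1e7));
    }
    costs.push_back(duration_ms / number);  // average cost of one call in this repeat
    if (cooldown_interval_ms > 0 && repeats_to_cooldown > 0 &&
        (i + 1) % repeats_to_cooldown == 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(cooldown_interval_ms));
    }
  }
  return costs;
}
```

With min_repeat_ms = 0, no rescaling happens, so f runs exactly 1 + number × repeat times. This matches the count given for repeat.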
constexpr int tvm::runtime::kAllocAlignment = 64
Number of bytes each allocation must align to.
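Here is a minimal sketch of meeting this alignment with standard C++17 facilities. It rounds the request up to a multiple of the alignment, because std::aligned_alloc requires the size to be a multiple of it, and then allocates with std::aligned_alloc. It only illustrates the constraint and is not how TVM's DeviceAPI allocates memory. AllocAligned and RoundUp are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kAllocAlignment = 64;  // documented value

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t RoundUp(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Hypothetical helper: returns memory aligned to kAllocAlignment; release with std::free.
void* AllocAligned(std::size_t nbytes) {
  return std::aligned_alloc(kAllocAlignment, RoundUp(nbytes, kAllocAlignment));
}
```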
constexpr int tvm::runtime::kDefaultWorkspaceAlignment = 1
Number of bytes each allocation must align to by default in the workspace buffer to service intermediate tensors.
constexpr int tvm::runtime::kMaxStackAlloca = 1024
Maximum size that can be allocated on the stack.
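One common way to use such a limit is to keep scratch buffers up to the limit in fixed-size inline storage and send larger ones to the heap. The inline storage is on the stack when the owning object is a local variable. The sketch below assumes the limit is measured in bytes. The ScratchBuffer class is hypothetical and not part of TVM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t kMaxStackAlloca = 1024;  // documented value, assumed to be bytes

// Hypothetical scratch buffer: inline storage up to kMaxStackAlloca bytes,
// heap storage above that.
class ScratchBuffer {
 public:
  explicit ScratchBuffer(std::size_t nbytes) : size_(nbytes) {
    if (nbytes > kMaxStackAlloca) heap_ = std::make_unique<std::byte[]>(nbytes);
  }
  std::byte* data() { return heap_ ? heap_.get() : inline_.data(); }
  std::size_t size() const { return size_; }
  bool on_heap() const { return heap_ != nullptr; }

 private:
  std::array<std::byte, kMaxStackAlloca> inline_{};
  std::unique_ptr<std::byte[]> heap_;
  std::size_t size_;
};
```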
constexpr int tvm::runtime::kRPCSessMask = 128
Device types greater than this value are RPC devices.
constexpr int32_t tvm::runtime::kRuntimeDiscoDRef = TVMFFITypeIndex::kTVMFFIDynObjectBegin - 14
Static FFI type index for runtime::disco::DRef.
Allocated within the [kTVMFFIDynObjectBegin - 16, kTVMFFIDynObjectBegin) custom-static slot range. The sibling constant kRuntimeRPCObjectRef lives in src/runtime/rpc/rpc_session.h and uses ... - 13; values must remain disjoint across this small reserved block.
constexpr int tvm::runtime::kTempAllocaAlignment = 64
Number of bytes each allocation must align to in temporary allocation.
constexpr uint64_t tvm::runtime::kTVMTensorMagic = 0xDD5E40F096B4A13F
Magic number for Tensor files.
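A magic number lets a loader reject a file that is not a Tensor file before parsing anything else. The sketch below writes the documented constant as the first eight bytes of a stream and checks for it on read. Storing it in host byte order is an assumption for illustration, as are the WriteMagic and HasTensorMagic names. The rest of TVM's file layout is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

constexpr uint64_t kTVMTensorMagic = 0xDD5E40F096B4A13F;  // documented value

// Hypothetical writer: emit the magic header in host byte order.
void WriteMagic(std::ostream& os) {
  os.write(reinterpret_cast<const char*>(&kTVMTensorMagic), sizeof(kTVMTensorMagic));
}

// Hypothetical reader: true only if the stream starts with the magic header.
bool HasTensorMagic(std::istream& is) {
  uint64_t header = 0;
  if (!is.read(reinterpret_cast<char*>(&header), sizeof(header))) return false;
  return header == kTVMTensorMagic;
}
```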