tvm.runtime.disco

TVM distributed runtime API.

class tvm.runtime.disco.DModule(dref: tvm.runtime.disco.session.DRef, session: tvm.runtime.disco.session.Session)

A Module in a Disco session.

class tvm.runtime.disco.DPackedFunc(dref: tvm.runtime.disco.session.DRef, session: tvm.runtime.disco.session.Session)

A PackedFunc in a Disco session.

class tvm.runtime.disco.DRef

An object that exists on all workers. The controller process assigns a unique “register id” to each object, and the worker process uses this id to refer to the object residing on itself.
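The register-id scheme described above can be pictured with a toy model. This is a plain-Python sketch and not TVM's implementation: the names ToyWorker, ToyDRef, and ToyController are hypothetical, and real Disco workers are separate threads or processes that receive commands from the controller.

```python
class ToyWorker:
    """Holds a private register file mapping register id -> local object."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.registers = {}


class ToyDRef:
    """Controller-side handle that stores only the register id."""

    def __init__(self, reg_id, workers):
        self.reg_id = reg_id
        self._workers = workers

    def debug_get_from_remote(self, worker_id):
        # Look up this handle's register on one specific worker.
        return self._workers[worker_id].registers[self.reg_id]


class ToyController:
    """Assigns a fresh register id to every object it creates on the workers."""

    def __init__(self, num_workers):
        self.workers = [ToyWorker(i) for i in range(num_workers)]
        self._next_reg_id = 0

    def create(self, make_value):
        reg_id = self._next_reg_id
        self._next_reg_id += 1
        # Every worker stores its own copy under the same register id.
        for worker in self.workers:
            worker.registers[reg_id] = make_value(worker.worker_id)
        return ToyDRef(reg_id, self.workers)


controller = ToyController(num_workers=3)
ref = controller.create(lambda worker_id: worker_id * 10)
values = [ref.debug_get_from_remote(i) for i in range(3)]
```

The point of the model is that the handle carries no data, only an id, so the same handle can resolve to a different value on each worker.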

debug_get_from_remote(worker_id: int) Any

Get the value of a DRef from a remote worker. It is only used for debugging purposes.

Parameters

worker_id (int) – The id of the worker to be fetched from.

Returns

value – The value of the register.

Return type

object

debug_copy_from(worker_id: int, value: Union[numpy.ndarray, tvm.runtime.ndarray.NDArray]) None

Copy an NDArray value to remote for debugging purposes.

Parameters
  • worker_id (int) – The id of the worker to be copied to.

  • value (Union[numpy.ndarray, NDArray]) – The value to be copied.
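As a rough illustration of the two debugging helpers, the sketch below writes a different array to each worker and reads the values back. It assumes a TVM build with Disco enabled, uses ThreadedSession so that no extra processes are needed, and passes tvm.cpu(0) explicitly. The function name debug_roundtrip is hypothetical, and the imports are placed inside the function so the definition can be loaded without TVM installed.

```python
def debug_roundtrip(num_workers=2):
    """Write a distinct array to each worker, then read every copy back."""
    # Assumption: TVM and NumPy are installed; imported lazily on purpose.
    import numpy as np
    import tvm
    from tvm.runtime import disco

    sess = disco.ThreadedSession(num_workers=num_workers)
    d_arr = sess.empty((2, 3), "float32", device=tvm.cpu(0))

    # Fill worker i's copy with the constant value i.
    for worker_id in range(num_workers):
        value = np.full((2, 3), worker_id, dtype="float32")
        d_arr.debug_copy_from(worker_id, value)

    # Fetch each worker's copy back to the controller as a NumPy array.
    return [d_arr.debug_get_from_remote(i).numpy() for i in range(num_workers)]
```

Both helpers address one worker at a time, which is convenient for inspection but not intended for bulk data movement.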

class tvm.runtime.disco.ProcessSession(num_workers: int, num_groups: int = 1, entrypoint: str = 'tvm.exec.disco_worker')

A Disco session backed by pipe-based multi-processing.

class tvm.runtime.disco.Session

A Disco interactive session. It allows users to interact with the Disco command queue using various PackedFunc calling conventions.

empty(shape: Sequence[int], dtype: str, device: Optional[tvm._ffi.runtime_ctypes.Device] = None, worker0_only: bool = False, in_group: bool = True) tvm.runtime.disco.session.DRef

Create an empty NDArray on all workers and attach them to a DRef.

Parameters
  • shape (tuple of int) – The shape of the NDArray.

  • dtype (str) – The data type of the NDArray.

  • device (Optional[Device] = None) – The device of the NDArray.

  • worker0_only (bool) – If False (default), allocate an array on each worker. If True, only allocate an array on worker0.

  • in_group (bool) – Takes effect only when worker0_only is True. If True (default), allocate an array on the first worker of each group. If False, allocate an array only on worker0 globally.

Returns

array – The created NDArray.

Return type

DRef
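The interaction between worker0_only and in_group can be summarized with a small helper that lists which workers would allocate an array. This is a toy model, not TVM code: the name allocating_workers is hypothetical, and it assumes that workers are split into contiguous, equally sized groups so that the first worker of each group has an id divisible by the group size.

```python
def allocating_workers(num_workers, num_groups=1, worker0_only=False, in_group=True):
    """Return the ids of the workers that would allocate an array."""
    if num_workers % num_groups != 0:
        raise ValueError("num_workers must be divisible by num_groups")
    if not worker0_only:
        # Default: every worker allocates.
        return list(range(num_workers))
    if not in_group:
        # Only the global worker 0 allocates.
        return [0]
    # Assumption: groups are contiguous blocks of equal size.
    group_size = num_workers // num_groups
    return list(range(0, num_workers, group_size))
```

For example, under these assumptions, four workers in two groups with worker0_only=True give workers 0 and 2 when in_group is True, and worker 0 alone when in_group is False.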

shutdown()

Shut down the Disco session.

property num_workers: int

Return the number of workers in the session.

get_global_func(name: str) tvm.runtime.disco.session.DRef

Get a global function on workers.

Parameters

name (str) – The name of the global function.

Returns

func – The global packed function

Return type

DRef

import_python_module(module_name: str) None

Import a Python module in each worker.

This may be required before calling functions that the module defines or registers.

Parameters

module_name (str) – The python module name, as it would be used in a python import statement.

call_packed(func: tvm.runtime.disco.session.DRef, *args) tvm.runtime.disco.session.DRef

Call a PackedFunc on workers providing variadic arguments.

Parameters
  • func (DRef) – A DRef to the PackedFunc to be called.

  • *args (various types) – The variadic arguments. Supported types include:

      - integers and floating point numbers
      - DLDataType
      - DLDevice
      - str (std::string in C++)
      - DRef

Returns

return_value – A DRef to the return value of the function call on the workers.

Return type

DRef

Notes

Examples of unsupported types:

  - NDArray, DLTensor
  - TVM Objects, including PackedFunc, Module and String
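A typical calling sequence combines import_python_module, get_global_func, and call_packed. The sketch below is illustrative only: the module name "my_pkg" and the global function name "my_pkg.add_one" are hypothetical placeholders for a module that registers such a function when imported, and the wrapper name call_add_one is likewise invented. TVM is imported inside the function so the definition loads without TVM installed.

```python
def call_add_one(value=1, num_workers=2):
    """Call a (hypothetical) global function on every worker."""
    # Assumption: TVM is installed with Disco enabled.
    from tvm.runtime import disco

    sess = disco.ThreadedSession(num_workers=num_workers)

    # Hypothetical: importing "my_pkg" on each worker registers
    # the global function "my_pkg.add_one".
    sess.import_python_module("my_pkg")

    func = sess.get_global_func("my_pkg.add_one")
    # Only supported argument types (here an int) may be passed.
    result = sess.call_packed(func, value)

    # The result is a DRef; inspect it per worker for debugging.
    return [result.debug_get_from_remote(i) for i in range(num_workers)]
```

Because the return value lives on the workers, it is read back here with debug_get_from_remote, which is suitable for inspection only.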

sync_worker_0() None

Synchronize the controller with worker-0. The call blocks until worker-0 finishes executing all instructions issued so far.

copy_from_worker_0(host_array: tvm.runtime.ndarray.NDArray, remote_array: tvm.runtime.disco.session.DRef) None

Copy an NDArray from worker-0 to the controller-side NDArray.

Parameters
  • host_array (NDArray) – The controller-side array that receives the data copied from worker-0.

  • remote_array (DRef) – The NDArray on worker-0 to copy from.

copy_to_worker_0(host_array: tvm.runtime.ndarray.NDArray, remote_array: Optional[tvm.runtime.disco.session.DRef] = None) tvm.runtime.disco.session.DRef

Copy the controller-side NDArray to worker-0.

Parameters
  • host_array (NDArray) – The array to be copied to worker-0.

  • remote_array (Optional[DRef]) – The destination NDArray on worker-0.

Returns

output_array – The DRef containing the copied data on worker0, and NullOpt on all other workers. If remote_array was provided, this return value is the same as remote_array. Otherwise, it is the newly allocated space.

Return type

DRef
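The two copy methods and sync_worker_0 are usually used together. The following sketch sends a NumPy array to worker-0 and reads it back. It assumes a TVM build in which tvm.nd.array and tvm.nd.empty are available (consistent with the tvm.runtime.ndarray.NDArray type used in the signatures above); the function name roundtrip_through_worker_0 is hypothetical, and imports are kept inside the function so the definition loads without TVM installed.

```python
def roundtrip_through_worker_0(num_workers=2):
    """Copy an array to worker-0 and back to the controller."""
    # Assumption: TVM and NumPy are installed; imported lazily on purpose.
    import numpy as np
    import tvm
    from tvm.runtime import disco

    sess = disco.ThreadedSession(num_workers=num_workers)
    device = tvm.cpu(0)
    x_np = np.arange(6, dtype="float32").reshape(2, 3)

    # Controller -> worker-0: allocate remote storage, then copy into it.
    remote = sess.empty(x_np.shape, "float32", device=device)
    sess.copy_to_worker_0(tvm.nd.array(x_np, device=device), remote)

    # Worker-0 -> controller: copy into a preallocated host array.
    host = tvm.nd.empty(x_np.shape, "float32", device=device)
    sess.copy_from_worker_0(host, remote)

    # Wait for worker-0 to finish the queued copy before reading the result.
    sess.sync_worker_0()
    return host.numpy()
```

The call to sync_worker_0 matters because commands are queued; without it, the host array might be read before the copy has run.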

load_vm_module(path: str, device: Optional[tvm._ffi.runtime_ctypes.Device] = None) tvm.runtime.disco.session.DModule

Load a VM module from a file.

Parameters
  • path (str) – The path to the VM module file.

  • device (Optional[Device] = None) – The device to load the VM module to. Defaults to the default device of each worker.

Returns

module – The loaded VM module.

Return type

DModule

init_ccl(ccl: str, *device_ids)

Initialize the underlying collective communication library.

Parameters
  • ccl (str) – The name of the collective communication library. Currently supported libraries are:

      - nccl
      - rccl
      - mpi

  • *device_ids (int) – The device IDs to be used by the underlying communication library.

broadcast(src: Union[numpy.ndarray, tvm.runtime.ndarray.NDArray], dst: Optional[tvm.runtime.disco.session.DRef] = None, in_group: bool = True) tvm.runtime.disco.session.DRef

Broadcast an array to all workers.

Parameters
  • src (Union[np.ndarray, NDArray]) – The array to be broadcast.

  • dst (Optional[DRef]) – The output array. If None, an array matching the shape and dtype of src will be allocated on each worker.

  • in_group (bool) – If True (default), the broadcast is performed within each group. If False, it is performed globally across all workers.

Returns

output_array – The DRef containing the broadcast data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type

DRef

broadcast_from_worker0(src: tvm.runtime.disco.session.DRef, dst: tvm.runtime.disco.session.DRef, in_group: bool = True) tvm.runtime.disco.session.DRef

Broadcast an array from worker-0 to all other workers.

Parameters
  • src (DRef) – The array on worker-0 to be broadcast.

  • dst (DRef) – The output array on each worker.

  • in_group (bool) – If True (default), the broadcast is performed within each group. If False, it is performed globally across all workers.

scatter(src: Union[numpy.ndarray, tvm.runtime.ndarray.NDArray], dst: Optional[tvm.runtime.disco.session.DRef] = None, in_group: bool = True) tvm.runtime.disco.session.DRef

Scatter an array across all workers.

Parameters
  • src (Union[np.ndarray, NDArray]) – The array to be scattered. The first dimension of this array, src.shape[0], must be equal to the number of workers.

  • dst (Optional[DRef]) – The output array. If None, an array with compatible shape and the same dtype as src will be allocated on each worker.

  • in_group (bool) – If True (default), the scatter is performed within each group. If False, it is performed globally across all workers.

Returns

output_array – The DRef containing the scattered data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type

DRef
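The shape requirement on src can be illustrated with a toy model that uses nested lists instead of arrays. It assumes conventional scatter semantics, in which worker i receives the slice src[i], so each worker's output has the shape of src with the first dimension removed. The function name toy_scatter is hypothetical and nothing here calls TVM.

```python
def toy_scatter(src, num_workers):
    """Split src along its first dimension, one slice per worker."""
    if len(src) != num_workers:
        raise ValueError("src.shape[0] must equal the number of workers")
    # Assumption: worker i receives src[i].
    return {worker_id: src[worker_id] for worker_id in range(num_workers)}


# A (2, 3) input scattered over 2 workers gives each worker 3 values.
shards = toy_scatter([[1, 2, 3], [4, 5, 6]], num_workers=2)
```

Under this model, an input whose first dimension does not match the worker count is rejected rather than padded or truncated.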

scatter_from_worker0(from_array: tvm.runtime.disco.session.DRef, to_array: tvm.runtime.disco.session.DRef, in_group: bool = True) None

Scatter an array from worker-0 to all other workers.

Parameters
  • from_array (DRef) – The array on worker-0 to be scattered. The first dimension of this array, from_array.shape[0], must be equal to the number of workers.

  • to_array (DRef) – The output array on each worker.

  • in_group (bool) – If True (default), the scatter is performed within each group. If False, it is performed globally across all workers.

gather_to_worker0(from_array: tvm.runtime.disco.session.DRef, to_array: tvm.runtime.disco.session.DRef, in_group: bool = True) None

Gather an array from all other workers to worker-0.

Parameters
  • from_array (DRef) – The array to be gathered from.

  • to_array (DRef) – The array to be gathered to.

  • in_group (bool) – If True (default), the gather is performed within each group. If False, it is performed globally across all workers.

allreduce(src: tvm.runtime.disco.session.DRef, dst: tvm.runtime.disco.session.DRef, op: str = 'sum', in_group: bool = True) tvm.runtime.disco.session.DRef

Perform an allreduce operation on an array.

Parameters
  • src (DRef) – The array to be reduced.

  • dst (DRef) – The output array that receives the reduced result.

  • op (str = "sum") – The reduce operation to be performed. Available options are:

      - "sum"
      - "prod"
      - "min"
      - "max"
      - "avg"

  • in_group (bool) – If True (default), the reduction is performed within each group. If False, it is performed globally across all workers.
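The effect of each op value can be shown with a toy model over plain lists. It follows the usual definition of allreduce, in which the inputs from all workers are combined elementwise and every worker receives the same result. The names REDUCERS and toy_allreduce are hypothetical; the real operation runs through the library selected with init_ccl.

```python
import math

# Map each documented op name to a function over one value per worker.
REDUCERS = {
    "sum": sum,
    "prod": math.prod,
    "min": min,
    "max": max,
    "avg": lambda xs: sum(xs) / len(xs),
}


def toy_allreduce(per_worker, op="sum"):
    """Reduce elementwise across workers and give every worker the result."""
    if op not in REDUCERS:
        raise ValueError("unsupported op: %r" % (op,))
    reduce_fn = REDUCERS[op]
    # zip(*per_worker) groups the i-th element of every worker together.
    reduced = [reduce_fn(column) for column in zip(*per_worker)]
    # Each worker gets its own copy of the reduced array.
    return [list(reduced) for _ in per_worker]


summed = toy_allreduce([[1, 2], [3, 4]], op="sum")
```

In this model the output has one entry per worker and all entries are equal, which is what distinguishes allreduce from a reduction to a single worker.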

allgather(src: tvm.runtime.disco.session.DRef, dst: tvm.runtime.disco.session.DRef, in_group: bool = True) tvm.runtime.disco.session.DRef

Perform an allgather operation on an array.

Parameters
  • src (DRef) – The array to be gathered from.

  • dst (DRef) – The array to be gathered to.

  • in_group (bool) – If True (default), the allgather is performed within each group. If False, it is performed globally across all workers.

class tvm.runtime.disco.ThreadedSession(num_workers: int, num_groups: int = 1)

A Disco session backed by multi-threading.

class tvm.runtime.disco.SocketSession(num_nodes: int, num_workers_per_node: int, num_groups: int, host: str, port: int)

A Disco session backed by socket-based multi-node communication.