tvm.runtime.disco

TVM distributed runtime API.

class tvm.runtime.disco.DModule(dref: DRef, session: Session)

A Module in a Disco session.

class tvm.runtime.disco.DPackedFunc(dref: DRef, session: Session)

A PackedFunc in a Disco session.

class tvm.runtime.disco.DRef

An object that exists on all workers. The controller process assigns a unique “register id” to each object, and each worker process uses this id to refer to its own local instance of the object.

debug_get_from_remote(worker_id: int) → Any

Get the value of a DRef from a remote worker. It is only used for debugging purposes.

Parameters:

worker_id (int) – The id of the worker to be fetched from.

Returns:

value – The value of the register.

Return type:

object

debug_copy_from(worker_id: int, value: ndarray | NDArray) → None

Copy an NDArray value to a remote worker for debugging purposes.

Parameters:
  • worker_id (int) – The id of the worker to be copied to.

  • value (Union[numpy.ndarray, NDArray]) – The value to be copied.

class tvm.runtime.disco.ProcessSession(num_workers: int, num_groups: int = 1, entrypoint: str = 'tvm.exec.disco_worker')

A Disco session backed by pipe-based multi-processing.

class tvm.runtime.disco.Session

A Disco interactive session. It allows users to interact with the Disco command queue through various PackedFunc calling conventions.

empty(shape: Sequence[int], dtype: str, device: Device | None = None, worker0_only: bool = False, in_group: bool = True) → DRef

Create an empty NDArray on all workers and attach them to a DRef.

Parameters:
  • shape (tuple of int) – The shape of the NDArray.

  • dtype (str) – The data type of the NDArray.

  • device (Optional[Device] = None) – The device of the NDArray.

  • worker0_only (bool) – If False (default), allocate an array on each worker. If True, only allocate an array on worker0.

  • in_group (bool) – Takes effect only when worker0_only is True. If True (default), allocate an array on the first worker of each group. If False, allocate an array only on worker0 globally.

Returns:

array – The created NDArray.

Return type:

DRef
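The interaction between worker0_only and in_group can be sketched in plain Python. This is only an illustration of the rules stated above, not Disco code: the helper name allocation_targets is hypothetical, and the contiguous group layout (group g owns workers g * size through (g + 1) * size - 1) and the divisibility check are assumptions.

```python
def allocation_targets(num_workers, num_groups=1, worker0_only=False, in_group=True):
    """Return the worker ids that would allocate an array (illustrative only)."""
    if num_workers % num_groups != 0:
        # Assumption: workers are split evenly across groups.
        raise ValueError("num_workers must be divisible by num_groups")
    if not worker0_only:
        # Default: every worker allocates its own array.
        return list(range(num_workers))
    if in_group:
        # Assumed layout: group g owns workers [g * size, (g + 1) * size).
        group_size = num_workers // num_groups
        return [g * group_size for g in range(num_groups)]
    # worker0_only=True and in_group=False: only the global worker 0.
    return [0]
```

With four workers in two groups, this sketch selects every worker by default, workers 0 and 2 when worker0_only is True, and only worker 0 when in_group is also False.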

shutdown()

Shut down the Disco session.

property num_workers: int

Return the number of workers in the session.

get_global_func(name: str) → DRef

Get a global function on workers.

Parameters:

name (str) – The name of the global function.

Returns:

func – The global packed function

Return type:

DRef

import_python_module(module_name: str) → None

Import a Python module in each worker.

This may be required before calling functions that depend on the module.

Parameters:

module_name (str) – The python module name, as it would be used in a python import statement.

call_packed(func: DRef, *args) → DRef

Call a PackedFunc on workers providing variadic arguments.

Parameters:
  • func (DRef) – The function to be called, as a DRef to a PackedFunc on the workers.

  • *args (various types) – The variadic arguments. Supported types are: integers and floating point numbers, DLDataType, DLDevice, str (std::string in C++), and DRef.

Returns:

return_value – The return value of the function call.

Return type:

various types

Notes

Examples of unsupported types: NDArray and DLTensor; TVM Objects, including PackedFunc, Module and String.
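A minimal sketch of how get_global_func and call_packed might be combined. The helper name call_global is hypothetical, and sess is assumed to be any object that behaves like a Session; nothing here is taken from the TVM sources beyond the two method names and signatures listed above.

```python
def call_global(sess, name, *args):
    """Look up a global function on the workers and call it with *args.

    `sess` is assumed to behave like tvm.runtime.disco.Session.
    Only the argument types listed as supported should be passed.
    """
    func = sess.get_global_func(name)     # DRef to the function on each worker
    return sess.call_packed(func, *args)  # per the signature, another DRef
```

Since NDArray is listed as unsupported, array arguments would need to be passed as a DRef, for example one obtained from empty or copy_to_worker_0.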

sync_worker_0() → None

Synchronize the controller with worker-0. The controller waits until worker-0 finishes executing all existing instructions.

copy_from_worker_0(host_array: NDArray, remote_array: DRef) → None

Copy an NDArray from worker-0 to the controller-side NDArray.

Parameters:
  • host_array (NDArray) – The controller-side array that receives the data copied from worker-0.

  • remote_array (DRef) – The NDArray on worker-0.

copy_to_worker_0(host_array: NDArray, remote_array: DRef | None = None) → DRef

Copy the controller-side NDArray to worker-0.

Parameters:
  • host_array (NDArray) – The array to be copied to worker-0.

  • remote_array (Optional[DRef]) – The destination NDArray on worker-0.

Returns:

output_array – The DRef containing the copied data on worker0, and NullOpt on all other workers. If remote_array was provided, this return value is the same as remote_array. Otherwise, it is the newly allocated space.

Return type:

DRef
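The two copy methods and sync_worker_0 can be combined into a round trip through worker-0. The sketch below is hedged: roundtrip_via_worker_0, host_src and host_dst are hypothetical names, sess is assumed to behave like a Session, and the final sync_worker_0 call assumes that copies are queued asynchronously, as the sync_worker_0 description suggests.

```python
def roundtrip_via_worker_0(sess, host_src, host_dst):
    """Copy host_src to worker-0, then copy it back into host_dst.

    `sess` is assumed to behave like tvm.runtime.disco.Session;
    host_src and host_dst are assumed to be controller-side NDArrays
    with the same shape and dtype.
    """
    # No destination is given, so space is allocated on worker-0.
    remote = sess.copy_to_worker_0(host_src)
    # Queue the copy from worker-0 back to the controller-side array.
    sess.copy_from_worker_0(host_dst, remote)
    # Wait for worker-0 to finish before host_dst is read.
    sess.sync_worker_0()
    return remote
```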

load_vm_module(path: str, device: Device | None = None) → DModule

Load a VM module from a file.

Parameters:
  • path (str) – The path to the VM module file.

  • device (Optional[Device] = None) – The device to load the VM module to. Defaults to the default device of each worker.

Returns:

module – The loaded VM module.

Return type:

DModule

init_ccl(ccl: str, *device_ids)

Initialize the underlying collective communication library.

Parameters:
  • ccl (str) – The name of the collective communication library. Currently supported libraries are nccl, rccl, and mpi.

  • *device_ids (int) – The device IDs to be used by the underlying communication library.
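A small wrapper can validate the library name before calling init_ccl. This is a sketch under stated assumptions: init_collectives and SUPPORTED_CCL are hypothetical names, sess is assumed to behave like a Session, and the default mapping of worker i to device i is an illustrative choice rather than documented behavior.

```python
SUPPORTED_CCL = ("nccl", "rccl", "mpi")  # names listed for the ccl parameter


def init_collectives(sess, ccl="nccl", device_ids=None):
    """Validate `ccl`, then initialize it on `sess` with one device per worker."""
    if ccl not in SUPPORTED_CCL:
        raise ValueError(f"unsupported ccl {ccl!r}; expected one of {SUPPORTED_CCL}")
    if device_ids is None:
        # Assumption: worker i uses device i.
        device_ids = range(sess.num_workers)
    device_ids = list(device_ids)
    # Device ids are passed as variadic positional arguments.
    sess.init_ccl(ccl, *device_ids)
    return device_ids
```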

broadcast(src: ndarray | NDArray, dst: DRef | None = None, in_group: bool = True) → DRef

Broadcast an array to all workers.

Parameters:
  • src (Union[np.ndarray, NDArray]) – The array to be broadcasted.

  • dst (Optional[DRef]) – The output array. If None, an array matching the shape and dtype of src will be allocated on each worker.

  • in_group (bool) – Whether the broadcast is performed within each group (True, the default) or globally across all workers (False).

Returns:

output_array – The DRef containing the broadcasted data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type:

DRef

broadcast_from_worker0(src: DRef, dst: DRef, in_group: bool = True) → DRef

Broadcast an array from worker-0 to all other workers.

Parameters:
  • src (DRef) – The array to be broadcast, residing on worker-0.

  • dst (DRef) – The output array on each worker.

  • in_group (bool) – Whether the broadcast is performed within each group (True, the default) or globally across all workers (False).

scatter(src: ndarray | NDArray, dst: DRef | None = None, in_group: bool = True) → DRef

Scatter an array across all workers.

Parameters:
  • src (Union[np.ndarray, NDArray]) – The array to be scattered. The first dimension of this array, src.shape[0], must be equal to the number of workers.

  • dst (Optional[DRef]) – The output array. If None, an array with compatible shape and the same dtype as src will be allocated on each worker.

  • in_group (bool) – Whether the scatter is performed within each group (True, the default) or globally across all workers (False).

Returns:

output_array – The DRef containing the scattered data on all workers. If dst was provided, this return value is the same as dst. Otherwise, it is the newly allocated space.

Return type:

DRef
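The shape requirement on src can be illustrated with a NumPy-only simulation. This does not call Disco: simulate_scatter is a hypothetical helper, and it assumes that worker i receives the slice src[i], so that each worker's array has shape src.shape[1:].

```python
import numpy as np


def simulate_scatter(src, num_workers):
    """Return the slice each worker would hold after a scatter (simulation only)."""
    src = np.asarray(src)
    if src.shape[0] != num_workers:
        raise ValueError(
            f"src.shape[0] is {src.shape[0]}, expected {num_workers} (number of workers)"
        )
    # Assumption: worker i receives src[i], an array of shape src.shape[1:].
    return [src[i].copy() for i in range(num_workers)]
```

For a src of shape (2, 3) and two workers, each simulated worker ends up with one row of three elements.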

scatter_from_worker0(from_array: DRef, to_array: DRef, in_group: bool = True) → None

Scatter an array from worker-0 to all other workers.

Parameters:
  • from_array (DRef) – The array on worker-0 to be scattered. The first dimension of this array must be equal to the number of workers.

  • to_array (DRef) – The output array on each worker.

  • in_group (bool) – Whether the scatter is performed within each group (True, the default) or globally across all workers (False).

gather_to_worker0(from_array: DRef, to_array: DRef, in_group: bool = True) → None

Gather an array from all other workers to worker-0.

Parameters:
  • from_array (DRef) – The array to be gathered from.

  • to_array (DRef) – The array to be gathered to.

  • in_group (bool) – Whether the gather is performed within each group (True, the default) or globally across all workers (False).

allreduce(src: DRef, dst: DRef, op: str = 'sum', in_group: bool = True) → DRef

Perform an allreduce operation on an array.

Parameters:
  • src (DRef) – The array to be reduced.

  • dst (DRef) – The array that receives the reduced result.

  • op (str = "sum") – The reduce operation to be performed. Available options are “sum”, “prod”, “min”, “max”, and “avg”.

  • in_group (bool) – Whether the reduce is performed within each group (True, the default) or globally across all workers (False).
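The effect of the op values can be shown with a NumPy-only simulation of an allreduce over per-worker arrays. This is a sketch of the standard allreduce semantics, not the Disco implementation: simulate_allreduce is a hypothetical helper, and it assumes every worker holds an array of the same shape and that all workers belong to a single group.

```python
import numpy as np

# Element-wise reductions across the worker axis, keyed by the op names above.
_REDUCERS = {
    "sum": lambda stack: stack.sum(axis=0),
    "prod": lambda stack: stack.prod(axis=0),
    "min": lambda stack: stack.min(axis=0),
    "max": lambda stack: stack.max(axis=0),
    "avg": lambda stack: stack.mean(axis=0),
}


def simulate_allreduce(per_worker, op="sum"):
    """Reduce the arrays element-wise and give every worker a copy of the result."""
    if op not in _REDUCERS:
        raise ValueError(f"unknown op {op!r}; expected one of {sorted(_REDUCERS)}")
    # Stack along a new leading axis: one entry per worker.
    stack = np.stack([np.asarray(a) for a in per_worker])
    reduced = _REDUCERS[op](stack)
    # After an allreduce, every worker holds the same reduced array.
    return [reduced.copy() for _ in per_worker]
```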

allgather(src: DRef, dst: DRef, in_group: bool = True) → DRef

Perform an allgather operation on an array.

Parameters:
  • src (DRef) – The array to be gathered from.

  • dst (DRef) – The array to be gathered to.

  • in_group (bool) – Whether the allgather is performed within each group (True, the default) or globally across all workers (False).

class tvm.runtime.disco.ThreadedSession(num_workers: int, num_groups: int = 1)

A Disco session backed by multi-threading.

class tvm.runtime.disco.SocketSession(num_nodes: int, num_workers_per_node: int, num_groups: int, host: str, port: int)

A Disco session backed by socket-based multi-node communication.