Constructor
importObject can also be a LibraryProvider object, a WASI object, or an object containing a wasmLibraryProvider field.
The input module or instance.
The imports to initialize the wasmInstance if it is not provided.
Optional wasmInstance: Instance. Additional wasm instance argument for deferred construction.
Optional env: Environment. Directly specified environment module.
Please use the async version instantiate when targeting browsers.
Apply presence and frequency penalty. This is an in-place operation.
The input logits before penalty.
The appeared token ids.
The number of times each token has appeared since the last PrefillStep: token_freqs[i] is the frequency of token_ids[i] for all i, and every entry of token_freqs should be >= 1.
The presence penalty factor.
The frequency penalty factor.
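The in-place penalty update can be sketched on a plain JS array. In tvmjs the operation runs on a Tensor; the function and parameter names here are illustrative, not the runtime's internals.

```typescript
function applyPenalty(
  logits: number[],
  tokenIds: number[],
  tokenFreqs: number[],
  presencePenalty: number,
  frequencyPenalty: number
): void {
  for (let i = 0; i < tokenIds.length; ++i) {
    // Presence penalty is a flat cost for a token having appeared at all;
    // frequency penalty grows with how often it has appeared.
    logits[tokenIds[i]] -= presencePenalty + tokenFreqs[i] * frequencyPenalty;
  }
}

const logits = [2.0, 1.0, 0.5];
applyPenalty(logits, [0], [3], 0.1, 0.2);
// logits[0] is now 2.0 - (0.1 + 3 * 0.2) = 1.3
```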
Apply softmax with temperature to the logits.
The input logits before softmax with temperature is applied.
The temperature factor.
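A sketch of temperature-scaled softmax on a plain array; the runtime performs this on a Tensor, and this standalone version is for illustration only.

```typescript
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((x) => x / temperature);
  const maxVal = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - maxVal));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Lower temperature sharpens the distribution toward the largest logit.
const probs = softmaxWithTemperature([1.0, 2.0, 3.0], 0.5);
```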
Check whether asyncify mode is enabled.
The asyncify mode toggle.
Asynchronously load webgpu pipelines when possible.
The input module.
Attach a detached obj to the auto-release pool of the current scope.
The input obj.
Begin a new scope for tracking object disposal.
Benchmark stable execution of the run function.
The run function.
The device to sync during each run.
The number of times to compute the average.
The number of times to repeat the run.
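An illustrative synchronous version of the benchmark loop. The actual runtime method is async and synchronizes the given device between repeats, which is omitted here; the parameter names are assumptions.

```typescript
function benchmarkSync(
  run: () => void,
  numberRepeat: number,
  repeat: number
): number[] {
  const results: number[] = [];
  for (let r = 0; r < numberRepeat; ++r) {
    const start = Date.now();
    for (let a = 0; a < repeat; ++a) run(); // average each result over `repeat` runs
    results.push((Date.now() - start) / repeat);
  }
  return results;
}

const times = benchmarkSync(() => Math.sqrt(12345), 5, 100);
```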
Bind canvas to the current WebGPU context
The canvas.
Clear canvas
Setup a virtual machine module with given device.
The device (a DLDevice).
The created virtual machine.
Detach the object from the current scope so it won't be released via auto-release during endscope.
The user needs to either explicitly call obj.dispose(), or call attachToCurrentScope to re-attach it to the current scope.
This function can be used to return values to the parent scope.
The object.
Dispose the internal resource. This function can be called multiple times; only the first call will take effect.
Create an empty Tensor with given shape and dtype.
The shape of the array.
The data type of the array.
The device of the Tensor.
The created Tensor.
End a scope and release all created TVM objects under the current scope.
Exception: one can call moveToParentScope to move a value to the parent scope.
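The scope mechanism can be sketched as a stack of auto-release pools. The class and method names below mirror the documented API but are an illustrative stand-in, assuming each tracked object exposes dispose().

```typescript
interface Disposable {
  dispose(): void;
}

// beginScope pushes a pool, endScope disposes everything still attached to
// the top pool, and moveToParentScope re-homes an object one level up.
class ScopeTracker {
  private scopes: Disposable[][] = [];

  beginScope(): void {
    this.scopes.push([]);
  }

  attachToCurrentScope<T extends Disposable>(obj: T): T {
    this.scopes[this.scopes.length - 1].push(obj);
    return obj;
  }

  moveToParentScope<T extends Disposable>(obj: T): T {
    const cur = this.scopes[this.scopes.length - 1];
    cur.splice(cur.indexOf(obj), 1);
    this.scopes[this.scopes.length - 2].push(obj);
    return obj;
  }

  endScope(): void {
    for (const obj of this.scopes.pop() ?? []) obj.dispose();
  }
}

const tracker = new ScopeTracker();
let disposed = false;
tracker.beginScope();
tracker.attachToCurrentScope({ dispose: () => { disposed = true; } });
tracker.endScope(); // the attached object is released here
```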
Given cacheUrl, look up the items to fetch based on cacheUrl/tensor-cache.json.
The cache url.
The device to be fetched to.
The scope identifier of the cache
The type of the cache: "cache" or "indexedDB"
Optional signal: AbortSignal. An optional AbortSignal to abort the fetch.
The metadata.
Get global PackedFunc from the runtime.
The name of the function.
The result function.
Get parameters named in the form prefix_i.
The parameter prefix.
Number of parameters.
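The prefix_i naming convention can be illustrated with a small hypothetical helper that produces the lookup names prefix_0 through prefix_{n-1}; this helper is not part of the tvmjs API.

```typescript
function paramNames(prefix: string, numParams: number): string[] {
  // Parameters are looked up as prefix_0, prefix_1, ..., prefix_{n-1}.
  return Array.from({ length: numParams }, (_, i) => `${prefix}_${i}`);
}

const names = paramNames("param", 3); // ["param_0", "param_1", "param_2"]
```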
Get parameters based on the parameter names provided.
Names of the parameters.
Parameters read.
Initialize webgpu in the runtime.
The given GPU device.
Check if func is PackedFunc.
The input.
The check result.
List all the global function names registered in the runtime.
The name list.
Create a shape tuple to pass to runtime.
The shape.
The created shape tuple.
Move obj's attachment to the parent scope.
This function is useful to make sure objects are still alive when exiting the current scope.
The object to be moved.
The input obj.
Register an async function as an asyncify callable in the global environment.
The name of the function.
function to be registered.
Whether to overwrite the function if it already exists in the registry.
Register an async function as a global function in the server.
The name of the function.
function to be registered.
Whether to overwrite the function if it already exists in the registry.
Register a function as a global function in the TVM runtime.
The name of the function.
Function to be registered.
Whether to overwrite the function if it already exists in the registry.
Register a callback for fetch progress.
The fetch progress callback.
Register an object constructor.
The name of the function.
Function to be registered.
Whether to overwrite the function if it already exists in the registry.
Obtain the runtime information in readable format.
Sample index via top-p sampling.
The input logits before normalization.
The temperature factor; argmax is taken if temperature = 0.0.
The top_p
The sampled index.
Sample index via top-p sampling.
The distribution, i.e. the logits after applySoftmaxWithTemperature() has been applied.
The top_p
The sampled index.
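Top-p (nucleus) sampling over an already-normalized distribution can be sketched as below. The function name and the explicit rand parameter are illustrative assumptions; the runtime draws from its internal generator instead.

```typescript
function sampleTopPFromProb(prob: number[], topP: number, rand: number): number {
  // Sort indices by descending probability and keep the smallest set
  // whose cumulative mass reaches topP (the "nucleus").
  const order = prob.map((_, i) => i).sort((a, b) => prob[b] - prob[a]);
  const nucleus: number[] = [];
  let mass = 0;
  for (const i of order) {
    nucleus.push(i);
    mass += prob[i];
    if (mass >= topP) break;
  }
  // Sample within the nucleus, renormalized to its own mass.
  let u = rand * mass;
  for (const i of nucleus) {
    u -= prob[i];
    if (u <= 0) return i;
  }
  return nucleus[nucleus.length - 1];
}
```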
Set packed function arguments into the location indicated by argsValue and argsCode. Allocate new temporary space from the stack if necessary.
The call stack.
The input arguments.
The offset of packedArgs.
Set the seed of the internal LinearCongruentialGenerator.
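Seeding makes sampling reproducible. Below is an illustrative linear congruential generator; the constants and class shape of tvmjs's internal LinearCongruentialGenerator may differ.

```typescript
class LCG {
  private state: number;

  constructor(seed: number) {
    this.state = seed >>> 0;
  }

  setSeed(seed: number): void {
    this.state = seed >>> 0;
  }

  // Numerical Recipes constants, modulus 2^32.
  next(): number {
    this.state = (Math.imul(this.state, 1664525) + 1013904223) >>> 0;
    return this.state / 2 ** 32; // uniform in [0, 1)
  }
}

// Two generators with the same seed produce the same sequence.
const g1 = new LCG(42);
const g2 = new LCG(42);
```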
Show image in canvas.
The image array on GPU: a height x width uint32 Tensor in RGBA format.
Get the system-wide library module in the wasm. The system library is a global module that contains self-registered functions run at startup.
The system library module.
Clear the tensor cache.
Update the tensor cache.
The name of the array.
The content.
Convert func to PackedFunc
Input function.
The converted function.
Get type index from type key.
The type key.
The corresponding type index.
Perform action under a new scope.
The action function.
The result value.
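The withNewScope pattern can be sketched as follows; begin and end are illustrative stand-ins for the runtime's beginScope/endScope, and the key point is that the scope is always ended, even if the action throws.

```typescript
function withNewScope<T>(begin: () => void, end: () => void, action: () => T): T {
  begin();
  try {
    return action();
  } finally {
    end(); // release everything attached to the scope, even on error
  }
}

const log: string[] = [];
const result = withNewScope(
  () => { log.push("begin"); },
  () => { log.push("end"); },
  () => 42
);
```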
Wrap a function obtained from the TVM runtime as an AsyncPackedFunc through the asyncify mechanism.
You only need to call it if the function may call back into an async JS function via asyncify; a common example is GPU synchronization.
It is always safe to wrap any function with asyncify; however, you do need to make sure you use await when calling the wrapped function.
The PackedFunc.
The wrapped AsyncPackedFunc
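In terms of the calling convention, the wrapper turns every call into a Promise, which is why await is required. A minimal sketch with assumed type aliases; the real asyncify mechanism additionally suspends and resumes the wasm stack, which is omitted here.

```typescript
type PackedFunc = (...args: unknown[]) => unknown;
type AsyncPackedFunc = (...args: unknown[]) => Promise<unknown>;

// Every call through the wrapper yields a Promise, so callers must await it.
function wrapAsPromise(fn: PackedFunc): AsyncPackedFunc {
  return async (...args: unknown[]) => fn(...args);
}

const asyncSquare = wrapAsPromise((x) => (x as number) * (x as number));
// const y = await asyncSquare(3); // y === 9
```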
TVM runtime instance.
All objects (Tensor, Module, PackedFunc) returned by TVM runtime function calls, as well as PackedFunc instances, are tracked through a scope mechanism and will be auto-released when we call endScope.
This is necessary to release the underlying WASM and WebGPU memory, which is not tracked by the native JS garbage collection mechanism.
This does mean that we have to get familiar with the following functions: