CacheState manages domain-specific caches for the WebGPU runtime.
Currently contains:
shapeCache: Caches TVM ShapeTuple objects keyed by dimension string.
Why: makeShapeTuple() is called on every tensor operation, crossing
the JS→WASM FFI boundary each time. During LLM decode, the same shapes
repeat every token (e.g. [1,32,128]), so caching avoids thousands of
redundant FFI round-trips.
Invalidation: Never. Shape tuples are immutable value objects that
remain valid for the lifetime of the TVM instance.
Future additions (follow-up PR):
uniformCache: Caches GPU uniform buffers keyed by content hash.
Why: Many dispatches use identical scalar arguments (matrix dims, etc.).
Reusing the buffer avoids createBuffer + writeBuffer overhead.
Invalidation: Must invalidate on any GPU buffer deallocation, since
buffer pointers can be reused by the allocator, making cached entries
that reference the old buffer stale.
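A rough sketch of how the planned uniformCache and its invalidation rule might look. All names here are hypothetical (this is a follow-up PR that does not exist yet), and the GPU buffer creation is stubbed so the caching and invalidation logic can be shown in isolation; a real hash function would replace the simple join-based key.

```typescript
// Stand-in for a GPU uniform buffer handle.
interface UniformBuffer { id: number; }

let nextBufferId = 0;
// Stand-in for the createBuffer + writeBuffer sequence being avoided.
function createUniformBuffer(data: Uint32Array): UniformBuffer {
  return { id: nextBufferId++ };
}

class UniformCache {
  // Keyed by content hash of the scalar arguments.
  private cache = new Map<string, UniformBuffer>();

  get(data: Uint32Array): UniformBuffer {
    // Simple stand-in for a real content hash.
    const key = Array.from(data).join(",");
    let buf = this.cache.get(key);
    if (buf === undefined) {
      buf = createUniformBuffer(data);
      this.cache.set(key, buf);
    }
    return buf;
  }

  // Must run on any GPU buffer deallocation: the allocator may reuse
  // buffer pointers, so cached entries could reference stale memory.
  invalidate(): void {
    this.cache.clear();
  }
}
```

Clearing the whole cache on any deallocation is the conservative choice; a finer-grained scheme would have to track which cached entries reference which allocations.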