Fastcache#

What it is#

Fastcache reduces the time it takes to load cached kernels when a new Python process starts.

The standard offline cache already persists compiled kernels to disk. However, loading a cached kernel still requires parsing the kernel’s Python AST and transforming it into IR. For applications with many kernels this front-end overhead alone can take several seconds.

Fastcache bypasses that front-end work. It computes a cheap cache key from the kernel source text, argument types, and compiler config, and uses it to load the compiled artifact directly.

On a Genesis simulator benchmark (single_franka_envs.py, Ubuntu 24.04, NVIDIA 5090):

Configuration	Process-start kernel load time
Other caches (no fastcache)	4.6 s
+ fastcache	0.3 s

How to use it#

Enabling fastcache on a kernel#

Fastcache requires the kernel to be pure: all data it operates on must be passed as explicit parameters, with nothing captured from the enclosing Python scope (see Constraints below). Because not all kernels satisfy this, fastcache is opt-in — you assert purity by adding fastcache=True to the @qd.kernel decorator:

import quadrants as qd

@qd.kernel(fastcache=True)
def my_kernel(a: qd.types.NDArray[qd.f32, 1], b: qd.types.NDArray[qd.f32, 1]) -> None:
    for i in range(a.shape[0]):
        b[i] = a[i] * 2.0

That’s it. On the first call, the kernel compiles normally and the fastcache entry is written to disk. When the next Python process starts, the cached artifact is loaded directly.

Runtime configuration#

Fastcache requires the offline cache to be enabled (which it is by default). Two qd.init options are relevant:

Option	Default	Effect
`src_ll_cache`	`True`	Master switch for fastcache. Set to `False` to disable it globally.
`print_non_pure`	`False`	When `True`, prints the name of every kernel at call time that is not marked `fastcache=True`. Useful for finding kernels you forgot to annotate.

qd.init(arch=qd.gpu)
# Fastcache is on by default. To disable:
# qd.init(arch=qd.gpu, src_ll_cache=False)

# To find un-annotated kernels:
# qd.init(arch=qd.gpu, print_non_pure=True)

Dataclass fields with cached values#

By default, for dataclasses.dataclass parameters, fastcache only includes the types of each field in the cache key, not their values. This is fine for fields like ndarrays, where the compiled kernel doesn’t depend on the actual data, only the dtype and dimensionality.

However, some dataclass fields hold configuration values that get baked into the compiled kernel — typically values used with qd.static(), such as loop bounds or feature flags:

for i in qd.static(range(config.num_layers)):
    ...

Here the value of num_layers is compiled into the kernel. Concretely the loop will be unrolled, at compile time. If num_layers changes, a different kernel must be compiled.

Mark such fields with add_value_to_cache_key so their values are included in the cache key:

import dataclasses
from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE

@dataclasses.dataclass
class SimConfig:
    num_envs: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
    dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
    use_gravity: bool = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})

With this annotation, changing num_envs from 100 to 200 produces a different cache key so the correct compiled kernel is looked up (or compiled if not yet cached). Without it, the wrong kernel could be loaded.

Note: @qd.data_oriented objects and qd.Template parameters already include primitive values in the cache key automatically — this annotation is only needed for dataclasses.dataclass fields.

Constraints#

A kernel is eligible for fastcache only if all of the following hold:

1. All data flows through parameters#

The kernel must receive every piece of data it operates on as an explicit parameter. It must not capture variables from the enclosing Python scope (closures over fields, ndarrays, or mutable globals). This is the core “purity” constraint — the compiled kernel’s behavior must be fully determined by its arguments.

a = qd.ndarray(qd.f32, (10,))

# Not eligible: captures `a` from enclosing scope
@qd.kernel(fastcache=True)
def bad_kernel() -> None:
    for i in range(10):
        a[i] = 0.0  # raises QuadrantsCompilationError

# Eligible: `a` is passed as a parameter
@qd.kernel(fastcache=True)
def good_kernel(a: qd.types.NDArray[qd.f32, 1]) -> None:
    for i in range(a.shape[0]):
        a[i] = 0.0

Sub-functions called by the kernel are also checked — they must not capture external state either.

Exemptions: The following may be accessed from the enclosing scope without violating purity:

Allowed capture	Why
`enum.Enum` values (e.g. `MyEnum.VALUE`)	Named constants that are assumed not to vary between process runs.
`math` / `numpy` constants (e.g. `math.pi`)	Assumed stable across process runs.
Quadrants module attributes (e.g. `qd.simt.Tile16x16.SIZE`)	Part of the compiler’s own API; assumed consistent with the Quadrants version hash.

Other named constants (non-enum, non-module) captured from scope will raise a QuadrantsCompilationError, except for UPPERCASE names which emit a warning instead.

2. Supported parameter types#

Fastcache supports the following parameter types:

Type	Supported	Cache key includes
`qd.types.NDArray` (scalar, vector, matrix)	Yes	dtype, ndim, layout
`torch.Tensor`	Yes	dtype, ndim
`numpy.ndarray`	Yes	dtype, ndim
`dataclasses.dataclass`	Yes	field types recursively; field values if annotated with `add_value_to_cache_key` (see above)
`@qd.data_oriented` objects	Yes	member types and primitive member values recursively
`qd.Template` primitives (int, float, bool)	Yes	type and value (baked into kernel)
Non-template primitives (int, float, bool)	Yes	type only
`enum.Enum`	Yes	name and value
`qd.field` / `ScalarField` / `MatrixField`	No	—

If any parameter is of an unsupported type, fastcache is disabled for that call and the kernel falls back to normal compilation. For qd.field / ScalarField / MatrixField arriving through a qd.Tensor-annotated parameter, this is silent — no warning is emitted. For other unsupported types, a warning is logged at the warn level identifying the offending parameter.

3. Source code must be available#

Fastcache hashes the source code of the kernel and all sub-functions it calls. If the source file cannot be read at runtime (e.g. the kernel is defined in a frozen/compiled module, or the file has been deleted), fastcache cannot validate the cache and will fall back to normal compilation.

Cache keying#

Each compiled artifact is stored under a key derived from all of the following:

The Quadrants version (quadrants.__version__).
The source code of the kernel function or any @qd.func it calls.
The argument types (e.g. switching from f32 to f64, or changing ndarray dimensionality).
The compiler configuration (e.g. arch, debug, opt_level, fast_math).
Template parameter values (since they are baked into the compiled kernel).

When any of these change, the resulting key is different, so a new compilation occurs and a new entry is stored. Previous entries remain on disk — multiple cached versions coexist. You do not need to manually clear the cache when making code changes — the hash mismatch causes a transparent recompilation.

Advanced#

Diagnostics#

You can inspect whether fastcache was used for a specific kernel via the src_ll_cache_observations attribute on the kernel’s primal:

@qd.kernel(fastcache=True)
def my_kernel(x: qd.types.NDArray[qd.f32, 1]) -> None:
    for i in range(x.shape[0]):
        x[i] += 1.0

my_kernel(some_array)

obs = my_kernel._primal.src_ll_cache_observations
print(obs.cache_key_generated)  # True if the cache key was computed
print(obs.cache_validated)      # True if a cached entry was found and source hashes matched
print(obs.cache_loaded)         # True if the compiled kernel was loaded from cache
print(obs.cache_stored)         # True if the compiled kernel was stored to cache

On the first run you’ll see cache_stored=True but cache_loaded=False. On the second run (after qd.init), cache_loaded=True.