User guide

Getting started
- Getting started
- Supported systems

Core concepts
- Tensor types
- Scalar tensors
- Matrix and Vector

Advanced
- Per-thread matrix and vector operations
- Per-thread linear algebra
- Tensors
- Compound types
- BufferView
- qd.static
- Sub-functions
- Parallelization

Integration
- Numpy and Torch interop
- Shared Metal command queue (PyTorch MPS)

Autodiff
- Automatic differentiation

SIMT primitives
- Atomics
- Block primitives
- Grid primitives
- Math
- Subgroup primitives
- Register-resident tiles: Tile16x16 and Tile32x32

Algorithms
- Algorithms

Performance
- Fastcache
- Graph
- Streams
- Performance
- Dispatch
- qd.init options

Testing
- Unit testing
- Kernel code coverage

Reference
- Python features in kernel scope
- Python backend
- Debug mode
- Quirks
- Troubleshooting

Advanced
- Optimization passes

Contributing
- Advanced: Contributing to quadrants

Internal
- Building the CUDA graph conditional fatbin