User guide

Getting started
  Getting started
  Supported systems

Core concepts
  Tensor types
  Scalar tensors
  Matrix and Vector
  Per-thread matrix and vector operations
  Per-thread linear algebra
  Tensors
  Compound types
  BufferView
  qd.static
  Sub-functions
  Parallelization

Integration
  Numpy and Torch interop
  Shared Metal command queue (PyTorch MPS)

Autodiff
  Automatic differentiation

SIMT primitives
  Atomics
  Block primitives
  Grid primitives
  Math
  Subgroup primitives
  Tile16x16: register-resident 16x16 tiles

Algorithms
  Algorithms

Performance
  Fastcache
  Graph
  Streams
  Performance Dispatch
  qd.init options

Testing
  Kernel code coverage

Reference
  Python features in kernel scope
  Python backend
  Debug mode
  Quirks
  Troubleshooting

Contributing
  Contributing to quadrants

Internal
  Building the CUDA graph conditional fatbin