# Per-thread matrix and vector operations

Element-wise arithmetic and closed-form helpers on `qd.Matrix` / `qd.Vector` — every op below runs **per thread, in registers, with no cross-thread cooperation**. A kernel that calls these on a million-element field runs a million independent copies in parallel; there is no shared memory, no sync, no warp / subgroup primitive involved.

For the data type itself (declarations, fields, ndarrays, type annotations) see [matrix_vector](matrix_vector.md). For per-thread numerical algorithms (`qd.svd`, `qd.sym_eig`, `qd.solve`, etc. — the iterative / pivoting cousins of the closed-form ops below) see [linalg_per_thread](linalg_per_thread.md).

## Arithmetic

Standard arithmetic works element-wise:

```python
@qd.func
def example() -> None:
    a = qd.Vector([1.0, 2.0, 3.0])
    b = qd.Vector([4.0, 5.0, 6.0])
    c = a + b    # [5.0, 7.0, 9.0]
    d = a * 2.0  # [2.0, 4.0, 6.0]
    e = a * b    # element-wise: [4.0, 10.0, 18.0]
```

## Dot and cross product

```python
@qd.func
def products() -> None:
    a = qd.Vector([1.0, 0.0, 0.0])
    b = qd.Vector([0.0, 1.0, 0.0])
    d = a.dot(b)    # 0.0
    c = a.cross(b)  # [0.0, 0.0, 1.0]
```

`cross` works for 2D vectors (returns a scalar) and 3D vectors (returns a vector).

## Norm and normalize

```python
@qd.func
def norms() -> None:
    v = qd.Vector([3.0, 4.0])
    length = v.norm()         # 5.0
    length_sq = v.norm_sqr()  # 25.0
    unit = v.normalized()     # [0.6, 0.8]
    inv_len = v.norm_inv()    # 0.2
```

Pass an `eps` argument for numerical safety: `v.normalized(eps=1e-8)`.

## Matrix operations

```python
@qd.func
def mat_ops() -> None:
    m = qd.Matrix([[1.0, 2.0], [3.0, 4.0]])
    t = m.transpose()    # [[1, 3], [2, 4]]
    d = m.determinant()  # -2.0
    tr = m.trace()       # 5.0
    inv = m.inverse()    # [[-2, 1], [1.5, -0.5]]
```

- `determinant()` supports matrices up to 4×4 (closed-form expansion).
- `inverse()` supports matrices up to 12×12. Sizes 1–4 use the closed-form cofactor expansion; sizes 5–12 use Gauss elimination with partial pivoting (fully unrolled).
For the larger sizes, achievable precision scales as `cond(A) · eps` — a well-conditioned 12×12 in `f64` typically reconstructs to ~1e-12.

## Frobenius inner product and norm

```python
@qd.func
def inner_products() -> None:
    a = qd.Matrix([[1.0, 2.0], [3.0, 4.0]])
    b = qd.Matrix([[0.0, 1.0], [1.0, 0.0]])
    s = a.frobenius_inner(b)  # 2 + 3 = 5.0 (sum_ij a[i,j] * b[i,j])
    n = a.norm()              # sqrt(1 + 4 + 9 + 16) = sqrt(30)
    n_sq = a.norm_sqr()       # 30.0 (== a.frobenius_inner(a))
```

`frobenius_inner(other)` requires both matrices to have the same shape and supports any size. It's the natural inner product on matrices viewed as vectors of length `n × m` and is the correct bilinear form behind `norm` / `norm_sqr` (`A.frobenius_inner(A) == A.norm_sqr()`).

## Matrix-vector multiply

Use the `@` operator:

```python
@qd.func
def mat_vec() -> None:
    m = qd.Matrix([[1.0, 0.0], [0.0, 2.0]])
    v = qd.Vector([3.0, 4.0])
    result = m @ v  # [3.0, 8.0]
```

## Other operations

- `qd.Matrix.diag(dim, val)` — create a diagonal matrix
- `a.outer_product(b)` — outer product of two vectors
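The `qd` ops above run per thread inside a kernel, so they can't be exercised on the host directly, but their mathematical semantics can be sketched with plain NumPy (NumPy here is a stand-in for illustration only, not part of the `qd` API):

```python
import numpy as np

# Stand-in for qd.Matrix.diag(dim, val): a dim x dim matrix
# with val on the diagonal and zeros elsewhere.
d = np.diag([2.0] * 3)

# Stand-in for a.outer_product(b): an n x m matrix with
# outer[i, j] == a[i] * b[j].
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])
outer = np.outer(a, b)  # shape (2, 3)

# The Frobenius inner product is the element-wise product
# summed over all entries, matching the inner_products example.
m1 = np.array([[1.0, 2.0], [3.0, 4.0]])
m2 = np.array([[0.0, 1.0], [1.0, 0.0]])
frob = np.sum(m1 * m2)  # 5.0
```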