Numpy and Torch interop#
Quadrants provides interop with both numpy and PyTorch. There are three mechanisms:

- Copy-based: convert data between Quadrants fields/ndarrays and numpy arrays or torch tensors.
- Zero-copy via DLPack: obtain a torch tensor or numpy array that aliases the underlying Quadrants memory.
- Direct pass-through: pass torch tensors directly into kernels as ndarray arguments (zero-copy).
Copy-based interop#
Fields#
Fields support to_numpy(), from_numpy(), to_torch(), and from_torch():
```python
import numpy as np
import quadrants as qd

qd.init(arch=qd.gpu)
f = qd.field(qd.f32, shape=(4, 4))

# numpy -> field
arr = np.ones((4, 4), dtype=np.float32) * 3.0
f.from_numpy(arr)

# field -> numpy
result = f.to_numpy()
print(result[0, 0])  # 3.0
```
With torch:
```python
import torch

f = qd.field(qd.f32, shape=(4, 4))

# torch -> field
t = torch.ones(4, 4, dtype=torch.float32) * 5.0
f.from_torch(t)

# field -> torch
result = f.to_torch(device="cpu")
print(result[0, 0])  # 5.0
```
Ndarrays#
Ndarrays support to_numpy() and from_numpy():
```python
a = qd.ndarray(qd.i32, shape=(10,))

# numpy -> ndarray
arr = np.arange(10, dtype=np.int32)
a.from_numpy(arr)

# ndarray -> numpy
result = a.to_numpy()
```
Shape requirements#
The shape of the numpy array or torch tensor must match the shape of the field or ndarray exactly. For matrix and vector fields, the element dimensions are appended to the shape:
```python
# A field of 3x2 matrices with shape (4,) has numpy shape (4, 3, 2)
m = qd.Matrix.field(3, 2, qd.f32, shape=(4,))
arr = np.zeros((4, 3, 2), dtype=np.float32)
m.from_numpy(arr)
```
Zero-copy interop via DLPack#
Quadrants' zero-copy interop is designed with PyTorch as the first-class consumer: support, defaults, and the supported dtype/backend matrix are driven by what PyTorch can consume cleanly via DLPack. NumPy is supported on CPU backends as a free side benefit of the same DLPack capsule. Several of the limitations below (e.g. the Apple Metal torch >= 2.9.2 requirement, or the 0-dim ScalarField carve-out) are inherited from PyTorch's current DLPack importer rather than from Quadrants itself.
to_torch() and to_numpy() accept a keyword-only copy argument that controls whether the returned tensor/array is an independent copy of the data or a zero-copy view that aliases the underlying Quadrants memory.
```python
f = qd.field(qd.f32, shape=(1024,))

view  = f.to_torch(copy=False)  # zero-copy view: aliases f's memory, or ValueError
auto  = f.to_torch(copy=None)   # zero-copy if possible, otherwise copy
clone = f.to_torch(copy=True)   # independent copy (default)
plain = f.to_torch()            # same as copy=True
```
When using `copy=False`, or `copy=None` when zero-copy succeeds, modifications made through the view are visible to subsequent Quadrants kernel reads, and vice versa. The view stays valid until the underlying storage is reallocated (typically on `qd.init()` or `qd.reset()`), after which a fresh call to `to_torch(copy=False)` / `to_numpy(copy=False)` returns a new view.
When zero-copy is available#
Zero-copy uses DLPack and requires:

- a backend with DLPack support: `cpu` (x64/arm64), `cuda`, `amdgpu`, or `metal`. Vulkan is not supported: Vulkan-backed DLPack tensors are not accepted by many well-known scientific computing libraries (torch, numpy, etc.);
- a DLPack-supported dtype: `i32`, `i64`, `f32`, `f64`, `u1` (other dtypes such as `f16`, `u8`, `u16` fall back to the kernel-copy path);
- on Apple Metal, `torch >= 2.9.2` for fields (required for DLPack `bytes_offset` on MPS; see pytorch/pytorch#168193);
- 0-dim `ScalarField` instances are not zero-copyable on any backend (a PyTorch DLPack `bytes_offset` limitation);
- members of an AOS `StructField` (the default `Struct.field(..., layout=Layout.AOS)`) are not zero-copyable yet (see Struct fields below); members of an SOA `StructField` (`layout=Layout.SOA`) are zero-copyable individually.
Zero-copy to_numpy() additionally requires a CPU backend, because numpy arrays cannot reference GPU memory. Note: Field.to_numpy(copy=False) and MatrixField.to_numpy(copy=False) currently require torch to be installed, because the C++ field_to_dlpack checks the torch version internally. Ndarray.to_numpy(copy=False) does not require torch.
On NumPy >= 2.1, to_numpy(copy=False) returns a writable array (via a DLPack v1 capsule). On NumPy 1.26–2.0, the returned array is read-only because those versions only consume DLPack v0 capsules, which lack writability metadata. If you need writable zero-copy numpy views, upgrade to NumPy >= 2.1.
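The aliasing and writability behaviour can be observed with NumPy's own DLPack support. The sketch below uses a plain numpy array as the DLPack producer rather than a Quadrants field, but the same version-dependent rules apply:

```python
import numpy as np

a = np.arange(8, dtype=np.int32)
view = np.from_dlpack(a)     # zero-copy DLPack round trip

a[0] = 42
print(view[0])               # 42: the view aliases a's memory
print(view.flags.writeable)  # True on NumPy >= 2.1 (v1 capsule);
                             # may be False on 1.26-2.0 (v0 capsule
                             # lacks writability metadata)
```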
Semantics of copy#
| Value | Behaviour |
|---|---|
| `copy=True` (default) | Independent copy via kernel. |
| `copy=None` | Zero-copy view via DLPack when available, otherwise falls back to a copy silently. |
| `copy=False` | Zero-copy view via DLPack, or raises `ValueError` when zero-copy is unavailable. |
The default copy=True always returns a buffer that is safe to mutate without affecting the field/ndarray. Use copy=None when you want zero-copy as a best-effort optimisation without having to handle exceptions — it gives you a view when possible and a safe copy otherwise.
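The behaviour of `copy=None` is roughly equivalent to the following wrapper (a sketch, not the actual Quadrants implementation; `field` stands for any object exposing the `to_torch` API described above):

```python
def to_torch_best_effort(field):
    # Roughly what copy=None does: try a zero-copy DLPack view first,
    # then fall back to an independent kernel copy.
    try:
        return field.to_torch(copy=False)  # view, or ValueError
    except ValueError:
        return field.to_torch(copy=True)   # safe independent copy
```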
Examples#
```python
import quadrants as qd

qd.init(arch=qd.cuda)
f = qd.field(qd.f32, shape=(1024,))
f.fill(1.0)

view = f.to_torch(copy=False)
view *= 2.0   # mutates f's underlying memory directly

qd.sync()     # not strictly required; safe pattern
print(f[0])   # 2.0
```
Round-trip with NumPy on a CPU backend:
```python
qd.init(arch=qd.cpu)
a = qd.ndarray(qd.i32, shape=(8,))
a.from_numpy(np.arange(8, dtype=np.int32))

view = a.to_numpy(copy=False)
view[0] = 100           # requires NumPy >= 2.1 for the assignment to succeed
print(a.to_numpy()[0])  # 100
```
Caching#
Each call to to_torch(copy=False) builds a fresh DLPack capsule, but the returned tensor aliases the same underlying memory. Downstream frameworks (e.g. Genesis) may cache the returned view on the field object for hot-path reuse; Quadrants itself does not cache views internally.
```python
v1 = f.to_torch(copy=False)
v2 = f.to_torch(copy=False)
assert v1.data_ptr() == v2.data_ptr()  # same underlying memory
```
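A minimal sketch of the kind of per-field cache a downstream framework might keep (a hypothetical helper, not a Quadrants API):

```python
class ViewCache:
    """Caches one zero-copy view per field for hot-path reuse (sketch).

    Must be invalidated whenever Quadrants storage is reallocated,
    i.e. after qd.init() or qd.reset().
    """

    def __init__(self):
        self._views = {}

    def get(self, field):
        view = self._views.get(id(field))
        if view is None:
            # fresh DLPack capsule, but the same underlying memory
            view = field.to_torch(copy=False)
            self._views[id(field)] = view
        return view

    def invalidate(self):
        self._views.clear()
```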
Apple Metal: synchronisation#
On Apple Metal, Quadrants and PyTorch MPS use separate Metal command queues. Every to_torch() / to_numpy() call runs qd.sync() internally to flush the Quadrants queue. Additionally, copy=True (the default) calls torch.mps.synchronize() after the kernel copy. This is necessary because, on Metal, Quadrants and Torch do not share the same compute streams. copy=False does not call torch.mps.synchronize():
```python
qd.init(arch=qd.metal)
f = qd.field(qd.f32, shape=(64,))

run_kernel(f)                  # queues writes on the Quadrants Metal stream
view = f.to_torch(copy=False)  # qd.sync() only
copy = f.to_torch(copy=True)   # qd.sync() + torch.mps.synchronize()
```
The reverse direction (PyTorch writes to a zero-copy view, then a Quadrants kernel reads from the same field) is not automatically synchronised. Because Quadrants and PyTorch MPS submit work to separate Metal command queues, a kernel launched immediately after a torch write may execute before the torch write has actually committed to memory:
```python
qd.init(arch=qd.metal)
f = qd.field(qd.f32, shape=(64,))
view = f.to_torch(copy=False)

view.zero_()             # queued on the torch MPS stream
my_kernel(f)             # may run BEFORE view.zero_() commits!

torch.mps.synchronize()  # required to flush the torch MPS stream first
my_kernel(f)             # now safe
```
This is intentional: forcing a sync on every Quadrants kernel that touches a previously zero-copied field would be very expensive in workloads that batch many torch ops and many kernels back-to-back. If you mutate fields from torch and then read them from a Quadrants kernel on Metal, call `torch.mps.synchronize()` once between the torch ops and the kernels.
Shared command queue. The synchronisation overhead above can be eliminated entirely by passing PyTorch MPS’s MTLCommandQueue to Quadrants at init time via external_metal_command_queue. Quadrants provides quadrants.interop.get_mps_command_queue() to extract the queue pointer at runtime. When both frameworks share the same queue, Metal guarantees command buffer ordering automatically. See Shared Metal command queue for the setup guide.
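The wiring for a shared queue looks roughly like this. This is an init-time configuration sketch, not tested code: it assumes torch's MPS context must exist before the queue can be extracted, and that `external_metal_command_queue` accepts the raw pointer returned by `get_mps_command_queue()`.

```python
import torch
import quadrants as qd
from quadrants.interop import get_mps_command_queue

_ = torch.zeros(1, device="mps")  # force torch to create its MPS command queue
queue_ptr = get_mps_command_queue()  # raw MTLCommandQueue pointer
qd.init(arch=qd.metal, external_metal_command_queue=queue_ptr)

# Both frameworks now submit to the same Metal command queue, so Metal
# orders torch writes and Quadrants kernels automatically; the explicit
# torch.mps.synchronize() calls above become unnecessary.
```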
Lifetime caveats#
A zero-copy view becomes invalid when the underlying Quadrants storage is freed. This happens on qd.reset() and qd.init(). Holding a copy=False tensor across either is undefined behaviour:
```python
view = f.to_torch(copy=False)
qd.reset()
view[0]  # undefined: view aliases freed memory
```
The default copy=True produces an independent copy that is unaffected. Only copy=False views are affected by this caveat.
Struct fields#
StructField.to_torch() and StructField.to_numpy() return a dictionary mapping each member name to a tensor / array; the copy argument is propagated to each member, so zero-copy availability is decided per member. The relevant axis is the SNode layout chosen at construction:
- AOS (the default, `Struct.field(..., layout=Layout.AOS)`): all members share the struct cell, e.g. `Struct.field({"a": i32, "b": f32}, shape=(N,))` stores `[a0, b0, a1, b1, ...]` in memory, with stride `sizeof(cell)` between consecutive `a`'s. Quadrants' C++ DLPack export does not currently emit cell-stride-aware views for individual members (it computes contiguous strides at the member dtype size, which would interleave neighbouring members' bytes), so AOS members fall back to a kernel copy and `copy=False` raises on each AOS member.
- SOA (`Struct.field(..., layout=Layout.SOA)`): each member sits in its own dense SNode subtree with contiguous storage, so members are zero-copyable individually under the usual backend/dtype rules; `copy=False` succeeds and returns aliasing views.
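The AOS limitation can be illustrated with plain numpy: recovering one member from interleaved AOS storage needs a stride of the full cell size, not the member dtype size. A sketch with a hypothetical two-member i32 struct:

```python
import numpy as np

# AOS storage of {"a": i32, "b": i32} for N=4: [a0, b0, a1, b1, ...]
cells = np.arange(8, dtype=np.int32)  # a-values are even, b-values odd

a_strided = cells[0::2]  # stride = sizeof(cell): the correct per-member view
a_naive = cells[:4]      # contiguous stride at the dtype size: wrong

print(a_strided.tolist())  # [0, 2, 4, 6] -> the actual "a" member
print(a_naive.tolist())    # [0, 1, 2, 3] -> mixes a's and b's together
```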
```python
S_aos = qd.Struct.field({"pos": qd.f32, "vel": qd.f32}, shape=(16,))  # AOS (default)
d_aos = S_aos.to_torch()  # dict of kernel copies
d_aos["pos"][0] = 1.0     # does NOT write back

S_soa = qd.Struct.field({"pos": qd.f32, "vel": qd.f32}, shape=(16,),
                        layout=qd.Layout.SOA)
d_soa = S_soa.to_torch(copy=False)  # dict of zero-copy views
d_soa["pos"][0] = 1.0               # writes through to S_soa.pos
```
Raw DLPack export with to_dlpack()#
All field and ndarray types expose a to_dlpack() method that returns a raw DLPack PyCapsule. This is the low-level primitive that to_torch(copy=False) and to_numpy(copy=False) are built on; use it when you need to feed Quadrants data into a framework that speaks DLPack directly (e.g. JAX, CuPy, or a custom C extension).
```python
qd.init(arch=qd.cpu)
f = qd.field(qd.f32, shape=(8,))
f.fill(1.0)

capsule = f.to_dlpack()                      # v0 capsule ("dltensor")
t = torch.utils.dlpack.from_dlpack(capsule)  # zero-copy torch tensor
```
For NumPy, prefer to_numpy(copy=False) which handles the DLPack protocol adapter internally. If you need a raw v1 capsule for another consumer, use f.to_dlpack(versioned=True). Note that np.from_dlpack does not accept raw PyCapsule objects — it requires an object exposing __dlpack__() and __dlpack_device__(), which to_numpy(copy=False) provides via an internal adapter.
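Such an adapter is conceptually tiny. A hypothetical stand-in is sketched below, using a plain numpy array as the DLPack producer (any exporter with `__dlpack__`/`__dlpack_device__` works the same way); `DLPackAdapter` is an illustrative name, not a Quadrants class:

```python
import numpy as np

class DLPackAdapter:
    """Sketch of the kind of adapter to_numpy(copy=False) uses internally:
    wraps a DLPack producer so np.from_dlpack (which requires __dlpack__
    and __dlpack_device__, not a raw capsule) can consume it."""

    def __init__(self, producer):
        self._producer = producer

    def __dlpack__(self, **kwargs):
        # Delegate, forwarding protocol kwargs (stream, max_version, ...)
        return self._producer.__dlpack__(**kwargs)

    def __dlpack_device__(self):
        return self._producer.__dlpack_device__()

src = np.arange(8, dtype=np.int32)
view = np.from_dlpack(DLPackAdapter(src))  # zero-copy via the adapter
src[0] = 7
print(view[0])  # 7: the view aliases src
```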
The versioned parameter selects the DLPack protocol version:
| `versioned` | Capsule type | Capsule name | Use case |
|---|---|---|---|
| `False` (default) | `DLManagedTensor` (DLPack v0) | `"dltensor"` | Older consumers, e.g. `torch.utils.dlpack.from_dlpack` and NumPy <= 2.0 |
| `True` | `DLManagedTensorVersioned` (DLPack v1) | `"dltensor_versioned"` | v1 consumers, e.g. NumPy >= 2.1 (writable zero-copy arrays) |
The same backend, dtype, and layout restrictions that apply to to_torch(copy=False) / to_numpy(copy=False) apply here — to_dlpack() is the underlying mechanism. The caller is responsible for calling qd.sync() between modifying the field and consuming the capsule.
Direct torch tensor pass-through#
Torch tensors can be passed directly into kernels where qd.types.ndarray() parameters are expected. The kernel reads from and writes to the torch tensor directly:
```python
import torch
import quadrants as qd

qd.init(arch=qd.gpu)

@qd.kernel
def square(inp: qd.types.ndarray(), out: qd.types.ndarray()) -> None:
    for i in range(32):
        out[i] = inp[i] * inp[i]

x = torch.ones(32, dtype=torch.float32) * 3.0
y = torch.zeros(32, dtype=torch.float32)
square(x, y)
print(y[0])  # 9.0
```
This also works with CUDA tensors when running on a CUDA backend:
```python
x = torch.ones(32, dtype=torch.float32, device="cuda:0") * 3.0
y = torch.zeros(32, dtype=torch.float32, device="cuda:0")
square(x, y)
```
Integration with torch.autograd#
Since torch tensors can be passed directly into kernels, you can integrate Quadrants kernels into PyTorch’s autograd system by wrapping them in a torch.autograd.Function:
```python
@qd.kernel
def forward_kernel(t: qd.types.ndarray(), o: qd.types.ndarray()) -> None:
    for i in range(32):
        o[i] = t[i] * t[i]

@qd.kernel
def backward_kernel(t_grad: qd.types.ndarray(), t: qd.types.ndarray(),
                    o_grad: qd.types.ndarray()) -> None:
    for i in range(32):
        t_grad[i] = 2 * t[i] * o_grad[i]

class Sqr(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        outp = torch.zeros_like(inp)
        ctx.save_for_backward(inp)
        forward_kernel(inp, outp)
        return outp

    @staticmethod
    def backward(ctx, outp_grad):
        outp_grad = outp_grad.contiguous()
        inp_grad = torch.zeros_like(outp_grad)
        (inp,) = ctx.saved_tensors
        backward_kernel(inp_grad, inp, outp_grad)
        return inp_grad

x = torch.tensor([2.0] * 32, requires_grad=True)
loss = Sqr.apply(x).sum()
loss.backward()
print(x.grad[0])  # 4.0
```
Summary#
| Method | Copies data? | Works with fields? | Works with ndarrays? |
|---|---|---|---|
| `from_numpy()` / `to_numpy()` | yes | yes | yes |
| `from_torch()` / `to_torch()` | yes | yes | yes |
| `to_torch(copy=None)` / `to_numpy(copy=None)` | no when possible, yes otherwise | yes | yes |
| `to_torch(copy=False)` / `to_numpy(copy=False)` | no (DLPack view) | yes | yes |
| `to_dlpack()` | no (raw capsule) | yes | yes |
| Direct pass-through | no | no | yes (as kernel arg) |
The copy parameter is supported on to_numpy() and to_torch() for ScalarField, MatrixField (and VectorField), StructField, qd.Tensor, and all Ndarray types. See Zero-copy interop via DLPack for the support matrix and lifetime rules.