Shared Metal command queue (PyTorch MPS)
On Apple Silicon, Quadrants and PyTorch MPS both dispatch GPU work via Metal. By default each framework creates its own MTLCommandQueue, which means there is no GPU-level ordering between them. Every zero-copy interop point therefore requires explicit CPU-side synchronisation (qd.sync() and torch.mps.synchronize()) to guarantee data visibility.
The external_metal_command_queue option lets you pass PyTorch’s command queue to Quadrants so that both frameworks share a single queue. Metal processes command buffers in commit order within a queue, so GPU-side ordering is automatic and the per-interop sync overhead is eliminated.
Quick start
```python
import quadrants as qd
from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()

qd.init(
    arch=qd.metal,
    external_metal_command_queue=queue_ptr,
    external_metal_command_queue_is_torch_queue=True,
)
```
Two flags work together:
- `external_metal_command_queue`: the raw `MTLCommandQueue*` pointer. Quadrants dispatches all GPU work on this queue instead of creating its own.
- `external_metal_command_queue_is_torch_queue`: set to `True` when the queue comes from PyTorch MPS. This tells Quadrants that PyTorch shares the same queue, so the explicit interop syncs can be safely skipped. Defaults to `False`, which preserves the sync calls even when an external queue is provided (useful when the external queue belongs to a non-PyTorch framework).
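The interaction between the two flags can be summarised as a small decision function (an illustrative sketch, not part of the Quadrants API):

```python
def interop_syncs_needed(external_queue, is_torch_queue):
    """Return True when the explicit CPU-side interop syncs
    (qd.sync() / torch.mps.synchronize()) are still required."""
    if external_queue:               # Quadrants dispatches on the external queue
        return not is_torch_queue    # skipped only when PyTorch shares the queue
    return True                      # default: separate queues, syncs required

assert interop_syncs_needed(None, False)        # default separate-queue path
assert interop_syncs_needed(0x7F00, False)      # external non-PyTorch queue
assert not interop_syncs_needed(0x7F00, True)   # shared PyTorch queue
```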
Once initialised with both flags:
- `to_torch(copy=False)` no longer calls `qd.sync()` internally.
- `to_torch(copy=True)` no longer calls `torch.mps.synchronize()` after the copy.
- GPU work submitted by Quadrants and by PyTorch executes in the order it was committed; no manual sync is needed between the two.
You can still call qd.sync() when you need to read results back to the CPU (e.g. to_numpy()); what changes is that you no longer need both qd.sync() and torch.mps.synchronize() at every framework boundary.
Extracting PyTorch’s MTLCommandQueue
PyTorch does not expose its MPS command queue through a public Python API. Quadrants provides a built-in helper that extracts it at runtime using ctypes and the Objective-C runtime:
```python
from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()  # returns int (raw pointer), or 0 on failure
```
The function initialises PyTorch MPS if needed, then returns the MTLCommandQueue* as a Python int. It returns 0 if extraction fails (e.g. non-macOS platform, PyTorch not installed, MPS not available, or unsupported PyTorch build). The underlying C++ symbol (_ZN2at3mps19getDefaultMPSStreamEv) has been stable since PyTorch 1.13.
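The 0-on-failure contract makes the helper safe to call unconditionally. A minimal guarded check (the `ImportError` fallback below is illustrative; it mimics the failure path on machines without Quadrants):

```python
try:
    from quadrants.interop import get_mps_command_queue
except ImportError:
    # Quadrants not installed: behave like the extraction-failure path.
    def get_mps_command_queue():
        return 0

queue_ptr = get_mps_command_queue()
if queue_ptr:
    print(f"MTLCommandQueue* = {queue_ptr:#x}")
else:
    print("queue extraction failed; using the default separate-queue path")
```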
Init ordering
get_mps_command_queue() handles PyTorch MPS initialisation internally, so you can call it before qd.init() without any manual setup:
```python
import quadrants as qd
from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()  # initialises MPS if needed

qd.init(
    arch=qd.metal,
    external_metal_command_queue=queue_ptr,
    external_metal_command_queue_is_torch_queue=True,
)
```
What changes with a shared queue

| Scenario | Separate queues (default) | Shared queue |
|---|---|---|
| `to_torch(copy=False)` | `qd.sync()` before the handoff | no sync needed |
| `to_torch(copy=True)` | `torch.mps.synchronize()` after the copy | no sync needed |
| Quadrants kernel after torch write | manual | automatic (same queue) |
Lifetime and ownership
The caller (your application) owns the command queue. Quadrants borrows the pointer without retaining it, so the caller must keep the queue alive for the lifetime of the Quadrants runtime. In practice this means keeping PyTorch (and its MPS backend) alive for as long as qd.init() is active.
Fallback
get_mps_command_queue() returns 0 on failure (non-macOS, missing PyTorch, unsupported build) rather than raising. You can use this to fall back to the default separate-queue path:
```python
import quadrants as qd
from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()

qd.init(
    arch=qd.metal,
    external_metal_command_queue=queue_ptr or None,
    external_metal_command_queue_is_torch_queue=queue_ptr != 0,
)
```
When external_metal_command_queue is 0 (or omitted), Quadrants creates its own queue and the explicit sync path is used as before.
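The fallback pattern above can be packaged as a tiny helper that maps the extraction result to `qd.init()` keyword arguments (an illustrative sketch; `shared_queue_init_kwargs` is not part of the Quadrants API):

```python
def shared_queue_init_kwargs(queue_ptr):
    """Build qd.init() kwargs from get_mps_command_queue()'s result:
    opt into the shared queue on success, fall back on failure (0)."""
    return {
        "external_metal_command_queue": queue_ptr or None,
        "external_metal_command_queue_is_torch_queue": bool(queue_ptr),
    }

# Usage: qd.init(arch=qd.metal, **shared_queue_init_kwargs(get_mps_command_queue()))
assert shared_queue_init_kwargs(0) == {
    "external_metal_command_queue": None,
    "external_metal_command_queue_is_torch_queue": False,
}
assert shared_queue_init_kwargs(0x7F00) == {
    "external_metal_command_queue": 0x7F00,
    "external_metal_command_queue_is_torch_queue": True,
}
```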