Shared Metal command queue (PyTorch MPS)#

On Apple Silicon, Quadrants and PyTorch MPS both dispatch GPU work via Metal. By default each framework creates its own MTLCommandQueue, which means there is no GPU-level ordering between them. Every zero-copy interop point therefore requires explicit CPU-side synchronisation (qd.sync() and torch.mps.synchronize()) to guarantee data visibility.

The external_metal_command_queue option lets you pass PyTorch’s command queue to Quadrants so that both frameworks share a single queue. Metal processes command buffers in commit order within a queue, so GPU-side ordering is automatic and the per-interop sync overhead is eliminated.

Quick start#

import quadrants as qd
from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()
qd.init(
    arch=qd.metal,
    external_metal_command_queue=queue_ptr,
    external_metal_command_queue_is_torch_queue=True,
)

Two flags work together:

  • external_metal_command_queue — the raw MTLCommandQueue* pointer. Quadrants dispatches all GPU work on this queue instead of creating its own.

  • external_metal_command_queue_is_torch_queue — set to True when the queue comes from PyTorch MPS. This tells Quadrants that PyTorch shares the same queue, so the explicit interop syncs can be safely skipped. Defaults to False, which preserves the sync calls even when an external queue is provided (useful when the external queue belongs to a non-PyTorch framework).

Once initialised with both flags:

  • to_torch(copy=False) no longer calls qd.sync() internally.

  • to_torch(copy=True) no longer calls torch.mps.synchronize() after the copy.

  • GPU work submitted by Quadrants and by PyTorch executes in the order it was committed — no manual sync needed between the two.

You can still call qd.sync() when you need to read results back to the CPU (e.g. to_numpy()); what changes is that you no longer need both qd.sync() and torch.mps.synchronize() at every framework boundary.

Extracting PyTorch’s MTLCommandQueue#

PyTorch does not expose its MPS command queue through a public Python API. Quadrants provides a built-in helper that extracts it at runtime using ctypes and the Objective-C runtime:

from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()  # returns int (raw pointer), or 0 on failure

The function initialises PyTorch MPS if needed, then returns the MTLCommandQueue* as a Python int. It returns 0 if extraction fails (e.g. non-macOS platform, PyTorch not installed, MPS not available, or unsupported PyTorch build). The underlying C++ symbol (_ZN2at3mps19getDefaultMPSStreamEv) has been stable since PyTorch 1.13.

Init ordering#

get_mps_command_queue() handles PyTorch MPS initialisation internally, so you can call it before qd.init() without any manual setup:

import quadrants as qd
from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()  # initialises MPS if needed
qd.init(
    arch=qd.metal,
    external_metal_command_queue=queue_ptr,
    external_metal_command_queue_is_torch_queue=True,
)

What changes with a shared queue#

Scenario

Separate queues (default)

Shared queue

f.to_torch(copy=False)

qd.sync() called internally

no sync needed

f.to_torch(copy=True)

qd.sync() + torch.mps.synchronize()

no sync needed

Quadrants kernel after torch write

manual torch.mps.synchronize() required

automatic (same queue)

f.to_numpy()

qd.sync() (always needed for CPU readback)

qd.sync() (still needed — the kernel-copy path is Quadrants-internal, not routed through MPS)

Lifetime and ownership#

The caller (your application) owns the command queue. Quadrants borrows the pointer without retaining it, so the caller must keep the queue alive for the lifetime of the Quadrants runtime. In practice this means keeping PyTorch (and its MPS backend) alive for as long as qd.init() is active.

Fallback#

get_mps_command_queue() returns 0 on failure (non-macOS, missing PyTorch, unsupported build) rather than raising. You can use this to fall back to the default separate-queue path:

from quadrants.interop import get_mps_command_queue

queue_ptr = get_mps_command_queue()
qd.init(
    arch=qd.metal,
    external_metal_command_queue=queue_ptr or None,
    external_metal_command_queue_is_torch_queue=queue_ptr != 0,
)

When external_metal_command_queue is 0 (or omitted), Quadrants creates its own queue and the explicit sync path is used as before.