# Building the CUDA graph conditional fatbin

The `graph_do_while` feature uses a tiny CUDA kernel that calls
`cudaGraphSetConditional` (a device-side function from NVIDIA's
`libcudadevrt.a`) to control CUDA graph conditional while nodes. These
conditional nodes require SM 9.0+ (Hopper or later); on older GPUs,
`graph_do_while` falls back to a host-side loop automatically.

There are three distinct phases:

1. **Fatbin generation** (rare, manual) — A developer runs
   `scripts/build_condition_kernel_fatbin.py`, which compiles the kernel and
   device-links it with `libcudadevrt.a` to resolve `cudaGraphSetConditional`.
   The output is a self-contained fatbin, committed to git as a C header.
   Requires `nvcc` and the CUDA toolkit.
2. **Quadrants build** (CI / developers) — The C header is `#include`d as a
   plain byte array. No CUDA toolkit needed.
3. **Runtime** (end users) — The fatbin is loaded via `cuModuleLoadData`.
   No CUDA toolkit needed.

This page documents phase 1: regenerating the pre-built fatbin.

## When to regenerate

You only need to regenerate the fatbin if:

- The condition kernel source (`quadrants/runtime/cuda/graph_do_while_cond.cu`)
  changes.
- You need to add support for a new SM architecture.

## Prerequisites

- CUDA toolkit with `nvcc` (13.0 or later, required for SM 110 support;
  earlier toolkits will fail with `Unsupported gpu architecture 'compute_110'`).
- The `nvcc` binary must be on your `PATH`, or set `CUDA_HOME`.

## Regenerating

Run the script from the repo root:

```bash
python scripts/build_condition_kernel_fatbin.py
```

This will:

1. Compile `quadrants/runtime/cuda/graph_do_while_cond.cu` with relocatable device
   code for each target SM architecture.
2. Device-link the result with `libcudadevrt.a` to resolve the
   `cudaGraphSetConditional` extern.
3. Write the fatbin as a C byte array to
   `quadrants/runtime/cuda/graph_do_while_cond_fatbin.h`.

After regenerating, commit the updated header. Quadrants must be rebuilt to
pick up the new fatbin.

## Adding a new SM architecture

Edit the `SM_VERSIONS` list in `scripts/build_condition_kernel_fatbin.py` to
add the new SM version number (e.g., `130`), then regenerate.

## Files

| File | Purpose |
|------|---------|
| `quadrants/runtime/cuda/graph_do_while_cond.cu` | CUDA C source for the condition kernel |
| `scripts/build_condition_kernel_fatbin.py` | Regeneration script |
| `quadrants/runtime/cuda/graph_do_while_cond_fatbin.h` | Generated C header (checked into git) |

## How it's used at runtime

`GraphManager::ensure_condition_kernel_loaded()` in
`quadrants/runtime/cuda/graph_manager.cpp` loads the fatbin via
`cuModuleLoadData`. If the fatbin does not contain SASS for the current GPU's
SM architecture, loading fails with a clear error pointing to this script.