Tensor types#

There are two core tensor types:

  • ndarray (qd.ndarray)

  • field (qd.field)

In addition, when used from a qd kernel, fields can be:

  • referenced as a global variable

  • passed into the kernel as a parameter

For the remainder of this doc, we will compare three approaches, which we will refer to as:

  • “ndarrays”: always passed into kernels as parameters

  • “global fields”: referenced from kernel as global variable

  • “field args”: passed into the parameter as a kernel

Example of each tensor approach#

Let’s first give an example of using each:

NDArray#

import quadrants as qd

qd.init(arch=qd.gpu)

a = qd.ndarray(qd.i32, shape=(10,))

@qd.kernel
def f1(p1: qd.types.NDArray[qd.i32, 1]) -> None:
    p1[0] += 1

Note that the typing for NDArray is qd.types.NDArray[data_type, number_dimensions]

Global field#

import quadrants as qd

qd.init(arch=qd.gpu)

a = qd.field(qd.i32, shape=(10,))

@qd.kernel
def f1() -> None:
    a[0] += 1

You can see that we access the global variable referencing the field directly from the kernel. No need to provide the field as a parameter.

Field args#

import quadrants as qd

qd.init(arch=qd.gpu)

a = qd.field(qd.i32, shape=(10,))

@qd.kernel
def f1(p1: qd.Template) -> None:
    p1[0] += 1

In this case, we provide the field to the kernel via a parameter, with typing type of qd.Template.

Comparison of tensor approaches#

Tensor approach

Launch latency

Runtime speed

Resizable without recompile? [*1]

Encapsulation?[*2]

ndarray

Slowest

Slower

yes

Yes

global field

Fastest

Fast

no

No

field arg

Medium

Fast

no

Yes

  • [*1] We’ll discuss this in ‘Under the covers’ below

  • [*2] Will be discussed in ‘Encapsulation’ below

Let’s define each of these column headings.

Under the covers summary#

When running a kernel, three things need to happen:

  • the kernel needs to be compiled

  • the parameters need to be sent to the GPU (contributes to kernel launch latency)

  • the kernel launch needs to be sent to the GPU (contributes to kernel launch latency)

Looking at kernel launch latency:

  • field args and ndarrays both are passed in to the GPU as parameters, and hence increase launch latency

  • ndarrays have more parameter processing than field args, and have the biggest launch latency

Each tensor type is bound to the compiled kernel in some way:

  • global fields are permanently bound to the kernel

    • to use the kernel with a different tensor, you’d need to copy and paste the kernel, with a new name

  • field args are permanently bound to the compiled kernel

    • however, as the typing qd.Template alludes to, you can call the kernel with different fields, and the kernel will be automatically recompiled to bind with the new field

  • ndarrays are only bound by:

    • the data type (qd.i32 vs qd.f32 for example)

    • the number of dimensions

  • Each call with an ndarray of a different data type or number of dimensions, that hasn’t already been compiled for, will trigger a recompile.

  • However, no recompilation is needed for passing in a different ndarray, that matches data type and number of dimensions.

Encapsulation#

Using global variables provides fairly poor encapsulation and re-use.

Both ndarrays and field args provide better encapsulation, and kernel re-use.

launch latency vs runtime speed#

For kernels that run for sufficiently long, the launch latency will be entirely hidden by the kernel runtime. Launch latency only affects performance for very short kernels.

Recommendations#

  • for maximum flexibility to resize tensors, use ndarrays

  • for maximum runtime speed, with good encapsulation, use field args

  • if the kernels are very short, for maximum speed you might need to use global fields, but this comes at the expense of poor encapsulation