Tensor types#

There are three core tensor types:

  • ndarray (ti.ndarray)

  • global field (ti.field, referenced as a global variable, from a kernel)

  • field arg (ti.field, passed into a kernel as a parameter)

Example of each tensor type#

Let’s first give an example of using each:

NDArray#

import quadrants as ti

ti.init(arch=ti.gpu)

a = ti.ndarray(ti.i32, shape=(10,))

@ti.kernel
def f1(p1: ti.types.NDArray[ti.i32, 1]) -> None:
    p1[0] += 1

Note that the typing for NDArray is ti.types.NDArray[data_type, number_dimensions]

Global field#

import quadrants as ti

ti.init(arch=ti.gpu)

a = ti.field(ti.i32, shape=(10,))

@ti.kernel
def f1() -> None:
    a[0] += 1

You can see that we access the global variable referencing the field directly from the kernel. No need to provide the field as a parameter.

Field args#

import quadrants as ti

ti.init(arch=ti.gpu)

a = ti.field(ti.i32, shape=(10,))

@ti.kernel
def f1(p1: ti.Template) -> None:
    p1[0] += 1

In this case, we provide the field to the kernel via a parameter, with typing type of ti.Template.

Comparison of tensor types#

Tensor type

Launch latency

Runtime speed

Resizable without recompile? [*1]

Encapsulation?[*2]

ndarray

Slowest

Slower

yes

Yes

global field

Fastest

Fast

no

No

field arg.

Medium

Fast

no

Yes

  • [*1] We’ll discuss this in ‘Under the covers’ below

  • [*2] Will be discussed in ‘Encapsulation’ below

Let’s define each of these column headings.

Under the covers summary#

When running a kernel, two things need to happen:

  • the kernel needs to be compiled

  • the parameters need to be sent to the GPU

  • the kernel launch need to be sent to the GPU

Compilation speed is not affected by the tensor type. However:

  • field args and ndarrays both are passed in to the GPU as parameters, and hence increase launch latency

  • ndarrays have more parameter processing than field args, and have the biggest launch latency

Each tensor type is bound to the compiled kernel in some way:

  • global fields are permanently bound to the kernel

    • to use the kernel with a different tensor, you’d need to copy and paste the kernel, with a new name

  • field args are permanently bound to the compiled kernel

    • however, as the typing ti.Template alludes to, you can call the kernel with different fields, and the kernel will be automatically recompiled to bind with the new field

  • ndarrays are only bound by:

    • the data type (ti.i32 vs ti.f32 for example)

    • the number of dimensions

    • you cannot pass in an ndarray with different data type or number of dimensions into the kernel, however

    • … no recompilatino is needed for:

      • resizing the ndarray, or

      • passing in a different ndarray, that matches data type and number of dimensions

Encapsulation#

Using global variables provides fairly poor encapsulation and re-use.

Both ndarrays and field args provide better encapsulation, and kernel re-use.

launch latency vs runtime speed#

For kernels that run for sufficiently long, the launch latency will be entirely hidden by the kernel runtime. Launch latency only affects performacne for very short kernels.

Recommendations#

  • for maximum flexibility to resize tensors, use ndarrays

  • for maximum runtime speed, with good encapsulation, use field args

  • if the kernels are very short, for maximum speed you might need to use global fields, but this comes at the expense of good encapsulation