API

Kernel language

KernelAbstractions.@kernel — Macro

@kernel function f(args) end

Takes a function definition and generates a Kernel constructor from it. The enclosed function is allowed to contain kernel language constructs. In order to call it the kernel has first to be specialized on the backend and then invoked on the arguments.

Example:

@kernel function vecadd(A, @Const(B))
    I = @index(Global)
    @inbounds A[I] += B[I]
end

A = ones(1024)
B = rand(1024)
event = vecadd(CPU(), 64)(A, B, ndrange=size(A))
wait(event)

KernelAbstractions.@Const — Macro

@Const(A)

@Const is an argument annotation that asserts that the memory referenced by A is neither written to as part of the kernel nor aliases any other memory in the kernel.

Danger

Violating those constraints will lead to arbitrary behaviour.

As an example, given a kernel signature kernel(A, @Const(B)), you are not allowed to call the kernel as kernel(A, A) or kernel(A, view(A, :)).

KernelAbstractions.@index — Macro

@index

The @index macro can be used to retrieve the index of a workitem within a kernel function. It supports producing either a linear index or a Cartesian index. A Cartesian index is a general N-dimensional index derived from the iteration space.

Index granularity

  • Global: Used to access global memory.
  • Group: The index of the workgroup.
  • Local: The index within the workgroup.

Index kind

  • Linear: Produces an Int64 that can be used to linearly index into memory.
  • Cartesian: Produces a CartesianIndex{N} that can be used to index into memory.
  • NTuple: Produces an NTuple{N} that can be used to index into memory.

If the index kind is not provided it defaults to Linear; this is subject to change.

Examples

@index(Global, Linear)
@index(Global, Cartesian)
@index(Local, Cartesian)
@index(Group, Linear)
@index(Local, NTuple)
@index(Global)
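
The index kinds above can be combined in a kernel like the following sketch, which transposes a matrix using a Cartesian global index. The kernel and variable names (transpose_kernel, a, b) are illustrative, not part of this API; the invocation style mirrors the vecadd example above.

```julia
using KernelAbstractions

# Each workitem handles one (i, j) position of the 2D ndrange.
@kernel function transpose_kernel(out, @Const(in))
    I = @index(Global, Cartesian)   # CartesianIndex{2}
    i, j = Tuple(I)
    @inbounds out[j, i] = in[i, j]
end

a = rand(32, 64)
b = zeros(64, 32)
event = transpose_kernel(CPU(), 16)(b, a, ndrange=size(a))
wait(event)
```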
KernelAbstractions.@private — Macro

@private T dims

Declare storage that is local to each item in the workgroup. This can be safely used across @synchronize statements. On a CPU, this will allocate additional implicit dimensions to ensure correct localization.

For storage that only persists between @synchronize statements, an MArray can be used instead.

See also @uniform.
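
A minimal sketch of @private, assuming a kernel in which each workitem carries a per-item accumulator across a barrier (the kernel name and logic are illustrative, not from this documentation):

```julia
using KernelAbstractions

# Each workitem keeps a private value that remains valid
# across the @synchronize barrier.
@kernel function copy_via_private(out, @Const(in))
    I = @index(Global, Linear)
    acc = @private Float64 (1,)
    @inbounds acc[1] = in[I]
    @synchronize
    @inbounds out[I] = acc[1]   # still holds this workitem's value
end
```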

KernelAbstractions.@synchronize — Macro
@synchronize()

After a @synchronize statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup.

@synchronize(cond)

After a @synchronize statement all reads and writes to global and local memory from each thread in the workgroup are visible to all other threads in the workgroup. cond is not allowed to have any visible side effects.

Platform differences

  • GPU: This synchronization will only occur if cond evaluates to true.
  • CPU: This synchronization will always occur.
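
A hedged sketch of the barrier in action: the kernel below reverses the elements within each workgroup through local (shared) memory, using @synchronize so every write to the local buffer is visible before any workitem reads it back. It assumes KernelAbstractions' @localmem and @uniform macros, which are documented elsewhere in the package; the kernel name is illustrative.

```julia
using KernelAbstractions

# Reverse each workgroup's slice of A in place via local memory.
@kernel function reverse_groups!(A)
    gi = @index(Global, Linear)
    li = @index(Local, Linear)
    N = @uniform prod(groupsize())
    tmp = @localmem eltype(A) (prod(groupsize()),)
    @inbounds tmp[li] = A[gi]
    @synchronize                     # all writes to tmp now visible
    @inbounds A[gi] = tmp[N - li + 1]
end
```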
KernelAbstractions.groupsize — Function

groupsize()

Query the workgroup size on the device. This function returns a tuple corresponding to the kernel configuration. To get the total number of workitems in a group, use prod(groupsize()).
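
For instance, in a kernel specialized as kernel(CPU(), 64), groupsize() would return the configured workgroup dimensions. The following sketch (kernel name is illustrative) records the total group size from inside the kernel:

```julia
using KernelAbstractions

# Write the total workgroup size into every element of `out`.
@kernel function record_groupsize(out)
    I = @index(Global, Linear)
    @inbounds out[I] = prod(groupsize())
end

out = zeros(Int, 128)
event = record_groupsize(CPU(), 64)(out, ndrange=length(out))
wait(event)
# every entry of `out` now holds the workgroup size (64 here)
```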

Host interface

Internal

KernelAbstractions.Kernel — Type

Kernel{Device, WorkgroupSize, NDRange, Func}

Kernel closure struct that is used to represent the device kernel on the host. WorkgroupSize is the number of workitems in a workgroup.
