AMDGPU API Reference

Kernel launching

AMDGPU.@roc (Macro)
@roc [kwargs...] func(args...)

High-level interface for executing code on a GPU. The @roc macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a GCN function upon first use, and, to a certain extent, arguments will be converted and managed automatically using rocconvert. Finally, a call to roccall is performed, scheduling a kernel launch on the specified (or default) HSA queue.

Several keyword arguments are supported that influence the behavior of @roc.
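For illustration, a full launch might look like the following sketch. The vadd kernel and its launch geometry are invented for the example; groupsize and gridsize are the launch keywords documented under AbstractKernel below, with gridsize counted in workitems (see the gridDim docstring), and waiting on the returned launch signal assumes the waitable-signal style of earlier AMDGPU.jl releases.

using AMDGPU

# Hypothetical element-wise addition kernel; the function must return nothing.
function vadd(a, b, c)
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = ROCArray(rand(Float32, 1024))
b = ROCArray(rand(Float32, 1024))
c = similar(a)

# 256 workitems per workgroup, 1024 workitems in total (4 workgroups).
wait(@roc groupsize=256 gridsize=1024 vadd(a, b, c))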

The underlying operations (argument conversion, kernel compilation, kernel call) can be performed explicitly when more control is needed, e.g. to reflect on the resource usage of a kernel to determine the launch configuration. A host-side kernel launch is done as follows:

args = ...
GC.@preserve args begin
    # convert the arguments to their GPU-compatible representations
    kernel_args = rocconvert.(args)
    # build the type tuple the kernel will be compiled for
    kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
    # compile (or fetch from the cache), then launch
    kernel = rocfunction(f, kernel_tt; compilation_kwargs)
    kernel(kernel_args...; launch_kwargs)
end

A device-side launch, also known as dynamic parallelism, is similar but more restricted:

args = ...
# GC.@preserve is not supported
# we're on the device already, so no need to rocconvert
kernel_tt = Tuple{Core.Typeof(args[1]), ...}    # this needs to be fully inferred!
kernel = dynamic_rocfunction(f, kernel_tt)       # no compiler kwargs supported
kernel(args...; launch_kwargs)
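As a concrete sketch (both kernels are hypothetical, and the pattern assumes device-side launches are supported on the target hardware):

function child(x)
    @inbounds x[1] = 42f0
    return nothing
end

function parent(x)
    # The type tuple must be fully inferred at this point.
    kernel = dynamic_rocfunction(child, Tuple{Core.Typeof(x)})
    kernel(x; groupsize=1)
    return nothing
end

# The outer kernel is launched from the host like any other:
x = ROCArray(zeros(Float32, 1))
wait(@roc parent(x))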
AMDGPU.AbstractKernel (Type)
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)

Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args. For a higher-level interface, use AMDGPU.@roc.

The following keyword arguments are supported:

  • groupsize or threads (defaults to 1)
  • gridsize or blocks (defaults to 1)
  • config: callback function to dynamically compute the launch configuration. It should accept a HostKernel and return a named tuple with any of the above as fields; see the sketch after this list.
  • queue (defaults to the default queue)
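For example, the config callback can choose the geometry at call time. This is a sketch: the fill_one kernel and the fixed geometry are invented, while the surrounding rocconvert/rocfunction pattern mirrors the host-side launch shown above.

using AMDGPU

function fill_one(x)
    @inbounds x[workitemIdx().x] = 1f0
    return nothing
end

x = ROCArray(zeros(Float32, 256))
GC.@preserve x begin
    args = rocconvert.((x,))
    kernel = rocfunction(fill_one, Tuple{Core.Typeof.(args)...})
    # The callback receives the compiled HostKernel and returns a named
    # tuple with any of groupsize/gridsize/queue as fields.
    kernel(args...; config=k -> (groupsize=256, gridsize=256))
end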
AMDGPU.HostKernel (Type)
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)

See AMDGPU.AbstractKernel above: the call interface and supported keyword arguments are identical.
AMDGPU.rocfunction (Function)
rocfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @roc.

The following keyword arguments are supported:

  • name: override the name that the kernel will have in the generated code

The output of this function is automatically cached, i.e. you can simply call rocfunction in a hot path without degrading performance. New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.
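A sketch of this hot-path pattern (the fill_val kernel is invented):

using AMDGPU

function fill_val(x, val)
    @inbounds x[workitemIdx().x] = val
    return nothing
end

x = ROCArray(zeros(Float32, 64))
for val in (1f0, 2f0, 3f0)
    GC.@preserve x begin
        args = rocconvert.((x, val))
        # Compiled on the first iteration only; later calls hit the cache.
        kernel = rocfunction(fill_val, Tuple{Core.Typeof.(args)...})
        kernel(args...; groupsize=64)
    end
end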

Device code API

Thread indexing

HSA nomenclature

AMDGPU.gridDim (Function)
gridDim()::ROCDim3

Returns the size of the grid in workitems. This behaviour is different from CUDA, where gridDim gives the size of the grid in blocks.

AMDGPU.gridDimWG (Function)
gridDimWG()::ROCDim3

Returns the size of the grid in workgroups. This is equivalent to CUDA's gridDim.
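The two are related by the workgroup size: gridDim().x should equal gridDimWG().x * workgroupDim().x. A global linear index in HSA nomenclature can then be computed as in this sketch (the kernel name is invented):

function linear_index(out)
    # Workitem position within its workgroup, offset by the preceding workgroups.
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(out)
        @inbounds out[i] = i
    end
    return nothing
end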

CUDA nomenclature

Use these functions for compatibility with CUDAnative.jl.
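A sketch assuming the CUDAnative-style aliases threadIdx, blockIdx, and blockDim are among them:

function linear_index_cuda(out)
    # Identical to the HSA-nomenclature version above.
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(out)
        @inbounds out[i] = i
    end
    return nothing
end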

Synchronization

AMDGPU.sync_workgroup (Function)
sync_workgroup()

Waits until all wavefronts in a workgroup have reached this call.
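A sketch of the usual two-phase pattern (the kernel is invented; within a single workgroup, the barrier makes phase one's writes visible to phase two):

using AMDGPU

function neighbour_shift(out, tmp)
    i = workitemIdx().x
    @inbounds tmp[i] = Float32(i)
    sync_workgroup()   # every wavefront has finished phase one
    j = i == length(tmp) ? 1 : i + 1
    @inbounds out[i] = tmp[j]
    return nothing
end

n = 64
out = ROCArray(zeros(Float32, n))
tmp = ROCArray(zeros(Float32, n))
wait(@roc groupsize=n neighbour_shift(out, tmp))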

Pointers

Global Variables

AMDGPU.get_global_pointer (docstring unavailable)