AMDGPU API Reference

Kernel launching

AMDGPU.@roc (Macro)
@roc [kwargs...] func(args...)

High-level interface for executing code on a GPU. The @roc macro should prefix a call, with func a callable function or object that should return nothing. It will be compiled to a GCN function upon first use, and, to a certain extent, arguments will be converted and managed automatically using rocconvert. Finally, a call to roccall is performed, scheduling a kernel launch on the specified (or default) HSA queue.

Several keyword arguments are supported that influence the behavior of @roc.
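For illustration, a full launch might look like the following sketch. The vadd kernel and its launch geometry are invented for the example; groupsize and gridsize are the launch keywords documented under AbstractKernel below, with gridsize counted in workitems (see the gridDim docstring), and waiting on the returned launch signal assumes the waitable-signal style of earlier AMDGPU.jl releases.

using AMDGPU

# Hypothetical element-wise addition kernel; the function must return nothing.
function vadd(a, b, c)
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = ROCArray(rand(Float32, 1024))
b = ROCArray(rand(Float32, 1024))
c = similar(a)

# 256 workitems per workgroup, 1024 workitems in total (4 workgroups).
wait(@roc groupsize=256 gridsize=1024 vadd(a, b, c))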

The underlying operations (argument conversion, kernel compilation, kernel call) can be performed explicitly when more control is needed, e.g. to reflect on the resource usage of a kernel to determine the launch configuration. A host-side kernel launch is done as follows:

args = ...
GC.@preserve args begin
    # convert the arguments to their GPU-compatible representations
    kernel_args = rocconvert.(args)
    # build the type tuple the kernel will be compiled for
    kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
    # compile (or fetch from the cache), then launch
    kernel = rocfunction(f, kernel_tt; compilation_kwargs)
    kernel(kernel_args...; launch_kwargs)
end

A device-side launch, also known as dynamic parallelism, is similar but more restricted:

args = ...
# GC.@preserve is not supported
# we're on the device already, so no need to rocconvert
kernel_tt = Tuple{Core.Typeof(args[1]), ...}    # this needs to be fully inferred!
kernel = dynamic_rocfunction(f, kernel_tt)       # no compiler kwargs supported
kernel(args...; launch_kwargs)
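As a concrete sketch (both kernels are hypothetical, and the pattern assumes device-side launches are supported on the target hardware):

function child(x)
    @inbounds x[1] = 42f0
    return nothing
end

function parent(x)
    # The type tuple must be fully inferred at this point.
    kernel = dynamic_rocfunction(child, Tuple{Core.Typeof(x)})
    kernel(x; groupsize=1)
    return nothing
end

# The outer kernel is launched from the host like any other:
x = ROCArray(zeros(Float32, 1))
wait(@roc parent(x))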
AMDGPU.AbstractKernel (Type)
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)

Low-level interface to call a compiled kernel, passing GPU-compatible arguments in args. For a higher-level interface, use AMDGPU.@roc.

The following keyword arguments are supported:

  • groupsize or threads (defaults to 1)
  • gridsize or blocks (defaults to 1)
  • config: callback function to dynamically compute the launch configuration. It should accept a HostKernel and return a named tuple with any of the above as fields; see the sketch after this list.
  • queue (defaults to the default queue)
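For example, the config callback can choose the geometry at call time. This is a sketch: the fill_one kernel and the fixed geometry are invented, while the surrounding rocconvert/rocfunction pattern mirrors the host-side launch shown above.

using AMDGPU

function fill_one(x)
    @inbounds x[workitemIdx().x] = 1f0
    return nothing
end

x = ROCArray(zeros(Float32, 256))
GC.@preserve x begin
    args = rocconvert.((x,))
    kernel = rocfunction(fill_one, Tuple{Core.Typeof.(args)...})
    # The callback receives the compiled HostKernel and returns a named
    # tuple with any of groupsize/gridsize/queue as fields.
    kernel(args...; config=k -> (groupsize=256, gridsize=256))
end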
AMDGPU.HostKernel (Type)
(::HostKernel)(args...; kwargs...)
(::DeviceKernel)(args...; kwargs...)

See AMDGPU.AbstractKernel above: the call interface and supported keyword arguments are identical.
AMDGPU.rocfunction (Function)
rocfunction(f, tt=Tuple{}; kwargs...)

Low-level interface to compile a function invocation for the currently-active GPU, returning a callable kernel object. For a higher-level interface, use @roc.

The following keyword arguments are supported:

  • name: override the name that the kernel will have in the generated code

The output of this function is automatically cached, i.e. you can simply call rocfunction in a hot path without degrading performance. New code will be generated automatically when the function changes, or when different types or keyword arguments are provided.
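A sketch of this hot-path pattern (the fill_val kernel is invented):

using AMDGPU

function fill_val(x, val)
    @inbounds x[workitemIdx().x] = val
    return nothing
end

x = ROCArray(zeros(Float32, 64))
for val in (1f0, 2f0, 3f0)
    GC.@preserve x begin
        args = rocconvert.((x, val))
        # Compiled on the first iteration only; later calls hit the cache.
        kernel = rocfunction(fill_val, Tuple{Core.Typeof.(args)...})
        kernel(args...; groupsize=64)
    end
end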

Device code API

Thread indexing

HSA nomenclature

AMDGPU.gridDim (Function)
gridDim()::ROCDim3

Returns the size of the grid in workitems. This behaviour is different from CUDA, where gridDim gives the size of the grid in blocks.

AMDGPU.gridDimWG (Function)
gridDimWG()::ROCDim3

Returns the size of the grid in workgroups. This is equivalent to CUDA's gridDim.
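The two are related by the workgroup size: gridDim().x should equal gridDimWG().x * workgroupDim().x. A global linear index in HSA nomenclature can then be computed as in this sketch (the kernel name is invented):

function linear_index(out)
    # Workitem position within its workgroup, offset by the preceding workgroups.
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    if i <= length(out)
        @inbounds out[i] = i
    end
    return nothing
end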

CUDA nomenclature

Use these functions for compatibility with CUDAnative.jl.
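A sketch assuming the CUDAnative-style aliases threadIdx, blockIdx, and blockDim are among them:

function linear_index_cuda(out)
    # Identical to the HSA-nomenclature version above.
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(out)
        @inbounds out[i] = i
    end
    return nothing
end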

Synchronization

AMDGPU.sync_workgroup (Function)
sync_workgroup()

Waits until all wavefronts in a workgroup have reached this call.
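A sketch of the usual two-phase pattern (the kernel is invented; within a single workgroup, the barrier makes phase one's writes visible to phase two):

using AMDGPU

function neighbour_shift(out, tmp)
    i = workitemIdx().x
    @inbounds tmp[i] = Float32(i)
    sync_workgroup()   # every wavefront has finished phase one
    j = i == length(tmp) ? 1 : i + 1
    @inbounds out[i] = tmp[j]
    return nothing
end

n = 64
out = ROCArray(zeros(Float32, n))
tmp = ROCArray(zeros(Float32, n))
wait(@roc groupsize=n neighbour_shift(out, tmp))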

Pointers

Global Variables

AMDGPU.get_global_pointer (docstring unavailable)