Re: [PATCH] drm/amd: Document device reset methods

From: Alex Deucher
Date: Fri Nov 10 2023 - 12:42:32 EST


On Fri, Nov 10, 2023 at 10:56 AM André Almeida <andrealmeid@xxxxxxxxxx> wrote:
>
> Document what each amdgpu driver reset method does.
>
> Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index a79d53bdbe13..500f86c79eb7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -504,6 +504,26 @@ struct amdgpu_allowed_register_entry {
> bool grbm_indexed;
> };
>
> +/**
> + * enum amd_reset_method - Methods for resetting AMD GPU devices
> + *
> + * @AMD_RESET_METHOD_NONE: The device will not be reset.
> + * @AMD_RESET_LEGACY: Method reserved for SI/CIK asics.

This also applies to VI asics.

> + * @AMD_RESET_MODE0: High level PCIe reset.

Resets the entire ASIC. Listed here for completeness, but not actually
available to the driver.

> + * @AMD_RESET_MODE1: Resets each IP block (SDMA, GFX, VCN, etc.) individually.
> + * Suitable only for some discrete GPUs.

Resets all IPs on the asic. Not available on all asics.

> + * @AMD_RESET_MODE2: Resets only the GFX block. Useful for APUs, given that
> + * the rest of the IP blocks and the SMU are shared with the CPU.

Resets a smaller set of IPs than MODE1. Which IPs are reset depends
on the asic. Notably, it doesn't reset IPs shared with the CPU on
APUs, or the memory controllers (so VRAM is not lost). Not available
on all asics.

> + * @AMD_RESET_BACO: BACO (Bus Alive, Chip Off) method powers off and on the card
> + * but without powering off the PCI bus. Suitable only for
> + * discrete GPUs.
> + * @AMD_RESET_PCI: Does a full bus reset, including powering on and off the
> + * card.

This calls into the core Linux PCI reset code and does a secondary bus
reset or FLR, depending on what the underlying hardware supports.

> + *
> + * Methods available to the AMD GPU driver for resetting the device. Not all
> + * methods are suitable for every device. The user can override the method
> + * using the module parameter `reset_method`.
> + */
> enum amd_reset_method {
> AMD_RESET_METHOD_NONE = -1,
> AMD_RESET_METHOD_LEGACY = 0,
> --
> 2.42.1
>
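
For reference, folding the comments above into the proposed kernel-doc
could look roughly like the sketch below. It is not a tested follow-up
patch. The quoted hunk only shows the first two enumerators, so the
remaining names and their order are assumed here to follow the same
AMD_RESET_METHOD_ prefix. The @ entries use that full prefix as well,
since kernel-doc matches them against the enumerator names.

```c
/**
 * enum amd_reset_method - Methods for resetting AMD GPU devices
 *
 * @AMD_RESET_METHOD_NONE: The device will not be reset.
 * @AMD_RESET_METHOD_LEGACY: Method reserved for SI, CIK and VI asics.
 * @AMD_RESET_METHOD_MODE0: Resets the entire ASIC. Listed for completeness,
 *                          but not actually available to the driver.
 * @AMD_RESET_METHOD_MODE1: Resets all IPs on the asic. Not available on all
 *                          asics.
 * @AMD_RESET_METHOD_MODE2: Resets a smaller set of IPs than MODE1; which IPs
 *                          are reset depends on the asic. Does not reset IPs
 *                          shared with the CPU on APUs, or the memory
 *                          controllers (so VRAM is not lost). Not available
 *                          on all asics.
 * @AMD_RESET_METHOD_BACO: BACO (Bus Alive, Chip Off) powers the card off and
 *                         on without powering off the PCI bus. Suitable only
 *                         for discrete GPUs.
 * @AMD_RESET_METHOD_PCI: Calls into the core Linux PCI reset code, which does
 *                        a secondary bus reset or FLR, depending on what the
 *                        underlying hardware supports.
 *
 * Methods available to the AMD GPU driver for resetting the device. Not all
 * methods are suitable for every device. The user can override the method
 * using the module parameter `reset_method`.
 */
enum amd_reset_method {
	AMD_RESET_METHOD_NONE = -1,
	AMD_RESET_METHOD_LEGACY = 0,
	/* Assumed: the names and order below are not shown in the quoted hunk. */
	AMD_RESET_METHOD_MODE0,
	AMD_RESET_METHOD_MODE1,
	AMD_RESET_METHOD_MODE2,
	AMD_RESET_METHOD_BACO,
	AMD_RESET_METHOD_PCI,
};
```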