Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

From: Borislav Petkov
Date: Fri May 14 2021 - 10:38:21 EST


On Fri, May 14, 2021 at 01:06:33PM +0000, Joshi, Mukul wrote:
> We have RAS functionality in other ASICs that is not dependent on
> CONFIG_X86_MCE_AMD. So, I don't think we would want to do that just
> for one ASIC.

Lemme try again: you said that those errors do get reported through a
deferred interrupt. Which is likely amd_deferred_error_interrupt().

If it is that interrupt and you don't have CONFIG_X86_MCE_AMD enabled,
then you won't get any errors reported and your RAS functionality will
simply sit there inactive.

So if that above is true - something to which I'm still not getting
an answer but maybe one fine day... - so if that above is true, your
RAS functionality *needs* CONFIG_X86_MCE_AMD to be enabled in order to
*actually* function.

So you *must* make your RAS functionality depend on CONFIG_X86_MCE_AMD
- otherwise no deferred interrupts and no errors reported. It is that
simple.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette