Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

From: Samuel Sieb
Date: Fri May 26 2017 - 02:53:31 EST


On 05/24/2017 05:56 AM, Deucher, Alexander wrote:
-----Original Message-----
From: Joerg Roedel [mailto:jroedel@xxxxxxx]
Sent: Wednesday, May 24, 2017 4:45 AM
To: Deucher, Alexander
Cc: 'David Woodhouse'; 'Joerg Roedel'; Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Daniel Drake; Samuel Sieb
Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

Hi Alexander,

On Tue, May 23, 2017 at 07:54:12PM +0000, Deucher, Alexander wrote:
I finally got an answer from the hw team and we validated ATS on
Stoney as well, so in theory this patch shouldn't actually be needed.
I think we may actually be papering over some other issue. The
following patch seems to also fix this issue (and other issues):
https://www.spinics.net/lists/stable/msg172631.html

Yeah, but it still looks to me like the hardware got into some
weird state because of the storm of ATS invalidations sent to it.

The Completion-Wait loop timeouts seen in the original bug report
indicate that the IOMMU is waiting for a response that never comes. And
this is probably the ATS flush completion response from the GPU, as
disabling ATS on the GPU makes the issue disappear.

Yeah, it's weird. My ack on the patch still stands. Just adding some additional data.


I just tested this patch without the previous ATS-disabling patch (I verified that ATS was enabled). Running a stress-test kernel build alongside a 3D graphical application caused no disk corruption, and that ran for several hours. If it's not the solution, it sure hides the problem really well.