RE: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS

From: Deucher, Alexander
Date: Wed Mar 29 2017 - 12:21:17 EST


> -----Original Message-----
> From: 'Joerg Roedel' [mailto:jroedel@xxxxxxx]
> Sent: Tuesday, March 28, 2017 6:26 PM
> To: Deucher, Alexander
> Cc: 'Joerg Roedel'; Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Daniel Drake; Nath, Arindam
> Subject: Re: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS
>
> On Tue, Mar 28, 2017 at 09:13:23PM +0000, Deucher, Alexander wrote:
> > If I understand Arindam's patch correctly, it only flushes TLB entries
> > for domains in the flush queue whereas the previous behavior was to
> > flush all domains. If there was no TLB flush in the queue for that
> > domain, could flushing it cause a problem?
>
> No, that can't cause a problem. An io/tlb flush for the device is just a
> message that the device should invalidate its own tlb. The device can't
> know and doesn't need to know whether the page-tables it used to fill
> the tlb really changed.
>
> As it looks, the problem we are seeing here is that we are sending a
> large number of these requests to the GPU device, and waiting for
> completion every time. This shouldn't be a problem for ATS devices, but
> the GPU here seems to fail at some point and stops answering
> invalidation requests, causing the completion-wait loop timeouts.
>
> Arindam's patch makes the high flush-frequency less likely, but it can
> still happen, depending on how the GPU is used. So it's best to
> keep ATS disabled on the device as it doesn't work correctly and we risk
> running into the same problem again when we leave it enabled and just make
> the trigger less likely.

Thanks for clarifying. The patch is:
Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>