RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

From: Deucher, Alexander
Date: Tue May 23 2017 - 15:54:22 EST


> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@xxxxxxxxxxxxx]
> Sent: Thursday, May 04, 2017 6:22 AM
> To: Deucher, Alexander; 'Joerg Roedel'; Bjorn Helgaas
> Cc: linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Daniel Drake;
> Samuel Sieb; Joerg Roedel
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>
> On Fri, 2017-04-07 at 16:46 +0000, Deucher, Alexander wrote:
> > >
> > > -----Original Message-----
> > > From: Joerg Roedel [mailto:joro@xxxxxxxxxx]
> > > Sent: Friday, April 07, 2017 10:32 AM
> > > To: Bjorn Helgaas
> > > Cc: linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Daniel Drake;
> > > Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> > > Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> > >
> > > From: Joerg Roedel <jroedel@xxxxxxx>
> > >
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> > >
> > > Note that the commit in the Fixes-tag is not buggy, it
> > > just uncovers the problem in the hardware by increasing
> > > the ATS-flush rate.
> > >
> > > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > > Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
> > Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>
>
> Alex, are you able to confirm that it is *only* the device with PCI ID
> 0x98e4 which has this problem, or (more likely) come up with an
> exhaustive list? Thanks.
>
> We'll want the same blacklist in Xen too, won't we?

I finally got an answer from the hw team: we validated ATS on Stoney as well, so in theory this patch shouldn't actually be needed. I think we may be papering over some other issue. The following patch also seems to fix this issue (and other issues):
https://www.spinics.net/lists/stable/msg172631.html

Alex

>
> > >
> > > ---
> > > >  drivers/pci/quirks.c | 19 +++++++++++++++++++
> > > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..7cbe316 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev
> *pdev)
> > > ÂDECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> > > quirk_no_aersid);
> > > ÂDECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> > > quirk_no_aersid);
> > > ÂDECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> > > quirk_no_aersid);
> > > +
> > > +#ifdef CONFIG_PCI_ATS
> > > +/*
> > > > + * Some devices have a broken ATS implementation causing IOMMU stalls.
> > > + * Don't use ATS for those devices.
> > > + */
> > > +static void quirk_disable_ats(struct pci_dev *pdev)
> > > +{
> > > + /*
> > > > +  * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> > > > +  * early.
> > > > +  */
> > > + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> > > + pdev->ats_cap = 0;
> > > +}
> > > +
> > > +/* AMD Stoney platform GPU */
> > > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> > > +#endif /* CONFIG_PCI_ATS */
> > > --
> > > 1.9.1