Re: MT76x2U crashes XHCI driver on AMD Ryzen system

From: Stanislaw Gruszka
Date: Thu Feb 28 2019 - 07:19:56 EST


On Thu, Feb 28, 2019 at 11:42:24AM +0100, Stanislaw Gruszka wrote:
> On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > > > If sg->offset > PAGE_SIZE is fine then most likely we have a problem with
> > > > > > alignment.
> > > >
> > > > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > > > handles the sg->page + sg->offset calculation fine.
> > > >
> > > > > Note that the issue is with dma_map_sg(); switching to dma_map_single()
> > > > > by using urb->transfer_buffer instead of urb->sg makes things work
> > > > > on AMD IOMMU.
> > > >
> > > > On the other hand this points to a bug in the driver, I'll look further
> > > > if I can spot something there.
> > >
> > > I think so too. And I have made some changes that avoid the strange allocation
> > > scheme and use USB synchronous messages instead of allocating buffers
> > > with unaligned sizes. However, things work OK on Intel IOMMU, and
> > > there is no documentation of what the dma_map_sg() requirements are versus
> > > dma_map_single(), which works. I think there are some unwritten
> > > requirements, so things can work on some platforms and fail on others
> > > (different IOMMUs, no IOMMU on some arches).
> >
> > For the record: we have another bug report with this issue:
> > https://bugzilla.kernel.org/show_bug.cgi?id=202673
> >
> > I provided a patch there that changes the alignment for page_frag_alloc() and
> > it did not fix the problem. So this is not an alignment issue.
> > Now I think it could be a page->refcount issue ...
>
> I looked at the map_sg() code in amd_iommu.c and one line looks suspicious
> to me: it seems we can use an incorrectly initialized s->dma_address (it should be 0,
> but I think it can be non-zero if the SG was reused). The code also does not seem
> to do the correct thing if there is more than one SG with multiple pages
> in individual segments. Something like the patch below seems more
> appropriate to me (not tested nor compiled).

Never mind, the patch is wrong: s->dma_address is initialized in sg_num_pages().
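
For readers following the thread, here is a minimal userspace sketch of the two points discussed above: the sg_phys()-style address calculation (page address plus offset, which stays valid when the offset exceeds one page) and a counting pass that overwrites each segment's dma_address before it is used. This is not the amd_iommu.c code. The names fake_sg, fake_sg_phys(), fake_num_pages() and fake_sg_assign_offsets() are hypothetical, a 4 KiB page size is assumed, and the segment-boundary handling that a real driver would need is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PAGE_SHIFT 12
#define FAKE_PAGE_SIZE  (1UL << FAKE_PAGE_SHIFT)

/* Hypothetical stand-in for struct scatterlist (illustration only). */
struct fake_sg {
	uint64_t page_phys;    /* physical address of the first page */
	unsigned int offset;   /* byte offset from that page, may exceed a page */
	unsigned int length;   /* segment length in bytes */
	uint64_t dma_address;  /* may hold stale data if the list is reused */
};

/* Models the sg_phys() idea: page address plus offset. */
static uint64_t fake_sg_phys(const struct fake_sg *s)
{
	return s->page_phys + s->offset;
}

/* Number of pages touched by [addr, addr + len). */
static unsigned long fake_num_pages(uint64_t addr, unsigned long len)
{
	unsigned long span = (unsigned long)(addr & (FAKE_PAGE_SIZE - 1)) + len;

	return (span + FAKE_PAGE_SIZE - 1) >> FAKE_PAGE_SHIFT;
}

/*
 * Counting pass: stores each segment's page-aligned offset within the
 * overall mapping in dma_address, so any stale value is overwritten.
 * Assumption: no segment-boundary handling, unlike a real driver.
 * Returns the total number of pages needed.
 */
static unsigned long fake_sg_assign_offsets(struct fake_sg *sgl, size_t nelems)
{
	unsigned long npages = 0;
	size_t i;

	assert(sgl != NULL || nelems == 0);

	for (i = 0; i < nelems; i++) {
		sgl[i].dma_address = (uint64_t)npages << FAKE_PAGE_SHIFT;
		npages += fake_num_pages(fake_sg_phys(&sgl[i]), sgl[i].length);
	}
	return npages;
}
```

In this model, a reused list with a leftover non-zero dma_address is harmless as long as the counting pass runs before the mapping loop reads the field.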

Stanislaw