Re: i915 regression in kernel 4.10

From: Konrad Rzeszutek Wilk
Date: Tue Dec 20 2016 - 11:51:43 EST


On Tue, Dec 20, 2016 at 09:42:46AM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 19, 2016 at 03:16:44PM +0100, Juergen Gross wrote:
> > On 19/12/16 13:29, Chris Wilson wrote:
> > > On Mon, Dec 19, 2016 at 12:39:16PM +0100, Juergen Gross wrote:
> > >> With recent 4.10 kernel the graphics isn't coming up under Xen. First
> > >> failure message is:
> > >>
> > >> [ 46.656649] i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
> > >
> > > Do we get a silent failure? i915_gem_gtt_prepare_pages() is where we
> > > call dma_map_sg() and pass the sg to swiotlb (in this case) for
> > > remapping, and we do check for an error value of 0. After that error,
> > > SWIOTLB_MAP_ERROR is propagated back and converted to 0 for
> > > dma_map_sg(). That looks valid, and we should report ENOMEM back to the
> > > caller.
> > >
> > >> Later I see splats like:
> > >>
> > >> [ 49.393583] general protection fault: 0000 [#1] SMP
> > >
> > > What was the faulting address? RAX is particularly non-pointer-like so I
> > > wonder if we walked onto an uninitialised portion of the sgtable. We may
> > > have tripped over a bug in our sg_page iterator.
> >
> > During the bisect process there have been either GP or NULL pointer
> > dereferences or other page faults. Typical addresses where:
> >
> > xen_swiotlb_unmap_sg_attrs+0x1f/0x50: access to 0000000000000018
> > xen_swiotlb_unmap_sg_attrs+0x1f/0x50: access to 0000000003020118
> >
> > >
> > > The attached patch should prevent an early ENOMEM following the swiotlb
> > > allocation failure. But I suspect that we will still be tripping up the
> > > failure in the sg walker when binding to the GPU.
> > > -Chris
> > >
> >
> > The patch is working not too bad. :-)
> >
> > Still several "swiotlb buffer is full" messages (some with sz:, most
> > without), but no faults any more (neither GP nor NULL pointer
> > dereference). Graphical login is working now.
>
>
> I think I know why. The optimization that was added assumes that
> bus addresses is the same as physical address. Hence it packs all
> of the virtual addresses in the sg, and hands it off to SWIOTLB
> which walks each one and realizes that it has to use the bounce
> buffer.
>
> I am wondering if would make sense to pull 'swiotlb_max_size' inside
> of SWIOTLB and make it an library-ish - so Xen-SWIOTLB can register
> as well and report say that it can only provide one page
> (unless it is running under baremtal).
>
> Or make the usage of 'max_segement' and 'page_to_pfn(page) != last_pfn + 1'
> in i915_gem_object_Get_pages_gtt use something similar to xen_biovec_phys_mergeable?

Chris,

I was thinking of something like this (which Juergen has already tested).