Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline

From: Dave Airlie
Date: Tue Aug 18 2020 - 21:13:06 EST


On Wed, 19 Aug 2020 at 10:38, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Ping on this?
>
> The code disassembles to
>
>   24:   8b 85 d0 fd ff ff       mov    -0x230(%ebp),%eax
>   2a:*  c7 03 01 00 40 10       movl   $0x10400001,(%ebx)    <-- trapping instruction
>   30:   89 43 04                mov    %eax,0x4(%ebx)
>   33:   8b 85 b4 fd ff ff       mov    -0x24c(%ebp),%eax
>   39:   89 43 08                mov    %eax,0x8(%ebx)
>   3c:   e9                      jmp    ...
>
> which looks like it is one of the cases in __reloc_entry_gpu(). I *think*
> it's this one:
>
>         } else if (gen >= 3 &&
>                    !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
>                 *batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
>                 *batch++ = addr;
>                 *batch++ = target_addr;
>
> where that "batch" pointer is 0xf8601000, so it looks like it just
> overflowed into the next page that isn't there.
>
> The cleaned-up call trace is
>
>     drm_ioctl+0x1f4/0x38b ->
>       drm_ioctl_kernel+0x87/0xd0 ->
>         i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
>           i915_gem_do_execbuffer+0xaab/0x2780 ->
>             eb_relocate_vma
>
> but there's a lot of inlining going on, so..
>
> The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
> relocations only") but that's going purely by "that seems to be the
> main relocation change this merge window".
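
For anyone following along, the decoding in the quoted analysis can be
sanity-checked with a small standalone sketch. It is not driver code. The
command encodings below are written from memory of the i915 GPU command
header and should be treated as assumptions, as should the 4 KiB page size;
the helper names (store_dword_cmd, page_offset, check_decoding) are
hypothetical and exist only for this illustration. It shows that the
immediate in the trapping movl matches MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL,
and that a "batch" pointer of 0xf8601000 sits exactly on a page boundary,
which is consistent with the write having run off the end of the previous
page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings, recalled from the i915 command definitions. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))
#define MI_STORE_DWORD_IMM      MI_INSTR(0x20, 1)
#define MI_MEM_VIRTUAL          (1u << 22)

/* Assumption: 4 KiB pages, as on 32-bit x86. */
#define SKETCH_PAGE_SIZE        4096u

/* The first dword written by the quoted branch of the relocation code. */
uint32_t store_dword_cmd(void)
{
	return MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
}

/* Byte offset of an address within its page. */
uint32_t page_offset(uint32_t addr)
{
	return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Ties the two observations from the oops together. */
void check_decoding(void)
{
	/* Immediate bytes 01 00 40 10 in the disassembly, little-endian. */
	assert(store_dword_cmd() == 0x10400001u);

	/* The faulting pointer is the very first byte of a new page. */
	assert(page_offset(0xf8601000u) == 0);
}
```

Calling check_decoding() from a test main() should pass silently if the
assumed encodings are right; it says nothing about why the batch ran out
of space, only that the faulting store is the first dword of that
three-dword sequence landing on a fresh page.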

I think there's been some discussion about reverting that change for
other reasons, but it's quite likely the culprit.

Maybe we can push for a revert sooner (cc'ing more of the i915 team).

Dave.