Re: 5.9-rc1: graphics regression moved from -next to mainline

From: Linus Torvalds
Date: Tue Aug 18 2020 - 20:38:34 EST


Ping on this?

The code disassembles to

24: 8b 85 d0 fd ff ff mov -0x230(%ebp),%eax
2a:* c7 03 01 00 40 10 movl $0x10400001,(%ebx) <-- trapping instruction
30: 89 43 04 mov %eax,0x4(%ebx)
33: 8b 85 b4 fd ff ff mov -0x24c(%ebp),%eax
39: 89 43 08 mov %eax,0x8(%ebx)
3c: e9 jmp ...

which looks like is one of the cases in __reloc_entry_gpu(). I *think*
it's this one:

} else if (gen >= 3 &&
!(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
*batch++ = addr;
*batch++ = target_addr;

where that "batch" pointer is 0xf8601000, so it looks like it just
overflowed into the next page that isn't there.

The cleaned-up call trace is

drm_ioctl+0x1f4/0x38b ->
drm_ioctl_kernel+0x87/0xd0 ->
i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
i915_gem_do_execbuffer+0xaab/0x2780 ->
eb_relocate_vma

but there's a lot of inling going on, so..

The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
relocations only") but that's going purely by "that seems to be the
main relocation change this mmrge window".

Linus

On Mon, Aug 17, 2020 at 9:11 AM Pavel Machek <pavel@xxxxxx> wrote:
>
> Hi!
>
> After about half an hour of uptime, screen starts blinking on thinkpad
> x60 and machine becomes unusable.
>
> I already reported this in -next, and now it is in mainline. It is
> 32-bit x86 system.
>
>
> Pavel
>
>
> Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link local (bound):
> [undef]
> Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link remote:
> [AF_INET]87.138.219.28:1194
> Aug 17 17:36:23 amd kernel: BUG: unable to handle page fault for
> address: f8601000
> Aug 17 17:36:23 amd kernel: #PF: supervisor write access in kernel
> mode
> Aug 17 17:36:23 amd kernel: #PF: error_code(0x0002) - not-present page
> Aug 17 17:36:23 amd kernel: *pdpt = 00000000318f2001 *pde =
> 0000000000000000
> Aug 17 17:36:23 amd kernel: Oops: 0002 [#1] PREEMPT SMP PTI
> Aug 17 17:36:23 amd kernel: CPU: 1 PID: 3004 Comm: Xorg Not tainted
> 5.9.0-rc1+ #86
> Aug 17 17:36:23 amd kernel: Hardware name: LENOVO 17097HU/17097HU,
> BIOS 7BETD8WW (2.19 ) 03/31
> /2011
> Aug 17 17:36:23 amd kernel: EIP: eb_relocate_vma+0xcf6/0xf20
> Aug 17 17:36:23 amd kernel: Code: e9 ff f7 ff ff c7 85 c0 fd ff ff ed
> ff ff ff c7 85 c4 fd ff
> ff ff ff ff ff 8b 85 c0 fd ff ff e9 a5 f8 ff ff 8b 85 d0 fd ff ff <c7>
> 03 01 00 40 10 89 43 04
> 8b 85 b4 fd ff ff 89 43 08 e9 9f f7 ff
> Aug 17 17:36:23 amd kernel: EAX: 003c306c EBX: f8601000 ECX: 00847000
> EDX: 00000000
> Aug 17 17:36:23 amd kernel: ESI: 00847000 EDI: 00000000 EBP: f1947c68
> ESP: f19479fc
> Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS:
> 0068 EFLAGS: 00210246
> Aug 17 17:36:23 amd kernel: CR0: 80050033 CR2: f8601000 CR3: 31a1e000
> CR4: 000006b0
> Aug 17 17:36:23 amd kernel: Call Trace:
> Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
> Aug 17 17:36:23 amd kernel: ? __mutex_unlock_slowpath+0x2b/0x280
> Aug 17 17:36:23 amd kernel: ? __active_retire+0x7e/0xd0
> Aug 17 17:36:23 amd kernel: ? mutex_unlock+0xb/0x10
> Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
> Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
> Aug 17 17:36:23 amd kernel: ? eb_lookup_vmas+0x1f5/0x9e0
> Aug 17 17:36:23 amd kernel: i915_gem_do_execbuffer+0xaab/0x2780
> Aug 17 17:36:23 amd kernel: ? _raw_spin_unlock_irqrestore+0x27/0x40
> Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
> Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
> Aug 17 17:36:23 amd kernel: ? kvmalloc_node+0x69/0x70
> Aug 17 17:36:23 amd kernel: i915_gem_execbuffer2_ioctl+0xdd/0x360
> Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
> Aug 17 17:36:23 amd kernel: drm_ioctl_kernel+0x87/0xd0
> Aug 17 17:36:23 amd kernel: drm_ioctl+0x1f4/0x38b
> Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
> Aug 17 17:36:23 amd kernel: ? posix_get_monotonic_timespec+0x1c/0x90
> Aug 17 17:36:23 amd kernel: ? ktime_get_ts64+0x7a/0x1e0
> Aug 17 17:36:23 amd kernel: ? drm_ioctl_kernel+0xd0/0xd0
> Aug 17 17:36:23 amd kernel: __ia32_sys_ioctl+0x1ad/0x799
> Aug 17 17:36:23 amd kernel: ? debug_smp_processor_id+0x12/0x20
> Aug 17 17:36:23 amd kernel: ? exit_to_user_mode_prepare+0x4f/0x100
> Aug 17 17:36:23 amd kernel: do_int80_syscall_32+0x2c/0x40
> Aug 17 17:36:23 amd kernel: entry_INT80_32+0x111/0x111
> Aug 17 17:36:23 amd kernel: EIP: 0xb7fbc092
> Aug 17 17:36:23 amd kernel: Code: 00 00 00 e9 90 ff ff ff ff a3 24 00
> 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00
> 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b
> 1c 24 c3 8d b4 26 00
> Aug 17 17:36:23 amd kernel: EAX: ffffffda EBX: 0000000a ECX: c0406469
> EDX: bff0ae3c
> Aug 17 17:36:23 amd kernel: ESI: b73aa000 EDI: c0406469 EBP: 0000000a
> ESP: bff0adb4
> Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS:
> 007b EFLAGS: 00200296
> Aug 17 17:36:23 amd kernel: ? asm_exc_nmi+0xcc/0x2bc
> Aug 17 17:36:23 amd kernel: Modules linked in:
> Aug 17 17:36:23 amd kernel: CR2: 00000000f8601000
> Aug 17 17:36:23 amd kernel: ---[ end trace 2ca9775068bbac06 ]---
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html