Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

From: Bagas Sanjaya
Date: Tue Sep 19 2023 - 10:03:13 EST


On Tue, Sep 19, 2023 at 03:23:28PM +0200, Oleksandr Natalenko wrote:
> /cc Bagas as well (see below).
>
> On Tuesday, 19 September 2023 10:26:42 CEST Oleksandr Natalenko wrote:
> > /cc Matthew Wilcox and Andrew Morton because of folios (please see below).
> >
> > On Saturday, 2 September 2023 18:14:12 CEST Oleksandr Natalenko wrote:
> > > Hello.
> > >
> > > Since the v6.5 kernel, the following HW:
> > >
> > > * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
> > > * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> > >
> > > is affected by the following crash as soon as KDE is started on either X11 or Wayland:
> > >
> > > i915 0000:00:02.0: enabling device (0006 -> 0007)
> > > i915 0000:00:02.0: vgaarb: deactivate vga console
> > > i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=mem
> > > i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_dmc_ver1_27.bin (v1.27)
> > > [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
> > > fbcon: i915drmfb (fb0) is primary device
> > > i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
> > > …
> > > memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=674 'kwin_wayland'
> > > BUG: unable to handle page fault for address: ffffb422c2800000
> > > #PF: supervisor write access in kernel mode
> > > #PF: error_code(0x0002) - not-present page
> > > PGD 100000067 P4D 100000067 PUD 1001df067 PMD 10d1cf067 PTE 0
> > > Oops: 0002 [#1] PREEMPT SMP PTI
> > > CPU: 1 PID: 674 Comm: kwin_wayland Not tainted 6.5.0-pf1 #1 a6c58ff41a7b8bb16a19f5af9e0e9bce20f9f38d
> > > Hardware name: LENOVO 20FAS2BM0F/20FAS2BM0F, BIOS N1CET90W (1.58 ) 11/15/2022
> > > RIP: 0010:gen8_ggtt_insert_entries+0xc2/0x140 [i915]
> > > …
> > > Call Trace:
> > > <TASK>
> > > intel_ggtt_bind_vma+0x3e/0x60 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > i915_vma_bind+0x216/0x4b0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > i915_vma_pin_ww+0x405/0xa80 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > __i915_ggtt_pin+0x5a/0x130 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > i915_ggtt_pin+0x78/0x1f0 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > __intel_context_do_pin_ww+0x312/0x700 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > i915_gem_do_execbuffer+0xfc6/0x2720 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > i915_gem_execbuffer2_ioctl+0x111/0x260 [i915 a83fdc6539431252dba13053979a8b680af86836]
> > > drm_ioctl_kernel+0xca/0x170
> > > drm_ioctl+0x30f/0x580
> > > __x64_sys_ioctl+0x94/0xd0
> > > do_syscall_64+0x5d/0x90
> > > entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > > …
> > > note: kwin_wayland[674] exited with irqs disabled
> > >
> > > RIP seems to translate into this:
> > >
> > > $ scripts/faddr2line drivers/gpu/drm/i915/gt/intel_ggtt.o gen8_ggtt_insert_entries+0xc2
> > > gen8_ggtt_insert_entries+0xc2/0x150:
> > > writeq at /home/pf/work/devel/own/pf-kernel/linux/./arch/x86/include/asm/io.h:99
> > > (inlined by) gen8_set_pte at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:257
> > > (inlined by) gen8_ggtt_insert_entries at /home/pf/work/devel/own/pf-kernel/linux/drivers/gpu/drm/i915/gt/intel_ggtt.c:300
> > >
> > > Recent PTE-related changes are probably relevant:
> > >
> > > $ git log --oneline --no-merges v6.4..v6.5 -- drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > 3532e75dfadcf drm/i915/uc: perma-pin firmwares
> > > 4722e2ebe6f21 drm/i915/gt: Fix second parameter type of pre-gen8 pte_encode callbacks
> > > 9275277d53248 drm/i915: use pat_index instead of cache_level
> > > 5e352e32aec23 drm/i915: preparation for using PAT index
> > > 341ad0e8e2542 drm/i915/mtl: Add PTE encode function
> > >
> > > Also note that a Lenovo T14s laptop with TigerLake-LP GT2 [Iris Xe Graphics] (rev 01) is not affected by this issue.
> > >
> > > The full dmesg with DRM debug enabled is available in the bug report I filed earlier [1]. I'm sending this email to make the issue more visible.
> > >
> > > Please help.
> > >
> > > Thanks.
> > >
> > > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
> >
> > Matthew,
> >
> > Andrzej asked me to try reverting commits 0b62af28f249, e0b72c14d8dc and 1e0877d58b1e, and doing so fixed the i915 crash for me. Commits e0b72c14d8dc and 1e0877d58b1e look like mere prerequisites, so I assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") is the culprit here.
> >
> > Could you please check this?
> >
> > Our conversation with Andrzej is available at drm-intel GitLab [1].
> >
> > Thanks.
> >
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
>
> Bagas,
>
> would you mind adding this to the regression tracker please?
>

Will add shortly, thanks!

--
An old man doll... just what I always wanted! - Clara