Re: i915: crash with 5.19-rc2
From: Hugh Dickins
Date: Wed Aug 10 2022 - 11:57:15 EST
On Wed, 10 Aug 2022, Zdenek Kabelac wrote:
> On 22. 06. 22 at 13:18, Zdenek Kabelac wrote:
> > Hello
> >
> > On somewhat oldish hw (T61, 4G, C2D) I've now witnessed a new crash with
> > Xorg:
> >
> > (happened while reopening iconified Firefox window - running 'standard'
> > rawhide -nodebug kernel 5.19.0-0.rc2.21.fc37.x86_64)
> >
>
> Hello
>
>
> Ok, I think I now know what is behind this BUG/crash of intel graphics -
> interestingly it took me a few weeks to realize this.
>
> With some Rawhide update I installed the 'zram-generator' package to use
> zram swap, to ease the memory pressure from Firefox & Thunderbird a bit on
> this 4G RAM laptop. All worked fine. However, a side effect of using zram
> swap turned out to be this occasional kernel BUG.
>
> Since I stopped using zram swap, it has now run for 2 weeks without a
> single deadlock - with a single or dual monitor setup and many suspends
> & resumes in between.
>
> So I'm almost 100% sure that zram usage is triggering this issue. While I
> know this laptop is old and low on memory and so on - not sure if it's
> worth solving - maybe a good enough solution is to issue a warning that the
> user should not combine this old piece of hardware with zram - but I'm open
> to doing some testing of a fix - although I still don't have a simple
> triggering path that makes this issue happen within a short period of time.
>
> Maybe the driver fails to mark some pages as pinned in memory, so that
> they can't be swapped out to zram?
>
>
> > page:00000000577758b3 refcount:0 mapcount:0 mapping:0000000000000000
> > index:0x1 pfn:0x1192cc
> > flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
> > raw: 0017ffffc0000000 ffffe683c47171c8 ffff8fa3f79377a8 0000000000000000
> > raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> > page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
> > ------------[ cut here ]------------
> > kernel BUG at mm/shmem.c:708!
> > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 1 PID: 42896 Comm: Xorg Not tainted 5.19.0-0.rc2.21.fc37.x86_64 #1
> > Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
> > RIP: 0010:shmem_add_to_page_cache+0x48e/0x500
> > Code: 01 0f 84 0a fc ff ff 48 8d 4a ff 31 d2 48 39 cb 0f 85 ff fb ff ff e9
> > f6 fb ff ff 48 c7 c6 70 01 64 bb 48 89 df e8 f2 99 01 00 <0f> 0b 48 c7 c6 a0
> > 1b 64 bb 48 89 df e8 e1 99 01 00 0f 0b 48 8b 13
> > RSP: 0018:ffff9ce7c047f6b0 EFLAGS: 00010286
> > RAX: 000000000000003f RBX: ffffe683c464b300 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: ffffffffbb67b8e8 RDI: 00000000ffffffff
> > RBP: 0000000000023f97 R08: ffffffffbca122a0 R09: 64656b636f6c5f74
> > R10: 747365745f6f696c R11: 6f6621284f494c4f R12: 00000000001120d4
> > R13: ffff8fa2c6ae7890 R14: ffffe683c464b300 R15: 0000000000000001
> > FS: 00007fc1cea31380(0000) GS:ffff8fa3f7900000(0000)
> > knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f6972e228c8 CR3: 0000000104ba8000 CR4: 00000000000006e0
> > Call Trace:
> > <TASK>
> > shmem_swapin_folio+0x274/0x980
> > shmem_getpage_gfp+0x234/0x990
> > shmem_read_mapping_page_gfp+0x36/0xf0
> > shmem_sg_alloc_table+0x11b/0x250 [i915]

Sorry, I never noticed your original report in June.

This is not a bug in zram or i915, but what Matthew fixes in
https://lore.kernel.org/lkml/20220730042518.1264767-1-willy@xxxxxxxxxxxxx/

I am a little surprised to see it hitting i915, since I had thought it
could only affect gma500: but looks like 965gm has similar limitations,
and so I expect that's what's on your laptop there.

Hugh