Re: i915: crash with 5.19-rc2

From: Hugh Dickins
Date: Wed Aug 10 2022 - 11:57:15 EST


On Wed, 10 Aug 2022, Zdenek Kabelac wrote:
> On 22. 06. 22 at 13:18, Zdenek Kabelac wrote:
> > Hello
> >
> > On somewhat oldish hw (T61, 4G, C2D), I've now witnessed a new crash with
> > Xorg:
> >
> > (happened while reopening iconified Firefox window  - running 'standard'
> > rawhide -nodebug kernel 5.19.0-0.rc2.21.fc37.x86_64)
> >
>
> Hello
>
>
> OK, I think I now know what is behind this BUG/crash with Intel graphics;
> interestingly, it took me a few weeks to realize this.
>
> With some Rawhide update I installed the 'zram-generator' package to use
> zram swap and ease the memory pressure from Firefox & Thunderbird a bit on
> this 4G RAM laptop. Everything worked fine. However, a side effect of zram
> swapping turned out to be this occasional kernel BUG.
>
> Since I stopped using zram swap, it has run for 2 weeks without a single
> deadlock, with single or dual monitor setups and many suspends & resumes
> in between.
>
> So I'm almost 100% sure that zram usage is triggering this issue. I know
> this laptop is old and low on memory, so I'm not sure it's worth solving;
> maybe a good enough solution is to warn users that they should not combine
> this old hardware with zram. Still, I'm open to testing a fix, although I
> don't yet have a simple way to trigger the issue within a short period of
> time.
>
> Maybe the driver is failing to mark some pages as pinned in memory, so that
> they can't be swapped out to zram?
>
>
> >  page:00000000577758b3 refcount:0 mapcount:0 mapping:0000000000000000
> > index:0x1 pfn:0x1192cc
> >  flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
> >  raw: 0017ffffc0000000 ffffe683c47171c8 ffff8fa3f79377a8 0000000000000000
> >  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> >  page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
> >  ------------[ cut here ]------------
> >  kernel BUG at mm/shmem.c:708!
> >  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> >  CPU: 1 PID: 42896 Comm: Xorg Not tainted 5.19.0-0.rc2.21.fc37.x86_64 #1
> >  Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
> >  RIP: 0010:shmem_add_to_page_cache+0x48e/0x500
> >  Code: 01 0f 84 0a fc ff ff 48 8d 4a ff 31 d2 48 39 cb 0f 85 ff fb ff ff e9
> > f6 fb ff ff 48 c7 c6 70 01 64 bb 48 89 df e8 f2 99 01 00 <0f> 0b 48 c7 c6 a0
> > 1b 64 bb 48 89 df e8 e1 99 01 00 0f 0b 48 8b 13
> >  RSP: 0018:ffff9ce7c047f6b0 EFLAGS: 00010286
> >  RAX: 000000000000003f RBX: ffffe683c464b300 RCX: 0000000000000000
> >  RDX: 0000000000000001 RSI: ffffffffbb67b8e8 RDI: 00000000ffffffff
> >  RBP: 0000000000023f97 R08: ffffffffbca122a0 R09: 64656b636f6c5f74
> >  R10: 747365745f6f696c R11: 6f6621284f494c4f R12: 00000000001120d4
> >  R13: ffff8fa2c6ae7890 R14: ffffe683c464b300 R15: 0000000000000001
> >  FS:  00007fc1cea31380(0000) GS:ffff8fa3f7900000(0000)
> > knlGS:0000000000000000
> >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >  CR2: 00007f6972e228c8 CR3: 0000000104ba8000 CR4: 00000000000006e0
> >  Call Trace:
> >  <TASK>
> >  shmem_swapin_folio+0x274/0x980
> >  shmem_getpage_gfp+0x234/0x990
> >  shmem_read_mapping_page_gfp+0x36/0xf0
> >  shmem_sg_alloc_table+0x11b/0x250 [i915]

Sorry, I never noticed your original report in June.

This is not a bug in zram or i915, but what Matthew fixes in
https://lore.kernel.org/lkml/20220730042518.1264767-1-willy@xxxxxxxxxxxxx/

I am a little surprised to see it hitting i915, since I had thought it
could only affect gma500: but it looks like 965gm has similar limitations,
and so I expect that's what's on your laptop there.

Hugh
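
For anyone who wants to check whether zram-backed swap is active on an
affected machine before testing, here is a minimal sketch. It assumes the
usual Linux /proc/swaps layout (one header row, then one device per line)
and that zram devices appear as /dev/zramN. The function names are
hypothetical, made up for illustration. Note that, per the reply above,
zram is not the root cause here; turning it off only avoids the swap-in
path that trips the BUG.

```python
from pathlib import Path
from typing import List


def zram_swap_devices(swaps_text: str) -> List[str]:
    """Return the zram-backed entries found in /proc/swaps-style text."""
    devices = []
    # The first line is the column header (Filename, Type, Size, ...).
    for line in swaps_text.splitlines()[1:]:
        fields = line.split()
        # Assumption: zram swap devices are named /dev/zram0, /dev/zram1, ...
        if fields and fields[0].startswith("/dev/zram"):
            devices.append(fields[0])
    return devices


def active_zram_swap(path: str = "/proc/swaps") -> List[str]:
    """Read the live swap table and list the zram devices in use."""
    return zram_swap_devices(Path(path).read_text())
```

Calling active_zram_swap() on the laptop returns an empty list when no
zram swap is in use.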