Re: v4.20-rc1: list_del corruption on thinkpad x220

From: Joonas Lahtinen
Date: Fri Nov 23 2018 - 03:17:42 EST


Quoting Pavel Machek (2018-11-21 13:54:49)
> Hi!
>
> > > My machine locked hard (thinkpad x220). After reboot, I found this in
> > > syslog:
> > >
> > > Sounds like memory corruption..? Does not sound easy to debug.
> >
> > Were you doing something GPU intense when you experienced the hard hang?
> >
> > And if so, have you been able to hit the issue more than once? At this
> > point it doesn't look like anything we've hit previously, so it would be
> > great to have some more insight into how we could reproduce it.
>
> I have seen another crash since then, but I don't think it counts as
> "easily reproducible".
>
> I may have been running flightgear at that point. That's fairly GPU intensive.
>
> > There's one similar for nouveau in Bugzilla, but it seems like a genuine
> > memory corruption (1 bit flipped):
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> >
> > Any extra information would be of use :)
> >
> > Regards, Joonas
> >
> > PS. Could you open a bug in Bugzilla? It'll help to collect the
> > information in one consolidated place:
> >
> > https://01.org/linuxgraphics/documentation/how-report-bugs
>
> I prefer email... certainly for bugs that can't be reproduced.

By adding it to Bugzilla, it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as NOTABUG.

It sounds like you've hit the same signature twice, so it may very well
be reproducible. Does flightgear have some demo mode where you could
leave it running a heavy scene overnight?
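
On the bit-flip comparison with the nouveau report: one quick sanity
check is to XOR the two pointers from the list_del message and count the
differing bits. A minimal Python sketch using the two values from your
log (the helper names are mine, purely for illustration):

```python
# Pointers copied from the "list_del corruption" line in the quoted log.
EXPECTED = 0xffff8801742b8178  # what prev->next should have been
ACTUAL = 0xffffc9000192fec8    # what prev->next actually contained


def differing_bits(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def is_single_bit_flip(a: int, b: int) -> bool:
    """True if a and b differ in exactly one bit."""
    return differing_bits(a, b) == 1


if __name__ == "__main__":
    print(f"xor            = {EXPECTED ^ ACTUAL:#018x}")
    print(f"differing bits = {differing_bits(EXPECTED, ACTUAL)}")
    print(f"single flip?   = {is_single_bit_flip(EXPECTED, ACTUAL)}")
```

If many bits differ, a single flipped bit cannot explain the value, which
would fit your observation that it still looks like an address. That
alone does not tell a driver race apart from other kinds of corruption.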

Were you running a 4.19 kernel previously, and was it a distro kernel or
vanilla? A full dmesg from a boot would be appreciated (from a kernel where
you didn't experience issues, and from one where you do).

We actually have a well-defined process and personnel to look into the
Bugzilla entries, so it'd still be helpful to have this logged in
Bugzilla.
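
For whoever picks up the Bugzilla entry: the signature here is the
neighbour check that list debugging (CONFIG_DEBUG_LIST) runs before an
entry is unlinked. Below is a simplified Python model of that check, not
the kernel code itself. ListHead, list_add and simulate are illustrative
stand-ins, and the poisoned-pointer tests of the real check are left out.

```python
class ListHead:
    """Stand-in for the kernel's struct list_head (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.prev = self  # an empty list points at itself
        self.next = self


def list_add(new, head):
    """Insert new right after head."""
    new.prev, new.next = head, head.next
    head.next.prev = new
    head.next = new


def list_del_entry_valid(entry):
    """Return (ok, message); models only the two neighbour checks."""
    if entry.prev.next is not entry:
        return False, ("list_del corruption. prev->next should be "
                       f"{entry.name}, but was {entry.prev.next.name}")
    if entry.next.prev is not entry:
        return False, ("list_del corruption. next->prev should be "
                       f"{entry.name}, but was {entry.next.prev.name}")
    return True, ""


def simulate():
    """Build a one-entry list, then clobber the neighbour's next pointer."""
    head, a, stray = ListHead("head"), ListHead("a"), ListHead("stray")
    list_add(a, head)
    ok_before, _ = list_del_entry_valid(a)
    # Assumed fault: something rewrites head.next without unlinking a.
    head.next = stray
    ok_after, msg = list_del_entry_valid(a)
    return ok_before, ok_after, msg


if __name__ == "__main__":
    print(simulate())
```

The model only shows that the check fires whenever the previous entry no
longer points back at the entry being removed. It says nothing about
which code path changed the pointer.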

Regards, Joonas

>
> Best regards,
> Pavel
>
> > > ...otoh, it still looks like an address, so maybe it is "just" a race in
> > > GPU drivers?
> > >
> > > Any ideas?
> > > Pavel
> > >
> > > Nov 8 18:35:01 duo CRON[28511]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> > > Nov 8 18:42:57 duo kernel: list_del corruption. prev->next should be ffff8801742b8178, but was ffffc9000192fec8
> > > Nov 8 18:42:57 duo kernel: ------------[ cut here ]------------
> > > Nov 8 18:42:57 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53!
> > > Nov 8 18:42:57 duo kernel: invalid opcode: 0000 [#1] SMP PTI
> > > Nov 8 18:42:57 duo kernel: CPU: 2 PID: 1082 Comm: i915/signal:1 Not tainted 4.20.0-rc1+ #3
> > > Nov 8 18:42:57 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018
> > > Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
> > > Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
> > > Nov 8 18:42:57 duo kernel: RSP: 0000:ffffc9000196be78 EFLAGS: 00210086
> > > Nov 8 18:42:57 duo kernel: RAX: 0000000000000054 RBX: ffff8801742b8178 RCX: 0000000000000000
> > > Nov 8 18:42:57 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2a53d8 RDI: ffff88019e2a53d8
> > > Nov 8 18:42:57 duo kernel: RBP: ffffc9000196be78 R08: ffff880196e2cd10 R09: 0000000000000000
> > > Nov 8 18:42:57 duo kernel: R10: 00000000e7684eb9 R11: 3863656632393101 R12: ffffc9000196bec8
> > > Nov 8 18:42:57 duo kernel: R13: ffff88019707e000 R14: ffff8801742b8080 R15: ffffc9000192fdd0
> > > Nov 8 18:42:57 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e280000(0000) knlGS:0000000000000000
> > > Nov 8 18:42:57 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Nov 8 18:42:57 duo kernel: CR2: 00000000ed2bf000 CR3: 000000000581e001 CR4: 00000000000606a0
> > > Nov 8 18:42:57 duo kernel: Call Trace:
> > > Nov 8 18:42:57 duo kernel: intel_breadcrumbs_signaler+0x162/0x330
> > > Nov 8 18:42:57 duo kernel: kthread+0x116/0x150
> > > Nov 8 18:42:57 duo kernel: ? intel_engine_wakeup+0x40/0x40
> > > Nov 8 18:42:57 duo kernel: ? kthread_park+0x90/0x90
> > > Nov 8 18:42:57 duo kernel: ret_from_fork+0x35/0x40
> > > Nov 8 18:42:57 duo kernel: Modules linked in:
> > > Nov 8 18:42:57 duo kernel: ---[ end trace 2f8da183a56f80f6 ]---
> > > Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
> > > Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
> > > Nov 8 18:42:57 duo kernel: RSP: 0000:ffffc9000196be78 EFLAGS: 00210086
> > > Nov 8 18:42:57 duo kernel: RAX: 0000000000000054 RBX: ffff8801742b8178 RCX: 0000000000000000
> > > Nov 8 18:42:57 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2a53d8 RDI: ffff88019e2a53d8
> > > Nov 8 18:42:57 duo kernel: RBP: ffffc9000196be78 R08: ffff880196e2cd10 R09: 0000000000000000
> > > Nov 8 18:42:57 duo kernel: R10: 00000000e7684eb9 R11: 3863656632393101 R12: ffffc9000196bec8
> > > Nov 8 18:42:57 duo kernel: R13: ffff88019707e000 R14: ffff8801742b8080 R15: ffffc9000192fdd0
> > > Nov 8 18:42:57 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e280000(0000) knlGS:0000000000000000
> > > Nov 8 18:42:57 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Nov 8 18:42:57 duo kernel: CR2: 00000000ed2bf000 CR3: 000000000581e001 CR4: 00000000000606a0
> > >
> > > --
> > > (english) http://www.livejournal.com/~pavelmachek
> > > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html