Re: [regression] oops on heavy compilations ("kernel BUG at mm/zswap.c:1005!" and "Oops: invalid opcode: 0000")

From: Yosry Ahmed
Date: Tue Aug 27 2024 - 14:49:29 EST


On Sun, Aug 25, 2024 at 9:24 AM Piotr Oniszczuk
<piotr.oniszczuk@xxxxxxxxx> wrote:
>
>
>
> > On 25.08.2024, at 17:05, Pedro Falcato <pedro.falcato@xxxxxxxxx> wrote:
> >
> > Also, could you try a memtest86 on your machine, to shake out potential hardware problems?
>
>
> I found a less time-consuming way to trigger the issue: a 12c24t cross-compile of LLVM with "only 16G" of RAM, as this triggers a lot of heavy swapping (peak swap usage reaches 8-9G of the 16G swap partition).
>
> With this setup, on 6.9.12, the system becomes unavailable (due to a CPU soft lockup) within just 1-3 hours
> (usually in the first or second compile iteration; I wrote a simple script that compiles in a loop and counts iterations).

Are we sure that the soft lockup problem is related to the originally
reported problem? It seems like in v6.10 you hit a BUG in zswap
(corruption?), and in v6.9 you hit a soft lockup with a zswap lock
showing up in the splat. I'm not sure how the two are related.

Is the soft lockup reproducible in v6.10 as well?

Since you have a narrow window (6.8.2 to 6.9) and a reproducer for the
soft lockup problem, can you try bisecting?

Thanks!