Re: disabling secondary CPU hangs / system fails to suspend with kernel 4.19+
From: Peter Zijlstra
Date: Fri Mar 15 2019 - 05:09:57 EST
On Thu, Mar 14, 2019 at 04:17:28PM +0100, Thomas Müller wrote:
> Hi,
>
> starting with kernel 4.19 my Lenovo ThinkPad X1 Carbon 5th no longer properly suspends.
>
> This is 100% reproducible and git bisect points to the following commit:
> > [be45bf5395e0886a93fc816bbe41a008ec2e42e2] watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
> > be45bf5395e0886a93fc816bbe41a008ec2e42e2 is the first bad commit
> > commit be45bf5395e0886a93fc816bbe41a008ec2e42e2
> > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Date: Fri Jul 13 12:42:08 2018 +0200
> >
> > watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
> >
> > When scheduling is delayed for longer than the softlockup interrupt
> > period it is possible to double-queue the cpu_stop_work, causing list
> > corruption.
> >
> > Cure this by adding a completion to track the cpu_stop_work's
> > progress.
> >
> > Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > Tested-by: Rong Chen <rong.a.chen@xxxxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
> > Link: http://lkml.kernel.org/r/20180713104208.GW2494@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> >
> > :040000 040000 6aca2dbb84bc33fe442b18b3d0a135c27adff7b9 2710af12d32e4b98df07768716689b213bce45fc M kernel
>
> The bugzilla reports have some additional details:
> * https://bugzilla.redhat.com/show_bug.cgi?id=1671504
> * https://bugzilla.kernel.org/show_bug.cgi?id=202679
> * https://bugzilla.kernel.org/show_bug.cgi?id=202137
>
> I'm happy to provide additional information or test a patch or two (as long as it doesn't
> eat up my notebook ;))

I obviously cannot reproduce :/ Both cpu-hotplug and suspend work just
fine on my test boxes. I even tried my thinkpad (x240) and that too goes
to sleep and wakes up just fine.

What .config do you have? And what, if anything, do you see on the
console when it goes funny?

I think you wrote that hot-un-plug never completes? Is there anything in
dmesg when it's stuck in:

  echo 0 > /sys/devices/system/cpu/cpu1/online

?
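
One way to collect that without losing the terminal to a stuck write is to run the write in the background and give up waiting after a few seconds. This is only a rough sketch, assuming a POSIX shell and root access. `try_offline`, its timeout argument, and its return codes are hypothetical names made up for illustration and are not something from this thread. Only the sysfs path comes from the command above.

```shell
#!/bin/sh
# Sketch only: try_offline is a hypothetical helper, not an existing tool.
#
# try_offline FILE [TIMEOUT_SECONDS]
#   Writes 0 to FILE from a background subshell and polls for completion.
#   Returns the writer's exit status if the write finished in time,
#   or 2 if the write is still pending after TIMEOUT_SECONDS (default 10).
try_offline() {
    file=$1
    timeout=${2:-10}

    # Background subshell, so a hung hot-unplug does not block this shell.
    ( echo 0 > "$file" ) &
    pid=$!

    elapsed=0
    # kill -0 sends no signal; it only checks whether the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "write to $file still pending after ${timeout}s (pid $pid)" >&2
            return 2
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # The writer has exited; pass its exit status on to the caller.
    wait "$pid"
}

# Possible use (as root), with cpu1 as in the command above:
#   try_offline /sys/devices/system/cpu/cpu1/online 10 || dmesg | tail -n 50
```

The write is backgrounded rather than wrapped in timeout(1) because a task blocked in uninterruptible sleep may not react to signals, in which case timeout(1) itself would end up waiting on it. With this approach the calling shell stays usable for grabbing dmesg even though the writer may linger.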