Re: stop_machine() soft lockup

From: Peter Zijlstra
Date: Wed Sep 05 2018 - 09:14:16 EST


On Wed, Sep 05, 2018 at 01:47:49PM +0200, Niklas Cassel wrote:
> On Wed, Sep 05, 2018 at 10:42:41AM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 04, 2018 at 09:03:22PM +0200, Niklas Cassel wrote:
> > > Hello Peter,
> > >
> > > I'm seeing some lockups when booting linux-next on a db820c arm64 board.
> > > I've tried to analyze, but I'm currently stuck.
> >
> > Please see (should be in your Inbox too):
> >
> > https://lkml.kernel.org/r/20180905084158.GR24124@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> I'm sorry if I misled you by replying to your other mail thread,
> both of them have timekeeping_notify() in the call trace,
> but my problem has this call trace:
>
> [ 128.747853] wait_for_common+0xe0/0x1a0
> [ 128.752023] wait_for_completion+0x28/0x38
> [ 128.755677] __stop_cpus+0xd4/0xf8
> [ 128.759837] stop_cpus+0x70/0xa8
> [ 128.762958] stop_machine_cpuslocked+0x124/0x130
> [ 128.766345] stop_machine+0x54/0x70
> [ 128.771373] timekeeping_notify+0x44/0x70
> [ 128.774158] __clocksource_select+0xa8/0x1d8
> [ 128.778605] clocksource_done_booting+0x4c/0x64
> [ 128.782931] do_one_initcall+0x94/0x3f8
> [ 128.786921] kernel_init_freeable+0x47c/0x528
> [ 128.790742] kernel_init+0x18/0x110
> [ 128.795673] ret_from_fork+0x10/0x1c
>
>
> while your other mail thread has this call trace:
>
> * stop_machine()
> * timekeeping_notify()
> * __clocksource_select()
> * clocksource_select()
> * clocksource_watchdog_work()
>
>
> So my problem is not related to the watchdog. I tried your revert anyway,
> but unfortunately my problem persists.

Oh, right, missed that distinction. And this is new?

I'll try and have a look. Lockdep doesn't suggest anything?