Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
From: Paul E. McKenney
Date: Tue Nov 25 2014 - 13:02:13 EST
On Tue, Nov 25, 2014 at 06:49:16PM +0100, Geert Uytterhoeven wrote:
> On Fri, Nov 7, 2014 at 8:59 AM, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Thu, Nov 6, 2014 at 10:02 PM, Daniel Lezcano
> > <daniel.lezcano@xxxxxxxxxx> wrote:
> >> On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
> >>> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
> >>> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
> >>>
> >>> Last message is:
> >>>
> >>> DMA: preallocated 256 KiB pool for atomic coherent allocations
> >>>
> >>> After this it's supposed to print:
> >>>
> >>> cpuidle: using governor ladder
> >>> cpuidle: using governor menu
> >>>
> >>> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
> >>> ("sched: Let the scheduler see CPU idle states").
> >>>
> >>> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
> >>> ("sched/fair: Leverage the idle state info when choosing the "idlest"
> >>> cpu"), which depends on it, fixes the problem.
> >>>
> >>> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
> >>> debugging, but didn't see a lockdep splat.
> >>
> >> Did you try the fix attached?
> >>
> >> https://lkml.org/lkml/2014/10/22/722
> >
> > Thanks, I didn't try that.
> >
> > However, this patch seems to be in v3.18-rc3, so I'm already using it.
> > Hence it doesn't fix the problem for me.
> >
> > On another board, with a dual Cortex-A15, the problem doesn't show up.
>
> This problem (regression introduced in v3.18-rc1) is still present in v3.18-rc6.
>
> I did some more investigation, and it's hanging in the call to
> synchronize_rcu() in cpuidle_uninstall_idle_handler(), which was added in
> commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc.
> More specifically, it's blocked on the wait_for_completion(&rcu.completion)
> in kernel/rcu/update.c:void wait_rcu_gp(call_rcu_func_t crf).

You didn't disable RCU CPU stall warnings, did you? If you did, please
re-enable them, as the stall warning messages will likely help to debug
this. The soft-lockup checks can also be quite valuable.

If you haven't run with CONFIG_PROVE_RCU=y, please try that. For example,
if you have CONFIG_PREEMPT=y and you do synchronize_rcu() from within
an RCU read-side critical section (don't do that, it will hang!!!),
then you will get a lockdep splat.
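
To illustrate why that pattern hangs, here is a minimal user-space sketch.
It is a toy model, not kernel code: every toy_* name is hypothetical and
only mimics the shape of the kernel API. It assumes the CONFIG_PREEMPT=y
case described above, where a grace period has to wait for every reader
that is already inside a read-side critical section, including the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a task; all names here are hypothetical. */
struct toy_task {
	int read_nesting;	/* depth of nested read-side critical sections */
};

/* Enter a (possibly nested) read-side critical section. */
static void toy_rcu_read_lock(struct toy_task *t)
{
	t->read_nesting++;
}

/* Leave a read-side critical section; unbalanced unlocks are a bug. */
static void toy_rcu_read_unlock(struct toy_task *t)
{
	assert(t->read_nesting > 0);
	t->read_nesting--;
}

/*
 * A grace period ends only once all pre-existing readers have left their
 * critical sections.  If the caller is itself such a reader, the wait can
 * never complete.  Instead of actually blocking, this model just reports
 * whether the call would hang.
 */
static bool toy_synchronize_rcu_would_hang(const struct toy_task *caller)
{
	return caller->read_nesting > 0;
}
```

In the real kernel the corresponding mistake is a synchronize_rcu() call
placed between rcu_read_lock() and rcu_read_unlock(), possibly several
function calls deep, which is what the lockdep splat would point at.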

Does any sort of system activity (keyboard, network, etc.) unstick the
system?

If you have tried all those things without good effect, could you please
send along your .config and an alt-sysrq-t dump of all tasks' stacks?
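
As a rough checklist, the debugging aids mentioned above map onto a
.config fragment along the following lines. This is a sketch only: the
option names are those believed to apply to kernels of this vintage, the
stall timeout value is simply the usual default, and everything should be
checked against the Kconfig files of the tree actually being built.

```
# lockdep-based RCU checking (PROVE_RCU depends on PROVE_LOCKING)
CONFIG_PROVE_LOCKING=y
CONFIG_PROVE_RCU=y

# soft-lockup checks
CONFIG_LOCKUP_DETECTOR=y

# seconds before an RCU CPU stall warning is printed (assumed default)
CONFIG_RCU_CPU_STALL_TIMEOUT=21

# needed for the alt-sysrq-t task dump
CONFIG_MAGIC_SYSRQ=y
```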

Thanx, Paul

> Anyone with a clue?
>
> Thanks again!
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
>