Re: Hard lockups using 3.10.0

From: Rolf Eike Beer
Date: Sun Aug 11 2013 - 02:10:10 EST


Peter Zijlstra wrote:
> On Thu, Jul 11, 2013 at 12:07:21PM +0200, Borislav Petkov wrote:
> > On Thu, Jul 11, 2013 at 11:38:37AM +0200, Rolf Eike Beer wrote:
> > > Hi,
> > >
> > > I'm running 3.10.0 (from openSUSE packages) on an "Intel(R) Core(TM)
> > > i7-2600 CPU @ 3.40GHz". I got a hard lockup on one of my CPUs twice,
> > > once with backtrace (see attached image). Graphics is the builtin
> > > Intel, used with X 7.6 and KDE 4.10beta2 (basically current openSUSE
> > > 12.3+KDE).
> > >
> > > I'm not aware that I had done anything special, just "normal" desktop
> > > and development usage, but no heavy compile work at the moment the
> > > lockups happened.
> >
> > Hmm, I can see commit_creds() doing some rcu pointers assignment and rcu
> > calling into the scheduler which screams about a cpu runqueue of the
> > task we're about to reschedule not being locked. Let's add some more
> > people who should know better.
>
> Ok, for the other people too lazy to bother finding the picture:
>
> http://marc.info/?l=linux-kernel&m=137353587012001&q=p3
>
> So we bug at:
>
> kernel/sched/core.c:519 assert_raw_spin_locked(&task_rq(p)->lock);
>
> and get there through:
>
> resched_task()
> check_preempt_wakeup()
> check_preempt_curr()
> try_to_wake_up()
> autoremove_wake_function()
> __call_rcu_nocb_enqueue()
> __call_rcu()
> commit_creds()
> ____call_usermodehelper()
> ret_from_fork()
>
> That doesn't make much sense though. Since:
>
> try_to_wake_up()
>   ttwu_queue()
>     raw_spin_lock(&rq->lock)
>     ttwu_do_activate()
>       ttwu_do_wakeup()
>         check_preempt_curr()
>           check_preempt_wakeup()
>             resched_task(rq->curr)
>               assert_raw_spin_locked(task_rq(p)->lock)
>
> It would somehow mean that 'task_rq(rq->curr) != rq', that's completely
> bonkers, we do after all have rq->lock locked.
>
> I must also say that I've _never_ seen this bug before.
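
To make the contradiction above concrete, here is a minimal user-space
sketch of the invariant. It is not kernel code: the struct layouts, the
boolean standing in for the raw spinlock, and the helpers
resched_task_lock_held() and wakeup_path_ok() are assumptions made up for
illustration; only the names task_rq(), resched_task() and rq->curr come
from the analysis quoted above.

```c
/* User-space model only: names mirror the quoted kernel functions,
 * but the types and the lock flag are simplified assumptions. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

struct task {
    int cpu;              /* CPU whose runqueue this task belongs to */
};

struct rq {
    bool locked;          /* stands in for the raw spinlock rq->lock */
    struct task *curr;    /* task currently running on this runqueue */
};

static struct rq runqueues[NR_CPUS];

/* Model of task_rq(p): the runqueue is looked up via the task's CPU. */
static struct rq *task_rq(const struct task *p)
{
    assert(p->cpu >= 0 && p->cpu < NR_CPUS);
    return &runqueues[p->cpu];
}

/* Model of the check in resched_task(): true when the condition behind
 * assert_raw_spin_locked(&task_rq(p)->lock) would hold. */
static bool resched_task_lock_held(const struct task *p)
{
    return task_rq(p)->locked;
}

/* Model of the wakeup path: take rq's lock, then resched rq->curr. */
static bool wakeup_path_ok(struct rq *rq)
{
    bool ok;

    rq->locked = true;                      /* raw_spin_lock(&rq->lock) */
    ok = resched_task_lock_held(rq->curr);  /* resched_task(rq->curr) */
    rq->locked = false;                     /* raw_spin_unlock(&rq->lock) */
    return ok;
}
```

As long as rq->curr->cpu names the runqueue the task sits on,
wakeup_path_ok() returns true. It returns false only when curr->cpu points
at a different runqueue, which is the 'task_rq(rq->curr) != rq' state that
should be impossible while rq->lock is held.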

Meanwhile I found a hardware defect on this machine. If the lockups do not
happen again, I will assume the defect caused them.

Thanks for looking into this.

Eike
