Re: BUG: spinlock bad magic on CPU#0, migration/0/9

From: Davidlohr Bueso
Date: Thu Feb 12 2015 - 12:58:38 EST


On Thu, 2015-02-12 at 18:41 +0100, Oleg Nesterov wrote:
> On 02/12, Oleg Nesterov wrote:
> > On 02/11, Davidlohr Bueso wrote:
> > >
> > > On Wed, 2015-02-11 at 16:34 -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Did an earlier-than-usual port of v3.21 patches to post-v3.19, and
> > > > hit the following on x86_64. This happened after about 15 minutes of
> > > > rcutorture. In contrast, I have been doing successful 15-hour runs
> > > > on v3.19. I will check reproducibility and try to narrow it down.
> > > > Might this be a duplicate of the bug that Raghavendra posted a fix for?
> > > >
> > > > Anyway, this was on 3e8c04eb1174 (Merge branch 'for-3.20' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata).
> > > >
> > > > [ 837.287011] BUG: spinlock bad magic on CPU#0, migration/0/9
> > > > [ 837.287013] lock: 0xffff88001ea0fe80, .magic: ffffffff, .owner: gÃ<81>ÃÃÃÃ/0, .owner_cpu: -42
> > > > [ 837.287013] CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0+ #1
> > > > [ 837.287013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > [ 837.287013] ffff88001ea0fe80 ffff88001ea0bc78 ffffffff818f6f4b ffffffff810a5a51
> > > > [ 837.287013] ffffffff81e500e0 ffff88001ea0bc98 ffffffff818f3755 ffff88001ea0fe80
> > > > [ 837.287013] ffffffff81ca4396 ffff88001ea0bcb8 ffffffff818f377b ffff88001ea0fe80
> > > > [ 837.287013] Call Trace:
> > > > [ 837.287013] [<ffffffff818f6f4b>] dump_stack+0x45/0x57
> > > > [ 837.287013] [<ffffffff810a5a51>] ? console_unlock+0x1f1/0x4c0
> > > > [ 837.287013] [<ffffffff818f3755>] spin_dump+0x8b/0x90
> > > > [ 837.287013] [<ffffffff818f377b>] spin_bug+0x21/0x26
> > > > [ 837.287013] [<ffffffff8109923c>] do_raw_spin_unlock+0x5c/0xa0
> > > > [ 837.287013] [<ffffffff81902587>] _raw_spin_unlock_irqrestore+0x27/0x50
> > > > [ 837.287013] [<ffffffff8108f0a1>] complete+0x41/0x50
> > >
> > > We did have some recent changes in completions:
> > >
> > > 7c34e318 (sched/completion: Add lock-free checking of the blocking case)
> > > de30ec47 (sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state)
> > >
> > > The second one being more related (although both appear to make sense).
> > > Perhaps some subtle implication in the completion_done side that
> > > disappeared with the spinlock?
> >
> > At first glance both changes look suspicious.
>
> No, sorry, only the 2nd one.

Yes, we can get away with it for the "try" case, and we recheck the done
condition again after grabbing the lock anyway.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/