Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcurelated? ]

From: Davidlohr Bueso
Date: Sat Apr 20 2013 - 16:01:02 EST


On Sat, 2013-04-20 at 02:19 +0200, Sedat Dilek wrote:
> On Sat, Apr 20, 2013 at 2:06 AM, Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> > On Sat, Apr 20, 2013 at 1:02 AM, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >>>
> >>> Davidlohr pointed to this patch (tested the triplet):
> >>>
> >>> ipc, sem: do not call sem_lock when bogus sma:
> >>> https://lkml.org/lkml/2013/3/31/12
> >>>
> >>> Is that what you mean?
> >>
> >> Yup.
> >>
> >
> > Davidlohr Bueso (1):
> > ipc, sem: do not call sem_lock when bogus sma
> >
> > Linus Torvalds (1):
> > crazy rcu double free debug hack
> >
> > With ***both*** patches applied I am able to build a Linux-kernel with
> > 4 parallel-make-jobs again.
> > David's or your patch alone are not sufficient!
> >
>
> [ Still both patches applied ]
>
> To correct myself... The 1st run was OK.
>
> The 2nd run shows a NULL-pointer-deref (excerpt):
>
> [ 178.490583] BUG: spinlock bad magic on CPU#1, sh/8066
> [ 178.490595] lock: 0xffff88008b53ea18, .magic: 6b6b6b6b, .owner:
> make/8068, .owner_cpu: 3
> [ 178.490599] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> [ 178.490608] IP: [<ffffffff812bacd0>] update_queue+0x70/0x210
> [ 178.490610] PGD 0
> [ 178.490612] Oops: 0000 [#1] SMP
> ...

The exit_sem() >> do_smart_update() >> update_queue() calls seem pretty
well protected. Furthermore we're asserting that sma->sem_perm.lock is
taken. This could just be a consequence of another issue. Earlier this
week Andrew pointed out a potential race in semctl_main() where
sma->sem_perm.deleted could be changed when cmd == GETALL.

Sedat, could you try the attached patch to keep the ipc lock acquired
(on top of the three patches you're already using) and let us know how
it goes? We could also just have the RCU read lock instead of
->sem.perm.lock for GETALL, but lets play it safe for now.

Thanks,
Davidlohr

diff --git a/ipc/sem.c b/ipc/sem.c
index 5711616..1dfc3c1 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1243,10 +1243,11 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
err = -EIDRM;
goto out_free;
}
- sem_unlock(sma, -1);
}

- sem_lock(sma, NULL, -1);
+ /* has the ipc lock already been taken? */
+ if(nsems <= SEMMSL_FAST)
+ sem_lock(sma, NULL, -1);
for (i = 0; i < sma->sem_nsems; i++)
sem_io[i] = sma->sem_base[i].semval;
sem_unlock(sma, -1);