Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

From: Herton R. Krzesinski
Date: Mon Aug 10 2015 - 11:31:58 EST


On Sun, Aug 09, 2015 at 07:49:08PM +0200, Manfred Spraul wrote:
> Hi Herton,
>
(snip)
> >+++ b/ipc/sem.c
> >@@ -2074,17 +2074,24 @@ void exit_sem(struct task_struct *tsk)
> > rcu_read_lock();
> > un = list_entry_rcu(ulp->list_proc.next,
> > struct sem_undo, list_proc);
> >- if (&un->list_proc == &ulp->list_proc)
> >- semid = -1;
> >- else
> >- semid = un->semid;
> >+ if (&un->list_proc == &ulp->list_proc) {
> >+ rcu_read_unlock();
> >+ /* Make sure we wait for any place still referencing
> >+ * the current ulp to finish */
> >+ synchronize_rcu();
> Sorry, no. synchronize_rcu() is a high-latency operation.
> We can't call it within exit_sem(). We could use kfree_rcu(), but I don't
> see that we need it:
>
> Which race do you imagine?
> ulp is accessed by:
> - freeary(). Race impossible due to explicit locking.
> - exit_sem(). Race impossible due to ulp->refcount
> - find_alloc_undo(). Race impossible, because it operates on
> current->sysvsem.undo_list.
> "current" is in do_exit, thus can't be inside semtimedop().

Well, without the synchronize_rcu() and with the semid list loop fix I was
still able to trigger issues, and I thought the problem was related to racing
with IPC_RMID in freeary() again. This is one scenario I can imagine:

A                                               B

freeary()
  list_del(&un->list_id)
  spin_lock(&un->ulp->lock)
  un->semid = -1
  list_del_rcu(&un->list_proc)
    __list_del_entry(&un->list_proc)
      __list_del(entry->prev, entry->next)      exit_sem()
        next->prev = prev;                        ...
        prev->next = next;                        ...
        ...                                       un = list_entry_rcu(ulp->list_proc.next...)
      (&un->list_proc)->prev = LIST_POISON2      if (&un->list_proc == &ulp->list_proc)
                                                    <true, last un removed by thread A>
        ...                                       kfree(ulp)
  spin_unlock(&un->ulp->lock) <---- bug

Now that is a very tight window, but I had problems even when I tried this patch
first:

(...)
- if (&un->list_proc == &ulp->list_proc)
- semid = -1;
- else
- semid = un->semid;
+ if (&un->list_proc == &ulp->list_proc) {
+ rcu_read_unlock();
+ break;
+ }
+ spin_lock(&ulp->lock);
+ semid = un->semid;
+ spin_unlock(&ulp->lock);

+ /* exit_sem raced with IPC_RMID, nothing to do */
if (semid == -1) {
rcu_read_unlock();
- break;
+ synchronize_rcu();
+ continue;
}
(...)

So even with the bad/unneeded synchronize_rcu() that I had placed there, I
could still get issues (the testing of the patch above was on an older kernel
than latest upstream, from RHEL 6; I can retest without synchronize_rcu() on
latest upstream, but the affected code is the same). That's when I thought of
the scenario above. I was able to get this oops:

<0>BUG: spinlock bad magic on CPU#3, test/5427 (Not tainted)
<4>general protection fault: 0000 [#1] SMP
...
<4>Pid: 5427, comm: test Not tainted 2.6.32-573.el6.1233300.2.x86_64.debug #1 QEMU Standard PC (i440FX + PIIX, 1996)
<4>RIP: 0010:[<ffffffff812c8291>] [<ffffffff812c8291>] spin_bug+0x81/0x100
...
<4>Call Trace:
<4> [<ffffffff812c8375>] _raw_spin_unlock+0x65/0xa0
<4> [<ffffffff8157087b>] _spin_unlock+0x2b/0x40
<4> [<ffffffff81241a66>] freeary+0x96/0x250
<4> [<ffffffff81570a72>] ? _spin_lock+0x62/0x70
<4> [<ffffffff81241cad>] semctl_down.clone.0+0x8d/0x130
<4> [<ffffffff810af1a8>] ? sched_clock_cpu+0xb8/0x110
<4> [<ffffffff810bdb3d>] ? trace_hardirqs_off+0xd/0x10
<4> [<ffffffff810af2ef>] ? cpu_clock+0x6f/0x80
<4> [<ffffffff810c0c5d>] ? lock_release_holdtime+0x3d/0x190
<4> [<ffffffff81243a42>] sys_semctl+0x2a2/0x300
<4> [<ffffffff815702d2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b

Maybe the spin_lock(&ulp->lock) below could instead be acquired earlier,
before checking whether list_proc is empty? That would avoid the need for
synchronize_rcu(), which was my initial idea for avoiding the problem.

>
> >+ break;
> >+ }
> >+ spin_lock(&ulp->lock);
> >+ semid = un->semid;
> >+ spin_unlock(&ulp->lock);
> Ok/good.
> Note (I've tried it first):
> Just "READ_ONCE(un->semid)" would be insufficient, because this can happen:
> A: thread 1, within freeary:
> A: spin_lock(&ulp->lock);
> A: un->semid = -1;
> B: thread 2, within exit_sem():
> B: if (un->semid == -1) exit;
> B: kfree(ulp);
> A: spin_unlock(&ulp->lock); <<<< use-after-free, bug
>
> >+ /* exit_sem raced with IPC_RMID, nothing to do */
> > if (semid == -1) {
> > rcu_read_unlock();
> >- break;
> >+ continue;
> > }
> >- sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
> >+ sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid);
> > /* exit_sem raced with IPC_RMID, nothing to do */
> > if (IS_ERR(sma)) {
> > rcu_read_unlock();
> Ok.
> >@@ -2112,9 +2119,10 @@ void exit_sem(struct task_struct *tsk)
> > ipc_assert_locked_object(&sma->sem_perm);
> > list_del(&un->list_id);
> >- spin_lock(&ulp->lock);
> >+ /* we should be the last process using this ulp, so no need
> >+ * to acquire ulp->lock here; we are also protected against
> >+ * IPC_RMID as we hold sma->sem_perm.lock */
> > list_del_rcu(&un->list_proc);
> >- spin_unlock(&ulp->lock);
> > /* perform adjustments registered in un */
> > for (i = 0; i < sma->sem_nsems; i++) {
> a) "we should be the last" or "we are the last"?

I'll change the comment to "we are the last".

> b) The bug that you have found is probably old, thus it must go into the
> stable kernels as well.
> I would not do this change together with the bugfix.
>
> Perhaps make two patches? One cc stable, the other one without cc stable.

Ok, I'll separate the changes.

Thanks for reviewing.

> --
> Manfred

--
[]'s
Herton
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/