Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)

From: Paul E. McKenney
Date: Tue Jun 10 2014 - 12:15:55 EST

Next message: Frederic Weisbecker: "Re: [PATCH 0/6] nohz: Move nohz kick out of scheduler IPI, v8"
Previous message: Theodore Ts'o: "Re: [PATCH] random: mix all saved registers into entropy pool"
In reply to: Linus Torvalds: "Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)"
Next in thread: Oleg Nesterov: "Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Jun 10, 2014 at 08:35:53AM -0700, Linus Torvalds wrote:
> On Tue, Jun 10, 2014 at 5:56 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > So to safely free a structure containing a mutex, is there some better
> > approach than the following?
>
> Yes.
>
> A data structure *containing* a mutex is fine. The problem only occurs
> when that mutex also acts the lock for the data structure. As a
> result, there are two fixes for the "locks are not one single atomic
> field":
>
> (a) don't do self-locking objects
> (b) use spinlocks for these "self-locking" objects
>
> And quite frankly, (a) is the normal "solution" to the problem.
>
> The fact is, having the data structure contain its own lifetime lock
> is unusual, and generally broken. The *normal* sequence for freeing
> something should be that the last access to it is the atomic referenc
> count access:
>
> .. do whatever, including "unlock(&mem->lock)" ..
> if (atomic_dec_and_test(&mem->refcount))
> .. we can now free it ..
>
> and that's safe. It doesn't matter if "mem" had a mutex in it, it
> doesn't matter if you walked around three times widdershins with a
> dead chicken around your neck. You can do whatever you want, the above
> is fine (and you shouldn't even need to worry about CPU memory
> ordering, because the alloc/free had better have the barriers to make
> sure) that nothing can leak from a free to the next allocation,
> although that is another discussion perhaps worth having).
>
> The whole notion of having the lock that protects the lifetime of the
> data structure inside the structure itself is pretty crazy. Because
> while it is true that we sometimes have the refcount be non-atomic,
> and instead protected with a lock, that lock is generally always
> *outside* the object itself, because you want the lock for lookup etc.
> So having the lock _and_ the refcount be inside the object really is
> crazy.

Agreed! You need to have something external to the object that holds the
object down long enough to safely acquire the object's lock, refcount,
or whatever. Contention usually rules out that "something" being a
global lock or refcount, but per-CPU locks, per-CPU refcounts, RCU,
and so on can work.

> That said, "crazy" has happened. We do occasionally do it. It's
> generally a mistake (the last example of this was the pipe thing), but
> sometimes we do it on purpose (the dcache, for example). You can do
> lookups without holding a lock (generally using RCU), and taking the
> lock and incrementing a refcount, but then the lock has to be a
> spinlock *anyway*, so that's ok.

And of course you have to acquire the lock before doing your
rcu_read_unlock() in that case.

> The last case where we actually had this bug (the afore-mentioned pipe
> thing), the mutex lock wasn't even needed - we had a real spinlock
> protecting the reference count. The bug was that we did the mutex
> unlock *after* the spinlock, for no good reason.
>
> So it really isn't normally a problem. The RT spinlock conversion
> people need to be aware of it, but normally you can't even screw this
> up.

And yes, I might need to add in a reference-count handoff around RCU's
use of rt_mutex. Haven't yet come up with a reasonable way to use RCU
at that point instead. I suppose I could use SRCU, but... ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Frederic Weisbecker: "Re: [PATCH 0/6] nohz: Move nohz kick out of scheduler IPI, v8"
Previous message: Theodore Ts'o: "Re: [PATCH] random: mix all saved registers into entropy pool"
In reply to: Linus Torvalds: "Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)"
Next in thread: Oleg Nesterov: "Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]