Re: [RFC][PATCH 5/5] signals: Don't hold shared siglock across signal delivery

From: Matt Fleming
Date: Thu Apr 14 2011 - 06:58:01 EST


On Wed, 13 Apr 2011 22:12:19 +0200
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

> On 04/05, Matt Fleming wrote:
> >
> > To reduce the contention on the shared siglock this patch pushes the
> > responsibility of acquiring and releasing the shared siglock down into
> > the functions that need it. That way, if we don't call a function that
> > needs to be run under the shared siglock, we can run without acquiring
> > it at all.
>
> This adds new races. And this time I do not even understand the intent.
> I mean, it is not clear to me why this change can really help to speed
> up get_signal_to_deliver().

Again, it's not necessarily speeding up get_signal_to_deliver(), but
rather it's reducing the contention on the shared siglock.

For example, without this patch, if you've got someone sending a signal
to a task group, you can't run get_signal_to_deliver() in parallel
because you'll be waiting for the sending thread to release the shared
siglock. Which, if you were going to dequeue a private signal anyway
and didn't need to access signal->shared_pending, is unnecessary
overhead :-(

As it turns out, the shared siglock protects more than just
signal->shared_pending, so in certain cases you need to acquire it
anyway (like the fatal signal code paths) so this isn't as optimised
as it could be, which is a shame.
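
To make the intent concrete, here is a rough userspace model of the
locking split, not the patch itself. Every name in it (model_dequeue,
struct thread_state, struct shared_state, lock_count) is made up for
illustration, pthread mutexes stand in for the two siglocks, and the
blocked mask is ignored. The only point is that a thread with a private
signal pending never touches the shared lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct shared_state {
	pthread_mutex_t lock;		/* stands in for the shared siglock */
	unsigned long pending;		/* stands in for signal->shared_pending */
	unsigned long lock_count;	/* how often the shared lock was taken */
};

struct thread_state {
	pthread_mutex_t lock;		/* stands in for the per-thread siglock */
	unsigned long pending;		/* private pending signals, as a bitmask */
	struct shared_state *shared;
};

/* Clear and return the lowest pending signal (1-based), or 0 if none. */
static int take_lowest(unsigned long *mask)
{
	for (int sig = 1; sig <= (int)(8 * sizeof(*mask)); sig++) {
		unsigned long bit = 1UL << (sig - 1);

		if (*mask & bit) {
			*mask &= ~bit;
			return sig;
		}
	}
	return 0;
}

/* Try the private queue first; fall back to the shared queue only if empty. */
static int model_dequeue(struct thread_state *t)
{
	int sig;

	assert(t && t->shared);

	pthread_mutex_lock(&t->lock);
	sig = take_lowest(&t->pending);
	pthread_mutex_unlock(&t->lock);
	if (sig)
		return sig;	/* shared lock never taken on this path */

	pthread_mutex_lock(&t->shared->lock);
	t->shared->lock_count++;
	sig = take_lowest(&t->shared->pending);
	pthread_mutex_unlock(&t->shared->lock);
	return sig;
}
```

A sender that holds the shared lock only delays threads that reach the
second half of model_dequeue(). The model says nothing about the other
state the shared siglock protects, which is exactly where the real patch
gets into trouble.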

> > Note that this does not make signal delivery lockless. A signal must
> > still be dequeued from either the shared or private signal
> > queues. However, in the private signal case we can now get by with
> > just acquiring the per-thread siglock
>
> OK, we can dequeue the signal. But dequeue_signal()->recalc_sigpending()
> becomes even more wrong. We do not hold any lock, we can race with both
> shared/private signal sending.

Yep, this was covered in the previous patch review.

> > Also update tracehook.h to indicate it's not called with siglock held
> > anymore.
>
> Heh. This breaks this tracehook completely ;) OK, nobody cares about
> the out-of-tree users, forget.

I was hoping you'd say that ;-)

> Also. get_signal_to_deliver() does
>
> 	signr = dequeue_signal(current, &current->blocked,
> 			       info);
> 	...
>
> 	ka = &sighand->action[signr-1];
>
> 	...
>
> 	if (ka->sa.sa_handler != SIG_DFL) {
> 		/* Run the handler.  */
> 		*return_ka = *ka;
>
> This memcpy() can race with sys_rt_sigaction(), we can't read *ka
> atomically.

Eek! I hadn't noticed that. Thanks.

> Actually, even SIG_DFL/SIG_IGN checks can race, although this is minor...
> But still not correct.
>
> 	if (ka->sa.sa_flags & SA_ONESHOT) {
> 		write_lock(&sighand->action_lock);
> 		ka->sa.sa_handler = SIG_DFL;
> 		write_unlock(&sighand->action_lock);
>
> We should check SA_ONESHOT under ->action_lock. But even then this
> will be racy, although we can probably ignore this... Suppose that
> SA_ONESHOT was set after we dequeued the signal.

Right, most of this side is wrong wrt the action_lock.

Thanks Oleg.

--
Matt Fleming, Intel Open Source Technology Center