Re: [rfc] "fair" rw spinlocks

From: Eric W. Biederman
Date: Mon Dec 07 2009 - 21:12:14 EST

Next message: Hidetoshi Seto: "Re: [PATCH] x86/mce: timer must be setup unconditionally"
Previous message: john stultz: "Re: timer interrupt stucks using tickless kernel"
In reply to: Paul E. McKenney: "Re: [rfc] "fair" rw spinlocks"
Next in thread: Paul E. McKenney: "Re: [rfc] "fair" rw spinlocks"

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:

> On Mon, Dec 07, 2009 at 03:19:59PM -0800, Eric W. Biederman wrote:
>> Andi Kleen <andi@xxxxxxxxxxxxxx> writes:
>>
>> > ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>> >
>> >> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>> >>>
>> >>> Is it required that all of the processes see the signal before the
>> >>> corresponding interrupt handler returns? (My guess is "no", which
>> >>> enables a trick or two, but thought I should ask.)
>> >>
>> >> Not that I recall. I think it is just an I/O completed signal.
>> >
>> > Wasn't there the sysrq SAK too? That one definitely would need
>> > to be careful about synchronicity.
>>
>> SAK from sysrq is done through schedule_work; I seem to recall the
>> locking being impossible otherwise. There is also send_sig_all and a
>> few others from sysrq. I expect we could legitimately make them
>> schedule_work as well if needed.
>
> OK, I will chance it... Here is one possible trick:
>
> o Maintain a list of ongoing group-signal operations, protected
> by some suitable lock. These could be in a per-chain-locked
> hash table, hashed by the signal target (e.g., pgrp).
>
> o When a task is created, it scans the above list, committing
> suicide (or doing whatever the signal requires) if appropriate.
>
> o When creating a child task, the parent holds an SRCU across
> creation. It acquires SRCU before starting creation, and
> releases it when it knows that the child has completed
> scanning the above list.
>
> o The updater does the following:
>
> o Add its request to the above list.
>
> o Wait for an SRCU grace period to elapse.
>
> o Kill off everything currently in the task list,
> and then wait for each such task to get to a point
> where it can be guaranteed not to spawn additional
> tasks. (This might be mediated via a reference
> count in the corresponding list element, or by
> rescanning the task list, or any of a number of
> similar tricks.)
>
> Of course, if the signal is non-fatal, then it is
> necessary only to wait until the child has taken
> the signal.
>
> o If it is possible for a given task's children to
> outlive it, despite the fact that the children must
> commit suicide upon finding themselves indicated by the
> list, wait for another SRCU grace period to elapse.
> (This additional SRCU grace period would be required
> for a non-fatal pgrp signal, for example.)
>
> o Remove the element from the list.
>
> Does this approach make sense, or am I misunderstanding the problem?

I think that is about right. I played with that idea a little bit.
I was thinking of simply having new children return -ERESTARTSYS, and
retry the fork. I put it down because I decided that it seemed like a
very twisted implementation of a read/write lock.
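
To make the retry idea concrete, here is a minimal user-space sketch, not
kernel code. It assumes a small fixed array stands in for the list of
ongoing group-signal operations, and that a fork attempt simply fails
with the ERESTARTSYS value while its pgrp is targeted. Every toy_* name
is hypothetical, and the locking that a real implementation would need
is left out entirely.

```c
/* Toy user-space model, not kernel code. Every toy_* name is hypothetical. */
#include <assert.h>

#define TOY_ERESTARTSYS 512  /* same value as the kernel-internal ERESTARTSYS */
#define TOY_MAX_OPS     8

/* Ongoing group-signal operations: target pgrp per slot, 0 means free. */
static int toy_ops[TOY_MAX_OPS];
static int toy_next_pid = 100;

/* Signaler: publish the operation before touching any task. */
int toy_signal_begin(int pgrp)
{
    assert(pgrp > 0);
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (toy_ops[i] == 0) {
            toy_ops[i] = pgrp;
            return i;           /* slot handle for toy_signal_end() */
        }
    }
    return -1;                  /* no free slot */
}

/* Signaler: retire the operation once every target has been signaled. */
void toy_signal_end(int slot)
{
    assert(slot >= 0 && slot < TOY_MAX_OPS);
    toy_ops[slot] = 0;
}

/*
 * One fork attempt: scan the ongoing operations and ask the caller to
 * retry if this pgrp is being signaled; otherwise hand out a new pid.
 */
int toy_fork_once(int pgrp)
{
    assert(pgrp > 0);
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (toy_ops[i] == pgrp)
            return -TOY_ERESTARTSYS;
    }
    return toy_next_pid++;
}
```

A fork attempted between toy_signal_begin() and toy_signal_end() for the
same pgrp is refused and has to be retried, while forks in other process
groups go through. That is exactly the reader/writer exclusion mentioned
above, only spelled differently.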

If we can scale noticeably better than tasklist_lock it is
definitely worth doing. I think it is really easy to tie yourself up
in pretzels thinking about this.

An srcu in the pid structure that we hold while signaling tasks.
Interesting.
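
As a rough illustration of the ordering in the quoted scheme, here is a
single-threaded toy model, not kernel code. It assumes a plain reader
count as a stand-in for SRCU: the forking parent holds the read side
across creation, and the signaler may walk the task list only once no
fork is in flight. Real SRCU waits only for readers that started before
the grace period began, so this counter is a conservative simplification.
Every toy_* name is hypothetical.

```c
/* Toy single-threaded model, not kernel code. Every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>

static int  toy_readers;         /* forks currently in flight (SRCU stand-in) */
static bool toy_request_posted;  /* is a group-signal request on the list? */

/* Parent: enter the read side before starting creation. */
void toy_fork_begin(void)
{
    toy_readers++;
}

/*
 * The child scans the request list, and only then does the parent leave
 * the read side. Returns true if the child must take the signal itself.
 */
bool toy_fork_finish(void)
{
    bool hit = toy_request_posted;

    assert(toy_readers > 0);
    toy_readers--;
    return hit;
}

/* Updater step 1: add the request to the list. */
void toy_signal_post(void)
{
    toy_request_posted = true;
}

/* Updater step 2: the "grace period" is over when no fork is in flight. */
bool toy_grace_period_elapsed(void)
{
    return toy_readers == 0;
}

/* Updater step 3 may start only after steps 1 and 2. */
bool toy_signal_may_scan_tasklist(void)
{
    return toy_request_posted && toy_grace_period_elapsed();
}

/* Updater last step: remove the request from the list. */
void toy_signal_remove(void)
{
    toy_request_posted = false;
}
```

In this model a fork that is still in flight when the request is posted
holds off the task-list walk, and its child sees the request when it
scans. A fork that finished scanning before the request was posted has
already left the read side, so its child is expected to be visible in
the task list by the time the walk starts.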

> Either way, one additional question... It seems to me that non-fatal
> signals really don't require the above mechanism, because if a task
> handles the signal, and then spawns a child, one can argue that the
> child came after the signal and should thus be unaffected. Right?
> Or more confusion on my part?

SIGSTOP also seems pretty important not to escape. I'm not certain of
the others. I think I would get a bit upset if job control signals in
the shell stopped working properly. I think having to ask "did that app
do something wrong with SIGTERM, or did the kernel drop it?" would
drive me a bit batty.

It is hard to tell what breaks because most buggy implementations will
work correctly most of the time.

Eric
