Re: [KVM-RFC PATCH 1/2] eventfd: add an explicit srcu based notifierinterface

From: Gregory Haskins
Date: Wed Jun 17 2009 - 13:29:23 EST


Hi Davide,

Davide Libenzi wrote:
> On Tue, 16 Jun 2009, Gregory Haskins wrote:
>
>
>> Davide Libenzi wrote:
>>
>>> On Tue, 16 Jun 2009, Gregory Haskins wrote:
>>>
>>>
>>>
>>>> Does this all make sense?
>>>>
>>>>
>>> This conversation has been *really* long, and I haven't had time to look
>>> at the patch yet. But looking at the amount of changes, and the amount of
>>> even more changes talked in this thread, there's a very slim chance that
>>> I'll ACK the eventfd code.
>>> You may want to consider a solution that does not litter eventfd code that
>>> much.
>>>
>>>
>>> - Davide
>>>
>>>
>>>
>>>
>> Hi Davide,
>>
>> I understand your position and value your time/insight into looking at
>> this things.
>>
>> Despite the current ongoing discussion, I still stand that the current
>> patch is my proposed solution (though I have yet to convince Michael).
>> But in any case, if you have the time, please look it over because I
>> still think its the right direction to head in.
>>
>
> I don't think so. You basically upload a bunch of stuff it could have been
> inside your irqfd into eventfd.

Can you elaborate? I currently do not see how I could do the proposed
concept inside of irqfd while still using eventfd. Of course, that
would be possible if we fork irqfd from eventfd, and perhaps this is
what you are proposing. As previously stated I don't want to give up on
the prospect of re-using it quite yet, so bear with me. :)

The issue with eventfd, as I see it, is that eventfd uses a
spin_lock_irqsave (by virtue of the wait-queue stuff) across the
"signal" callback (which today is implemented as a wake-up). This
spin_lock implicitly creates a non-preemptible critical section that
occurs independently of whether eventfd_signal() itself is invoked from
a sleepable context or not.

What I strive to achieve is to remove the creation of this internal
critical section. If eventfd_signal() is called from atomic context, so
be it. We will detect this in the callback and be forced to take the
slow-path, and I am ok with that. *But*, if eventfd_signal() (or
f_ops->write(), for that matter) are called from a sleepable context
*and* eventfd doesn't introduce its own critical section (such as with
my srcu patch), we can potentially optimize within the callback by
executing serially instead of deferring (e.g. via a workqueue).

(Note: The issue of changing eventfd_signal interface is a separate
tangent that Michael proposed, and is not something I am advocating. In
the current proposal, eventfd_signal() retains its exact semantics as it
has in mainline).

> Now the eventfd_signal() can magically
> sleep, or not, depending on what the signal functions do. This makes up a
> pretty aweful interface if you ask me.
> A lot simpler and cleaner if eventfd_signal(), like all the wake up
> functions inside the kernel, can be called from atomic context. Always,
> not sometimes.
>

It can! :) This is not changing from whats in mainline today (covered
above).

Thanks Davide,
-Greg


Attachment: signature.asc
Description: OpenPGP digital signature