Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

From: Gregory Haskins
Date: Fri Jun 19 2009 - 17:17:29 EST


Davide Libenzi wrote:
> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>
>
>> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
>> notifier->release() callback. This lets notification clients know if
>> the eventfd is about to go away and is very useful particularly for
>> in-kernel clients. However, as it stands today it is not possible to
>> use the notification API in a race-free way. This patch adds some
>> additional logic to the notification subsystem to rectify this problem.
>>
>> Background:
>> -----------------------
>> Eventfd currently only has one reference count mechanism: fget/fput. This
>> in and of itself is normally fine. However, if a client expects to be
>> notified if the eventfd is closed, it cannot hold a fget() reference
>> itself or the underlying f_ops->release() callback will never be invoked
>> by VFS. Therefore we have this somewhat unusual situation where we may
>> hold a pointer to an eventfd object (by virtue of having a waiter registered
>> in its wait-queue), but no reference. This makes it nearly impossible to
>> design a mutual decoupling algorithm: you cannot unhook one side from the
>> other (or vice versa) without racing.
>>
>
> And why is that?
>
> struct xxx {
> 	struct mutex mtx;
> 	struct file *file;
> 	...
> };
>
> struct file *xxx_get_file(struct xxx *x) {
> 	struct file *file;
>
> 	mutex_lock(&x->mtx);
> 	file = x->file;
> 	if (!file)
> 		mutex_unlock(&x->mtx);
> 	return file;
> }
>
> void xxx_release_file(struct xxx *x) {
> 	mutex_unlock(&x->mtx);
> }
>
> void handle_POLLHUP(struct xxx *x) {
> 	struct file *file;
>
> 	file = xxx_get_file(x);
> 	if (file) {
> 		unhook_waitqueue(file, ...);
> 		x->file = NULL;
> 		xxx_release_file(x);
> 	}
> }
>
>
> Every time you need to "use" file, you call xxx_get_file(), and if you get
> NULL, it means it's gone and you handle it according to your IRQ fd
> policies. As soon as you are done with the file, you call xxx_release_file().
> Replace "mtx" with the lock that fits your needs.
>

Consider what would happen if f_ops->release() were preempted inside
wake_up_locked_polled() after it has dereferenced the xxx from the list,
but before it invokes the callback(POLLHUP). The xxx object, and/or the
.text for the xxx object, may be long gone by the time it comes back
around. Afaict, there is no way to guard against that scenario unless
you do something like 2/3+3/3. Or am I missing something?
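
To make that window concrete, here is a minimal single-threaded user-space
sketch that replays the interleaving step by step. It is not the code from
2/3 or 3/3; it only shows the generic idea of pinning the object with a
reference before the callback runs. Every name in it (xxx_alloc, xxx_get,
xxx_put, replay_race) is hypothetical, and a plain int stands in for what
would have to be an atomic or lock-protected count in real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the "xxx" client hooked on the wait-queue. */
struct xxx {
	int refs;	/* plain int: this sketch is single-threaded */
	bool *freed;	/* set to true when the object is really released */
	int hup_seen;	/* number of POLLHUP callbacks delivered */
};

static struct xxx *xxx_alloc(bool *freed)
{
	struct xxx *x = calloc(1, sizeof(*x));

	if (!x)
		return NULL;
	x->refs = 1;		/* the owner's reference */
	x->freed = freed;
	*freed = false;
	return x;
}

static void xxx_get(struct xxx *x)
{
	x->refs++;
}

static void xxx_put(struct xxx *x)
{
	if (--x->refs == 0) {
		*x->freed = true;
		free(x);
	}
}

/*
 * Replays the problematic ordering and reports whether the object was
 * freed after the owner's teardown and after the final put.
 * Returns the number of callbacks delivered, or -1 on allocation failure.
 */
static int replay_race(bool *freed_after_teardown, bool *freed_at_end)
{
	bool freed;
	int delivered;
	struct xxx *x = xxx_alloc(&freed);

	if (!x)
		return -1;

	/* 1. Release path finds x on the list and pins it. */
	xxx_get(x);

	/* 2. Release path is "preempted"; the owner tears down its side. */
	xxx_put(x);
	*freed_after_teardown = freed;

	/* 3. Release path resumes; x must still be valid here. */
	assert(!freed);
	x->hup_seen++;
	delivered = x->hup_seen;

	/* 4. Dropping the pin is what finally frees the object. */
	xxx_put(x);
	*freed_at_end = freed;
	return delivered;
}
```

Without the xxx_get() in step 1, the xxx_put() in step 2 would free the
object and step 3 would touch freed memory, which is the scenario described
above. Note the sketch says nothing about the .text going away on module
unload; that needs its own pin.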

-Greg

