Re: [patch 14/22] pollfs: pollable futex

From: Davi Arnaut
Date: Wed May 02 2007 - 08:20:54 EST


Ulrich Drepper wrote:
> On 5/1/07, Davi Arnaut <davi@xxxxxxxxxxxxx> wrote:
>> The pollable futex approach is far superior (send and receive events from
>> userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same time.
>> [...]
>
> You have to explain in detail how these interfaces are supposed to
> work. From first sight and without understanding (all) the it seems
> it's far from useful.

It's basically an asynchronous FUTEX_WAIT with notification delivery
through a file descriptor.

> Pollable futexes are useful, but any solution which gets implemented
> must be sufficiently useful for all the uses we might have.

It's very useful for asynchronous event notification libraries
(libevent, liboop, libivykis, etc) because it integrates nicely with
their (e)poll main loops.

Usage scenario: you have 10 worker threads (and 10 futexes) for disk
i/o (or whatever) and one manager thread which is a state machine
serving many clients (epoll loop).

In this scenario the worker threads have only two possible ways of
notifying the manager thread once a job is done: signals and pipe tricks.

For libraries, signals suck. They don't integrate well with poll() loops,
may have queue overflow issues (RT signals), and signal numbers may clash
with other libraries/code. The self-pipe trick wastes resources (a mostly
unused pipe buffer).
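
For reference, the self-pipe trick mentioned above can be sketched roughly
as follows. This is only an illustration using plain pipe()/poll(); the
helper names notify_manager() and wait_for_job() are made up for this
example and are not part of the patch set.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: a worker writes one byte (the job id) to the
 * write end of the pipe to tell the manager that a job is done. */
static int notify_manager(int wfd, unsigned char job_id)
{
    assert(wfd >= 0);
    return write(wfd, &job_id, 1) == 1 ? 0 : -1;
}

/* Hypothetical helper: the manager polls the read end of the pipe.
 * Returns the job id, or -1 on timeout or error. */
static int wait_for_job(int rfd, int timeout_ms)
{
    struct pollfd pfd = { .fd = rfd, .events = POLLIN };
    unsigned char job_id;

    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
        return -1;
    if (read(rfd, &job_id, 1) != 1)
        return -1;
    return job_id;
}
```

Each such pipe carries a kernel buffer that is almost entirely unused when
only single-byte notifications flow through it, which is the waste
referred to above.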

By using pollable futexes, all the manager thread has to do is
associate each of these futexes with a file descriptor (plfutex) and
epoll() for their completion. Once a futex is signaled, epoll()
returns POLLIN for the file descriptor and the manager thread may
dequeue the notification status from anywhere.


> - the trivial is that you have a futex and you are just interested in
> seeing it change. The same as FUTEX_WAIT.

I'm just interested in seeing a FUTEX_WAKE. Yes, same as FUTEX_WAIT.
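
For comparison, the ordinary synchronous FUTEX_WAIT/FUTEX_WAKE pair looks
roughly like this from userspace. This is a minimal sketch of the existing
futex(2) interface, not of pollfs; the wrapper names futex_wait() and
futex_wake() are just local conveniences for this example.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Sleep while *uaddr == expected. Fails with EAGAIN if the value already
 * differs, or with ETIMEDOUT if the (relative) timeout expires first. */
static long futex_wait(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *timeout)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Wake at most nr waiters; returns the number of waiters woken. */
static long futex_wake(uint32_t *uaddr, int nr)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
}
```

The caller of futex_wait() is blocked for the duration of the wait, which
is exactly what makes it awkward to combine with an (e)poll main loop.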

> I cannot figure out how all this works in your code.

Every futex has a wait queue (q->waiters) which is used to track
processes waiting on the futex. When the futex receives a FUTEX_WAKE it
wakes up all waiters on the wait queue. Also, a futex is considered
woken when its wait queue is empty (or lock_ptr == NULL).

When you register a file descriptor with select(), poll() or epoll() a
callback is queued into the futex wait queue. When the futex receives a
FUTEX_WAKE every callback is called and the event is registered within
each select(), poll() or epoll() table. This initiates a chain reaction
waking up all processes sleeping on poll()/whatever.
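
The callback mechanism can be pictured with a small userspace toy model.
This is not the kernel code from the patch; struct toy_futex, the fixed
size array and every function name here are invented purely to show the
shape of "queue a callback, run all callbacks on wake".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

typedef void (*wake_fn)(void *ctx);

struct waiter {
    wake_fn fn;   /* callback run when the futex is woken */
    void *ctx;    /* e.g. the poll table entry to mark ready */
};

/* Toy stand-in for a futex with its wait queue. */
struct toy_futex {
    struct waiter waiters[MAX_WAITERS];
    size_t nr_waiters;
};

/* Queue a callback, as registering the fd with poll() would do. */
static int toy_add_waiter(struct toy_futex *f, wake_fn fn, void *ctx)
{
    assert(f != NULL && fn != NULL);
    if (f->nr_waiters == MAX_WAITERS)
        return -1;
    f->waiters[f->nr_waiters].fn = fn;
    f->waiters[f->nr_waiters].ctx = ctx;
    f->nr_waiters++;
    return 0;
}

/* Run every queued callback and empty the queue; returns how many ran. */
static size_t toy_wake(struct toy_futex *f)
{
    size_t n = f->nr_waiters;

    for (size_t i = 0; i < n; i++)
        f->waiters[i].fn(f->waiters[i].ctx);
    f->nr_waiters = 0;
    return n;
}

/* Example callback: flag a poll table slot as readable. */
static void mark_ready(void *ctx)
{
    *(int *)ctx = 1;
}
```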

> Does your read() call (that's the one to wait, yes?) work with O_NONBLOCK
> or how else do you get that behavior?

If the fd is marked O_NONBLOCK and the futex is not woken yet, it simply
returns -EAGAIN (pfs_read_nonblock). If O_NONBLOCK is not set, it waits
synchronously (pfs_read_block/wait_event_interruptible) on the futex
wait queue.
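
The -EAGAIN behaviour is the usual O_NONBLOCK contract. As an analogy
only, here is the same pattern on an ordinary pipe standing in for the
plfutex descriptor; set_nonblock() and try_consume() are hypothetical
helper names for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the descriptor's file status flags. */
static int set_nonblock(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if a notification was consumed, 0 if nothing is pending
 * yet (EAGAIN), and -1 on any other error or end of file. */
static int try_consume(int fd)
{
    unsigned char byte;
    ssize_t n = read(fd, &byte, 1);

    if (n == 1)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

Without O_NONBLOCK the same read() would simply sleep until a
notification arrives.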

> - more complicated case: I have to wait for multiple futexes and lock
> them all at the same time or don't return at all. This is possible with
> SysV semaphores and generally useful and needed. How can this be
> implemented with your scheme?

Remember, it's only about FUTEX_WAIT.

> - how does it work with PI futexes?

It doesn't work. AFAICS PI futexes don't use FUTEX_WAKE.

> - can I use a futex at the same time through this mechanism and using the normal
> FUTEX_WAIT operation? This is a killer if it's not the case.

Yes.

> - if you have multiple threads polling a futex and the waker wakes up
> one, what happens? It is simply not acceptable to have more than one
> thread return from the poll() call, this would waste too many cycles,
> just to put all threads but one back to sleep.

Only one is woken up (whatever matches first on the futex hash bucket).

--
Davi Arnaut