Re: Why 'wait queues' and not 'channels'

Gerard Roudier (groudier@club-internet.fr)
Tue, 18 May 1999 21:23:42 +0200 (MET DST)


On Mon, 17 May 1999, Malcolm Beattie wrote:

> David S. Miller writes:
> > Date: Sat, 15 May 1999 20:58:17 +0200 (MET DST)
> > From: Gerard Roudier <groudier@club-internet.fr>
> >
> > Linux waits on 'wait queues' but other UNIXes sleep on 'channels'.
> >
> > What is better with wait queues?
> >
> > Believe me when I say that we don't want the traditional sleeping
> > methodology of "UNIX" in any way shape or form.
> >
> > As far as I know, only Plan9 and Linux get the sleeping method correct
> > (someone please correct me if some other OS does things this way as
> > well).
> >
> > Essentially the crucial difference is, under Linux we:
> >
> > add_to_wait_queue
> > while(1) {
> > check event, break if event happened
> > break if signal arrived
> > schedule();
> > }
> > remove_from_wait_queue
> >
> > On first sight, you may think nothing interesting is happening here.
> >
> > Think harder, the issue is that we eliminate completely any races
> > between the adding to wait queue and test of the event. The task
> > himself controls completely his existence on the wait queue.
> >
> > Other systems have a more complex issue about avoiding the race
> > between the test and the sleep (especially on SMP) because they have
> > only a "add to wait queue and schedule()" singular interface.
> >
> > I suggest not changing the wait queue architecture we have, it's done
> > right and cleanly.
>
> Digital UNIX has a two-step approach like Linux although,
> unsurprisingly, it's more complex partly because it provides a
> mixture of Mach-ish and BSD-ish APIs. Unlike Linux, though, it
> does keep a hash of wait queues. The two steps are
> assert_wait(event, interruptible);
> ...
> thread_block();

A hash of wait queues will avoid the wait queues pollution in data
structures and using 1 spinlock per waitqueue will give more fine-grained
synchronisation than a single spin_lock for all wait_queue stuff. (AFAIR,
such a solution is used in Mach)

The thing that seems still UNIXishly not enough is the sole 'event'
parameter. What I want to wait on is some event related to an instance of
something, and so I would expect as first step something like:

may_wait(void *channel, int tag, etc ...)

On the other hand, the Linux mechanism that makes a process remove itself
from the wait queue is perhaps a good thing, so, something like the
following may be more Linux-ishly:) accepted.

may_wait(void *channel, int tag, wait_queue_link *wql, etc...)

> The assert_wait() says the thread is about to block and puts it on
> the wait queue. thread_block() is the equivalent of Linux
> schedule(). I suppose in some ways assert_wait() (in its simplest
> guise) is more like
> current->state = TASK_INTERRUPTIBLE;

Very probably.

> On top of that, though, Digital UNIX then has a whole host of
> additions: there are plenty of macros around the basic primitives
> which let you do things like pass in a simple or complex lock to be
> locked/unlocked where necessary, you can specify timeouts, you can
> send a result along with a wakeup to say why you woke the target up
> (thread_wakeup_with_result), you can do thread_wakeup to wakeup all
> threads or thread_wakeup_one to wake one, you can even clear the
> wait condition of another thread explicitly (plus all this has to
> cope with swapped out task structures). There are macros to support
> the BSDish API with tsleep, mpsleep as well as the Machish ones.

The tlseep()/wakeup() thing needs the spl() to be implemented and seems
just loose for SMP, IMO.

> In brief, the two-step assert/block is like Linux but the complex
> additional APIs certainly isn't.

The 2 steps solution is easily implementable in Linux, using an array of
Linuxish wait queues (and adding some fields in the task structure). BTW,
It leads to constructs that looks like the ones we use to see in the Linux
kernel, minus the data structure pollution by wait queues, but a construct
like the following is perhaps easier to understand, at least for me: (The
name of datatype are not the ones used in the kernel)

wait_queue_link wql;
...
...
may_wait(chan, tag, &wql, flags, ...);
...
...
if (some_condition_is_true)
do_wait(...);
else
donnot_wait(&wql, ...);

Donnot know of Digital UNIX, but the construct you describe does not seem
to address SMP and looks a bit complex when we want to test the condition
atomically.

In this situation, something as simple as the following:

lock(&lock);
...
wait_locked(void *chan, int tag, lock_t *lock, ...);

should be enough and far simpler. (wait_locked() would be expected to
unlock the caller's 'lock' at a moment that prevents the caller from
being stuck.

Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/