Re: wait_queue bug ?

Linus Torvalds (torvalds@ev5.linux.s.xgw.fi)
Tue, 19 Nov 1996 11:25:42 +0200 (EET)


On Mon, 18 Nov 1996, Andreas Schultz wrote:
>
> There seems to be a bug in the wait_queue handling. Imagin the following
> situation:
>
> struct {
> ulong test_value;
> struct wait_queue *my_wait_queue;
> } my_struct;
>
> A "sleep_on(&my_struct.my_wait_queue);" overwrites than the value
> of test_value, causing all kind of havoc.

Hmm.. Have you initialized the wait queues with

init_wait_queue(&my_struct.my_wait_queue);

which should set up the wait queue correctly?

> It seems that the bug has been introduced in 2.0.16. Have a look at
> following pice of code from include/linux/sched.h:

It's rather ugly, yes.

> extern inline void __add_wait_queue(struct wait_queue ** p,
> struct wait_queue *wait)

[ obscene code removed: I wrote it, I never want to see it again ;-]

> and from include/linux/wait.h:
>
> #define WAIT_QUEUE_HEAD(x) ((struct wait_queue *)((x)-1))
>
> The assignment of WAIT_QUEUE_HEAD to *next goes to an illegal address !!

The code really does assign a pointer to the "previous" word into the
wait-queue, but that pointer should never be dereferenced directly. It
should always be referenced through the "->next" entry, which will cancel
the effect of the "-1" done in WAIT_QUEUE_HEAD.

Now, I agree that it's not pretty, but if you have a weak stomach you
shouldn't have looked at the implementation ;)

My approach to code cleanliness: the real C code should be readable, but
header files are allowed to hide ugly details if the interface to those
ugly details is clean. In this case the "interface" is

init_wait_queue(&wait_q_pointer)
sleep_on(&wait_q_pointer);
wake_up(&wait_q_pointer);
... etc ...

and the ugly details are just implementation details that you should never
see..

There could certainly be a bug there somewhere, but I don't see it
immediately. I did walk through all this when I wrote it, but right now
I'd rather forget it ;)

Linus