Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full

From: Rasmus Villemoes
Date: Thu Mar 06 2025 - 03:35:31 EST


On Wed, Mar 05 2025, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, 3 Mar 2025 at 10:46, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> ENTIRELY UNTESTED, but it seems to generate ok code. It might even
>> generate better code than what we have now.
>
> Bah. This patch - which is now committed - was actually completely broken.
>
> And the reason that complete breakage didn't show up in testing is
> that I suspect nobody really tested or thought about the 32-bit case.
>
> That whole "use 16-bit indexes on 32-bit" is all fine and well, but I
> woke up in the middle of the night and realized that it doesn't
> actually work.
>
> Because now "pipe_occupancy()" is getting *entirely* the wrong
> answers. It just does
>
> return head - tail;
>
> but that only worked when the arithmetic was done modulo the size of
> the indexes. And now it isn't.
>
> So I still haven't *tested* this, but at an absolute minimum, we need
> something like this:
>
> --- a/include/linux/pipe_fs_i.h
> +++ b/include/linux/pipe_fs_i.h
> @@ -192,7 +192,7 @@
> */
> static inline unsigned int pipe_occupancy(unsigned int head,
> unsigned int tail)
> {
> - return head - tail;
> + return (pipe_index_t)(head - tail);
> }
>
> /**
>
> and there might be other cases where the pipe_index_t size might matter.

Yeah, for example

unsigned int count, head, tail, mask;

case FIONREAD:
mutex_lock(&pipe->mutex);
count = 0;
head = pipe->head;
tail = pipe->tail;
mask = pipe->ring_size - 1;

while (tail != head) {
count += pipe->bufs[tail & mask].len;
tail++;
}
mutex_unlock(&pipe->mutex);

If head has already wrapped around, say it's 0 or 1, and tail is close to
65535, that loop is gonna take forever and of course produce the wrong
result.

So yes, there are probably a lot more of these lurking.

There are probably not many tests that stuff 2^28 bytes through a pipe
to try to trigger such corner cases. Perhaps we can help whatever
automated tests are being done by initializing head and tail to
something like (pipe_index_t)-2 when the pipe is created?

Rasmus