Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]

From: Linus Torvalds
Date: Sat Dec 07 2019 - 01:47:46 EST


On Fri, Dec 6, 2019 at 4:00 PM Johannes Hirte
<johannes.hirte@xxxxxxxxxxxxx> wrote:
>
> Tested with 5.4.0-11505-g347f56fb3890 and still the same wrong behavior.
> Reliable testcase is facebook, where timeline isn't updated with firefox.

Hmm. I'm not on FB, so that's not a great test for me.

But I've been staring at the code for a long time, and I did find another issue.

poll() and select() were subtly racy and broken. The code did

unsigned int head = READ_ONCE(pipe->head);
unsigned int tail = READ_ONCE(pipe->tail);

which is ok in theory - select and poll can be racy, and doing racy
reads is ok and we do it in other places too.

But when you don't do proper locking and do racy poll/select, you need
to make sure that *if* you were wrong, and said "there's nothing
pending", you need to have added yourself to the wait-queue so that
any changes caused poll to update.

And the new pipe code did that wrong. It added itself to the poll wait
queues *after* it had read that racy data, so you could get into a
race where

- poll reads stale data

- data changes, wakeup happens

- poll adds itself to the poll wait queue after the wakeup

- poll returns "nothing to read/write" based on stale data, and never
saw the wakeup event that told it otherwise.

So a patch something like the appended (whitespace-damaged once again,
because it's untested and I've only been _looking_ a the code) might
solve that issue.

That said, the race here is quite small. Since that firefox problem is
apparently repeatable for you, the timing is either _very_ unlucky, or
there is something else going on too.

Linus

--- snip snip ---

diff --git a/fs/pipe.c b/fs/pipe.c
index c561f7f5e902..4c39ea9b3419 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -557,12 +557,24 @@ pipe_poll(struct file *filp, poll_table *wait)
{
__poll_t mask;
struct pipe_inode_info *pipe = filp->private_data;
- unsigned int head = READ_ONCE(pipe->head);
- unsigned int tail = READ_ONCE(pipe->tail);
+ unsigned int head, tail;

+ /*
+ * Reading only -- no need for acquiring the semaphore.
+ *
+ * But because this is racy, the code has to add the
+ * entry to the poll table _first_ ..
+ */
poll_wait(filp, &pipe->wait, wait);

- /* Reading only -- no need for acquiring the semaphore. */
+ /*
+ * .. and only then can you do the racy tests. That way,
+ * if something changes and you got it wrong, the poll
+ * table entry will wake you up and fix it.
+ */
+ head = READ_ONCE(pipe->head);
+ tail = READ_ONCE(pipe->tail);
+
mask = 0;
if (filp->f_mode & FMODE_READ) {
if (!pipe_empty(head, tail))