Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write

From: Manfred Spraul
Date: Sun Dec 29 2024 - 14:54:58 EST


Hi Oleg,

On 12/29/24 2:13 PM, Oleg Nesterov wrote:
Sorry for the noise...

and currently this is fine. But if we want to add the wq_has_sleeper()
checks into fs/pipe.c then pipe_poll() needs smp_mb() after it calls
poll_wait().

Agreed?

Yes, agreed.

Just the comment in pipe_poll() was a bit tricky for me.

https://elixir.bootlin.com/linux/v6.12.6/source/fs/pipe.c#L670

If i understand it right, from memory ordering perspective, it is undefined if pipe->head/tail is written before or after updating the linked list in the poll table entry.

But: Reading cannot happen before taking the spinlock.

CPU1:

pipe_poll()-> poll_wait()->__pollwait()->add_wait_queue()->spin_lock()
LOAD(pipe->head, pipe->tail)
<<<<< insert here CPU2
<add to wait queue>
pipe_poll()-> poll_wait()->__pollwait()->add_wait_queue()->spin_unlock()
<use pipe->head, pipe->tail>

CPU2:

pipe_update_tail()
mutex_unlock()
wake_up_interruptible_sync_poll
spin_lock() >>>>> this closes the potential race

perhaps:

    /*
     * .. and only then can you do the racy tests. That way,
     * if something changes and you got it wrong, the poll
     * table entry will wake you up and fix it.

+   * The memory barrier for this READ_ONCE is provided by

+   * poll_wait()->__pollwait()->add_wait_queue()->spin_lock()

+ * It pairs with the spin_lock() in wake_up

     */
    head = READ_ONCE(pipe->head);

On 12/29, Oleg Nesterov wrote:
On 12/29, Manfred Spraul wrote:
I think that your patch (and the original patch from WangYuli) has the same
proble with pipe_poll()->poll_wait()->__pollwait().
What is the memory barrier for pipe_poll()?

There is poll_wait()->__pollwait()->add_wait_queue()->spin_unlock(). thus
only store_release.

And then READ_ONCE(), i.e. no memory barrier.

Thus the CPU would be free to load pipe->head and pipe->tail before adding
the entry to the poll table.

Correct?
Yes, this was my thinking.

See also my initial reply to WangYuli
https://lore.kernel.org/all/20241226160007.GA11118@xxxxxxxxxx/

Do you agree?

Oleg.