Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write
From: Manfred Spraul
Date: Sun Dec 29 2024 - 14:54:58 EST
Hi Oleg,
On 12/29/24 2:13 PM, Oleg Nesterov wrote:
Sorry for the noise...
and currently this is fine. But if we want to add the wq_has_sleeper()
checks into fs/pipe.c then pipe_poll() needs smp_mb() after it calls
poll_wait().
Agreed?
Yes, agreed.
Just the comment in pipe_poll() was a bit tricky for me.
https://elixir.bootlin.com/linux/v6.12.6/source/fs/pipe.c#L670
If i understand it right, from memory ordering perspective, it is
undefined if pipe->head/tail is written before or after updating the
linked list in the poll table entry.
But: Reading cannot happen before taking the spinlock.
CPU1:
pipe_poll()-> poll_wait()->__pollwait()->add_wait_queue()->spin_lock()
LOAD(pipe->head, pipe->tail)
<<<<< insert here CPU2
<add to wait queue>
pipe_poll()-> poll_wait()->__pollwait()->add_wait_queue()->spin_unlock()
<use pipe->head, pipe->tail>
CPU2:
pipe_update_tail()
mutex_unlock()
wake_up_interruptible_sync_poll
spin_lock() >>>>> this closes the potential race
perhaps:
/*
* .. and only then can you do the racy tests. That way,
* if something changes and you got it wrong, the poll
* table entry will wake you up and fix it.
+ * The memory barrier for this READ_ONCE is provided by
+ * poll_wait()->__pollwait()->add_wait_queue()->spin_lock()
+ * It pairs with the spin_lock() in wake_up
*/
head = READ_ONCE(pipe->head);
On 12/29, Oleg Nesterov wrote:
On 12/29, Manfred Spraul wrote:
I think that your patch (and the original patch from WangYuli) has the same
proble with pipe_poll()->poll_wait()->__pollwait().
What is the memory barrier for pipe_poll()?
There is poll_wait()->__pollwait()->add_wait_queue()->spin_unlock(). thus
only store_release.
And then READ_ONCE(), i.e. no memory barrier.
Thus the CPU would be free to load pipe->head and pipe->tail before adding
the entry to the poll table.
Correct?
Yes, this was my thinking.
See also my initial reply to WangYuli
https://lore.kernel.org/all/20241226160007.GA11118@xxxxxxxxxx/
Do you agree?
Oleg.