Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write

From: Manfred Spraul
Date: Sat Dec 28 2024 - 13:54:15 EST


Hi Oleg,

On 12/28/24 5:32 PM, Oleg Nesterov wrote:
On 12/28, Oleg Nesterov wrote:
If nothing else, consider

int CONDITION;
wait_queue_head_t WQ;

void wake(void)
{
CONDITION = 1;
wake_up(WQ);
}

void wait(void)
{
DEFINE_WAIT_FUNC(entry, woken_wake_function);

add_wait_queue(WQ, entry);
if (!CONDITION)
wait_woken(entry, ...);
remove_wait_queue(WQ, entry);
}

this code is correct even if LOAD(CONDITION) can leak into the critical
section in add_wait_queue(), so CPU running wait() can actually do

// add_wait_queue
spin_lock(WQ->lock);
LOAD(CONDITION); // false!
list_add(entry, head);
spin_unlock(WQ->lock);

if (!false) // result of the LOAD above
wait_woken(entry, ...);

Now suppose that another CPU executes wake() between LOAD(CONDITION)
and list_add(entry, head). With your patch wait() will miss the event.
The same for __pollwait(), I think...

No?
Even simpler,

void wait(void)
{
DEFINE_WAIT(entry);

__set_current_state(XXX);
add_wait_queue(WQ, entry);

if (!CONDITION)
schedule();

remove_wait_queue(WQ, entry);
__set_current_state(TASK_RUNNING);
}

This code is ugly but currently correct unless I am totally confused.

It is a chance of the add_wait_queue() path, thus impact on all calls.

With (busybox) "find /sys /proc | grep aaaabbbccc", I've seen 16385 wakeup calls with empty queue, and just 6 with an entry in the queue.

But on other workloads, the ratio was more something like 75% empty/25%with entries.

I.e.: We would have long discussions if the change only helps for some usecases, and might have negative impact on other use cases.


And: Your proposal is in conflict with

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/kernel/fork.c?h=v2.6.0&id=e220fdf7a39b54a758f4102bdd9d0d5706aa32a7

But I do not see the issue, the worst possible scenario should be something like:

// add_wait_queue
spin_lock(WQ->lock);
LOAD(CONDITION); // false!
list_add(entry, head);
STORE(current_state)
spin_unlock(WQ->lock);


--

    Manfred