Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write
From: Kent Overstreet
Date: Wed Dec 25 2024 - 11:32:49 EST
On Wed, Dec 25, 2024 at 05:04:46PM +0100, Mateusz Guzik wrote:
> On Wed, Dec 25, 2024 at 08:53:05AM -0500, Kent Overstreet wrote:
> > On Wed, Dec 25, 2024 at 03:30:05PM +0200, Andy Shevchenko wrote:
> > > Don't you think the Cc list is a bit overloaded?
> >
> > Indeed, my mail server doesn't let me reply-all.
> >
> > > On Wed, Dec 25, 2024 at 05:42:02PM +0800, WangYuli wrote:
> > > > +config PIPE_SKIP_SLEEPER
> > > > + bool "Skip sleeping processes during pipe read/write"
> > > > + default n
> > >
> > > 'n' is the default 'default', no need to have this line.
> >
> > Actually, I'd say to skip the kconfig option for this. Kconfig options
> > that affect the behaviour of core code increase our testing burden, and
> > are another variable to account for when chasing down bugs, and the
> > potential overhead looks negligable.
> >
>
> I agree the behavior should not be guarded by an option. However,
> because of how wq_has_sleeper is implemented (see below) I would argue
> this needs to show how often locking can be avoided in real workloads.
>
> The commit message does state this comes with a slowdown for cases which
> can't avoid wakeups, but as is I thought the submitter just meant an
> extra branch.
>
> > Also, did you look at adding this optimization to wake_up()? No-op
> > wakeups are very common, I think this has wider applicability.
>
> I was going to suggest it myself, but then:
>
> static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
> {
> /*
> * We need to be sure we are in sync with the
> * add_wait_queue modifications to the wait queue.
> *
> * This memory barrier should be paired with one on the
> * waiting side.
> */
> smp_mb();
> return waitqueue_active(wq_head);
> }
>
> Which means this is in fact quite expensive.
>
> Since wakeup is a lock + an interrupt trip, it would still be
> cheaper single-threaded to "merely" suffer a full fence and for cases
> where the queue is empty often enough this is definitely the right thing
> to do.
We're comparing against no-op wakeup. A real wakeup does an IPI, which
completely dwarfs the cost of a barrier.
And note that wake_up() is spin_lock_irqsave(), not spin_lock(). I
assume it's gotten better, but back when I was looking at waitqueues
nested pushf/popf was horrifically expensive.
But perhaps can we do this with just a release barrier? Similar to how
list_empty_careful() works.
> On the other hand this executing when the queue is mostly *not* empty
> would combat the point.
>
> So unfortunately embedding this in wake_up is a no-go.
You definitely can't say that without knowing how often no-op
wake_up()s occur. It wouldn't be hard to gather that (write a patch to
add a pair of percpu counters, throw it on a few machines running random
workloads) and I think the results might surprise you.