Re: [PATCH v4 0/8] Remove dependency on congestion_wait in mm/
From: NeilBrown
Date: Tue Oct 26 2021 - 20:43:37 EST
On Sat, 23 Oct 2021, Mel Gorman wrote:
> On Fri, Oct 22, 2021 at 10:26:30PM +1100, NeilBrown wrote:
> > On Fri, 22 Oct 2021, Mel Gorman wrote:
> > > On Fri, Oct 22, 2021 at 12:15:10PM +1100, NeilBrown wrote:
> > >
> > > > In general, I still don't like the use of wake_up_all(), though it won't
> > > > cause incorrect behaviour.
> > > >
> > >
> > > Removing wake_up_all would be tricky.
> >
> > I think there is a misunderstanding. Removing wake_up_all() is as
> > simple as
> > s/wake_up_all/wake_up/
> >
> > If you used prepare_to_wait_exclusive(), then wake_up() would only wake
> > one waiter, while wake_up_all() would wake all of them.
> > As you use prepare_to_wait(), wake_up() will wake all waiters - as will
> > wake_up_all().
> >
>
> Ok, yes, there was a misunderstanding. I thought you were suggesting a
> move to exclusive wakeups. I felt that the wake_up_all was explicit in
> terms of intent and that I really meant for all tasks to wake instead of
> one at a time.
Fair enough. Thanks for changing it :-)
But this prompts me to wonder if exclusive wakeups would be a good idea
- which is a useful springboard to try to understand the code better.
For VMSCAN_THROTTLE_ISOLATED they probably are.
One pattern for reliable exclusive wakeups is for any thread that
received a wake-up to then consider sending a wake-up itself.
Two places receive VMSCAN_THROTTLE_ISOLATED wakeups and both then call
too_many_isolated() which - on success - sends another wakeup - before
the caller has had a chance to isolate anything. If, instead, the
wakeup was sent sometime later, after pages were isolated but before the
caller (isolate_migratepages_block() or shrink_inactive_list())
returned, then we would get an orderly progression of threads running
through that code.
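To make that concrete, here is roughly the shape I have in mind. It is
a sketch only, not against the actual mm/ code: "isolate_wq" is an
invented name and too_many_isolated() is shown with a simplified
signature:

static DECLARE_WAIT_QUEUE_HEAD(isolate_wq);

static void isolate_with_throttle(pg_data_t *pgdat)
{
	DEFINE_WAIT(wait);

	while (too_many_isolated(pgdat)) {
		/* Exclusive, so a plain wake_up() wakes just one of us. */
		prepare_to_wait_exclusive(&isolate_wq, &wait,
					  TASK_UNINTERRUPTIBLE);
		if (too_many_isolated(pgdat))
			schedule();
		finish_wait(&isolate_wq, &wait);
	}

	/* ... isolate pages here ... */

	/*
	 * Only now, after our isolation is done, do we consider
	 * passing the wakeup on to the next waiter.
	 */
	if (!too_many_isolated(pgdat))
		wake_up(&isolate_wq);
}

Each thread finishes its isolation before it releases the next waiter,
which is the orderly progression I mean.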
VMSCAN_THROTTLE_WRITEBACK is a little less straightforward.
There are two different places that wait for the wakeup, and a wake_up
is sent to all waiters after a time proportional to the number of
waiters. It might make sense to wake one thread per time unit?
That might work well for do_writepages() - every SWAP_CLUSTER_MAX
writes triggers one wakeup.
I'm less sure that it would work for shrink_node(). Maybe the
shrink_node() waiters could be non-exclusive so they get woken as soon
as SWAP_CLUSTER_MAX writes complete, while the do_writepages() waiters
are exclusive and get woken one at a time.
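In code, that mixed scheme might look something like this - again just
a sketch with invented names (writeback_wq, nr_written); the real
accounting would live wherever your series counts completed writes:

static DECLARE_WAIT_QUEUE_HEAD(writeback_wq);
static atomic_t nr_written = ATOMIC_INIT(0);

/* Called as each throttled write completes. */
static void note_write_done(void)
{
	if (atomic_inc_return(&nr_written) % SWAP_CLUSTER_MAX == 0)
		wake_up(&writeback_wq);
}

/* do_writepages() waiter: exclusive, released one at a time. */
static void throttle_writepages(long timeout)
{
	DEFINE_WAIT(wait);

	prepare_to_wait_exclusive(&writeback_wq, &wait,
				  TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);
	finish_wait(&writeback_wq, &wait);
}

/* shrink_node() waiter: non-exclusive, all released together. */
static void throttle_shrink_node(long timeout)
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&writeback_wq, &wait, TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);
	finish_wait(&writeback_wq, &wait);
}

The nice property is that each wake_up() releases every non-exclusive
(shrink_node()) waiter but only one exclusive (do_writepages()) waiter,
because non-exclusive entries sit at the head of the queue and the
wakeup walk stops after waking one exclusive task.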
For VMSCAN_THROTTLE_NOPROGRESS .... I don't understand.
If one zone isn't making "enough" progress, we throttle before moving on
to the next zone. So we delay processing of the next zone, and only
indirectly delay re-processing of the current congested zone.
Maybe it makes sense, but I don't see it yet. I note that the commit
message says "it's messy". I can't argue with that!
I'll follow up with patches to clarify what I am thinking about the
first two. I'm not proposing the patches, just presenting them as part
of improving my understanding.
Thanks,
NeilBrown