Re: [PATCH 1/2] mm: clarify __GFP_MEMALLOC usage

From: Michal Hocko
Date: Mon Apr 06 2020 - 03:01:51 EST


On Sat 04-04-20 08:23:45, Neil Brown wrote:
> On Fri, Apr 03 2020, David Rientjes wrote:
[...]
> > Hmm, any guidance that we can offer to users of this flag that aren't
> > aware of __GFP_MEMALLOC internals? If I were to read this and not be
> > aware of the implementation, I would ask "how do I know when I'm at risk
> > of depleting this reserve" especially since the amount of reserve is
> > controlled by sysctl. How do I know when I'm risking a depletion of this
> > shared reserve?
>
> "how do I know when I'm at risk of depleting this reserve" is definitely
> the wrong question to be asking. The questions to ask are:
> - how little memory do I need to ensure forward progress?
> - how quick will that forward progress be?

Absolutely agreed. How much of the reserves is actually available will
always depend on all the other users, unless they are perfectly
coordinated, which is not the case.

> In the ideal case a small allocation will be all that is needed in order
> for that allocation plus another page to be freed "quickly", in time
> governed only by throughput to some device. In that case you probably
> don't need to worry about rate limiting.

Right, but I wouldn't expect this to be a general usage pattern for this
flag. "Allocate to free memory" suggests this would be a part of the
memory reclaim process, and that really needs some form of rate
limiting, either in the reclaim path itself or through some other
mechanism if this happens from a different context.

> The reason I brought up ratelimiting is that RCU is slow. You can get
> quite a lot of memory caught up in the kfree-rcu lists. That's not much
> of a problem for normal memory, but it might be for the more limited
> reserves.

Right.

> The other difficulty with the kfree_rcu case is that we have no idea
> how many users there will be, so we cannot realistically model how long
> the queue might get. Compare with NFS swap-out, where the only user is
> the VM swapping memory which (I think?) already tries to pace writeout
> with the speed of the device (or is that just writeback...). I'm
> clearly not sure of the details but it is a more constrained environment
> so it is more predictable.

Mel explained this in http://lkml.kernel.org/r/20200401131426.GN3772@xxxxxxx

> In many cases, preallocating a private reserve is better than using
> GFP_MEMALLOC. That is what mempools provide and they are very effective
> (though often way over-allocated*).
> GFP_MEMALLOC was added because swap-over-NFS requires lots of different
> allocations (transmit headers, receive buffers, possible routing changes
> etc), many of them in the network layer which is very sensitive
> to latency (and mempools require a spinlock to get the reserves).

Yes.
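
For readers who have not met the pattern, here is a rough userspace
sketch of the "private reserve" idea. It is an analogy only, not the
kernel mempool API: every name in it (reserve_pool_*, RESERVE_MIN, the
normal_ok switch that simulates memory pressure) is made up for
illustration. The point is that the reserve is sized by its single
owner for its own forward progress, rather than shared with everybody
else.

```c
/* Userspace analogy only; this is NOT the kernel mempool API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_MIN 4	/* hypothetical: objects needed for forward progress */

struct reserve_pool {
	void *elems[RESERVE_MIN];	/* preallocated private reserve */
	int nr;				/* objects currently held in reserve */
	size_t size;			/* size of each object */
};

/* Preallocate the whole reserve up front, while memory is still plentiful. */
static int reserve_pool_init(struct reserve_pool *p, size_t size)
{
	p->size = size;
	for (p->nr = 0; p->nr < RESERVE_MIN; p->nr++) {
		p->elems[p->nr] = malloc(size);
		if (!p->elems[p->nr]) {
			while (p->nr > 0)
				free(p->elems[--p->nr]);
			return -1;
		}
	}
	return 0;
}

/*
 * Try the normal allocator first and dip into the reserve only when that
 * fails. normal_ok == 0 simulates the normal allocator failing.
 */
static void *reserve_pool_alloc(struct reserve_pool *p, int normal_ok)
{
	void *obj = normal_ok ? malloc(p->size) : NULL;

	if (obj)
		return obj;
	return p->nr > 0 ? p->elems[--p->nr] : NULL;
}

/* Refill the reserve before handing memory back to the normal allocator. */
static void reserve_pool_free(struct reserve_pool *p, void *obj)
{
	if (!obj)
		return;
	if (p->nr < RESERVE_MIN)
		p->elems[p->nr++] = obj;
	else
		free(obj);
}

static void reserve_pool_destroy(struct reserve_pool *p)
{
	while (p->nr > 0)
		free(p->elems[--p->nr]);
}

/* Drain the reserve under simulated pressure, then return everything. */
static int reserve_pool_demo(void)
{
	struct reserve_pool pool;
	void *held[RESERVE_MIN];
	void *extra;
	int i, nr;

	if (reserve_pool_init(&pool, 64))
		return -1;
	for (i = 0; i < RESERVE_MIN; i++) {
		held[i] = reserve_pool_alloc(&pool, 0);
		assert(held[i] != NULL);	/* served from the reserve */
	}
	extra = reserve_pool_alloc(&pool, 0);
	assert(extra == NULL);			/* reserve is exhausted */
	for (i = 0; i < RESERVE_MIN; i++)
		reserve_pool_free(&pool, held[i]);
	nr = pool.nr;				/* reserve is full again */
	reserve_pool_destroy(&pool);
	return nr;
}
```

Once the reserve is exhausted the caller gets NULL here, whereas a
sleeping mempool_alloc() caller would wait for an element to be freed
back. Either way the depth of the reserve is something its owner can
reason about, which is exactly what is hard with the shared
__GFP_MEMALLOC reserves.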

> Maybe the documentation should say:
> Don't use this - use a mempool. Here be dragons.

OK, this looks like a good idea.

> I'm not sure you can really say anything more useful without writing a
> long essay.

Yes, and I am not sure it would be more helpful than confusing.
What do you think about this updated patch?