Re: [PATCH] mm: fix pcp count beyond pcp high in pcplist allocation

From: Chen Wandun
Date: Sun Oct 30 2022 - 23:37:55 EST




On 2022/10/25 21:19, Mel Gorman wrote:
On Tue, Oct 25, 2022 at 07:49:59PM +0800, Chen Wandun wrote:

On 2022/10/24 22:55, Mel Gorman wrote:
On Mon, Oct 24, 2022 at 09:41:46PM +0800, Chen Wandun wrote:
Nowadays there are several orders in the pcplist. __rmqueue_pcplist
allocates pcp batch pages to refill the pcplist when the list of the
target order is empty. If the lists of the other orders are not all empty
at that point, the pcp count ends up beyond pcp high after the allocation.
This behaviour can be easily observed by adding debugging information in
__rmqueue_pcplist.

Fix this by recalculating the number of batch pages to be allocated.
Are any problems observed other than the PCP lists temporarily exceeding
pcp->high?
It results in frequently refilling pcp pages from buddy and releasing pcp
pages back to buddy.
Under what circumstances does this cause a problem? I 100% accept that it
Sorry for the late reply.

It is hard to say whether this phenomenon would cause a functional problem.
I just noticed it and wondered whether something could be improved.

could happen but one downside of the patch is that it simply changes the
shape of the problem. If the batch refill is clamped then potentially the
PCP list is depleted quicker and needs to be refilled sooner and so zone
lock acquisitions are still required, potentially at a higher frequency due to
clamped refill sizes. All that changes is the timing.
Agreed, the zone->lock contention needs more consideration.
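
The trade-off can be illustrated with a minimal userspace sketch (not
kernel code). refills_needed() is a hypothetical helper, and the model
assumes that every refill costs exactly one zone->lock acquisition and
that the refill size stays constant:

```c
#include <assert.h>

/*
 * Hypothetical model: serve nr_allocs order-0 allocations from a list
 * that starts empty and is refilled with refill_size pages whenever it
 * runs dry. Returns the number of refills, which in this model equals
 * the number of zone->lock acquisitions.
 */
int refills_needed(int nr_allocs, int refill_size)
{
	int on_list = 0;
	int refills = 0;

	assert(refill_size >= 1);

	for (int i = 0; i < nr_allocs; i++) {
		if (on_list == 0) {
			on_list = refill_size;	/* one trip to the zone */
			refills++;
		}
		on_list--;			/* hand one page out */
	}
	return refills;
}
```

With illustrative numbers, 1000 allocations need 16 refills at a refill
size of 63 but 84 refills when the refill is clamped to 12.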

As is, the patch could result in a batch request of 0 and
I forgot this; the patch needs some improvement, thanks.

fall through to allocating from the zone list anyway defeating the
purpose of the PCP allocator and probably regressing performance in some
cases.
That matches my understanding. How about setting high/batch for each order in the pcplist?
Using anything other than (X >> order) consumes storage. Even if storage
was to be used, selecting a value per-order would be impossible because
the correct value would depend on frequency of requests for each order.
That can only be determined at runtime and the cost of determining the
value would likely exceed the benefit.
Can we set an empirical value for the pcp batch of each order during the init
stage? If so, we could control the pcp size accurately. Nowadays, the size of
each order in the pcp list is full of randomness. I don't know which scheme is
better for performance.


At most, you could state that the batch refill should at least be 1 but
otherwise not exceed high. The downside is that zone->lock contention will
increase for a stream of THP pages which is a common allocation size.
The intent behind batch-2 was to reduce contention by 50% when multiple
processes are faulting large anonymous regions at the same time. THP
allocations are ones most likely to exceed pcp->high by a noticeable amount.
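
A minimal userspace model of that clamp, for illustration only. struct
pcp_model and the helper names are hypothetical and do not exist in the
kernel; the (batch >> order) scaling with a floor of 2 follows the
description in this thread, and the numbers below are made up:

```c
#include <assert.h>

/* Hypothetical model of a per-cpu page list, counted in base pages. */
struct pcp_model {
	int count;	/* base pages currently held, all orders */
	int high;	/* high watermark */
	int batch;	/* order-0 refill batch */
};

/* Refill size in blocks of the given order: batch >> order, floor of 2. */
int scaled_batch(const struct pcp_model *pcp, unsigned int order)
{
	int batch = pcp->batch;

	if (batch > 1) {
		batch >>= order;
		if (batch < 2)
			batch = 2;
	}
	return batch;
}

/* Same, but clamped so the refill stays within high, and is at least 1. */
int clamped_batch(const struct pcp_model *pcp, unsigned int order)
{
	int batch = scaled_batch(pcp, order);
	int room = pcp->high - pcp->count;	/* base pages left below high */

	if (room < 0)
		room = 0;
	room >>= order;				/* blocks of this order that fit */

	if (batch > room)
		batch = room;
	if (batch < 1)
		batch = 1;			/* never request a refill of 0 */
	return batch;
}

/* Account for a refill of nr_blocks blocks; returns the new count. */
int refill(struct pcp_model *pcp, unsigned int order, int nr_blocks)
{
	assert(nr_blocks >= 1);
	pcp->count += nr_blocks << order;
	return pcp->count;
}
```

With count = 500, high = 512 and batch = 63, an order-0 refill is
clamped from 63 pages to 12 and count ends at exactly 512. For order 9
the clamp reduces the refill from 2 blocks to 1, and count still ends at
1012, beyond high.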

Or just share the pcp batch value and only set high for each order? It looks
strange for the pcp count to be beyond pcp high in the common case.

If each order has its own pcp high value, the behaviour is the same as a
pcplist which only contains order 0.

Specify in the changelog how a workload is improved. That may be in terms
of memory usage, performance, zone lock contention or cases where pcp->high
being exceeded causes a functional problem on a particular class of
system.
Got it, thanks.