Re: [PATCH 2/2] mm: page_alloc: High-order per-cpu page allocator v5
From: Michal Hocko
Date: Fri Dec 02 2016 - 03:21:15 EST
On Fri 02-12-16 15:03:46, Joonsoo Kim wrote:
[...]
> > o pcp accounting during free is now confined to free_pcppages_bulk as it's
> > impossible for the caller to know exactly how many pages were freed.
> > Due to the high-order caches, the number of pages drained for a request
> > is no longer precise.
> >
> > o The high watermark for per-cpu pages is increased to reduce the probability
> > that a single refill causes a drain on the next free.
[...]
> I guess that this patch would cause the following problems.
>
> 1. If pcp->batch is too small, high-order pages will not be freed
> easily and will survive longer. Consider the following situation.
>
> Batch count: 7
> MIGRATE_UNMOVABLE -> MIGRATE_MOVABLE -> MIGRATE_RECLAIMABLE -> order 1
> -> order 2...
>
> free count: 1 + 1 + 1 + 2 + 4 = 9
> so order 3 would not be freed.
I guess the second paragraph above in the changelog tries to clarify
that...
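
The arithmetic in the quoted example can be reproduced with a small
userspace simulation. This is only a sketch, not the code from the patch:
the list layout (three order-0 migratetype lists followed by one list each
for orders 1 to 3), the names sim_drain and sim_list_order, and the rule
of taking exactly one entry per list per step are all assumptions made for
illustration.

```c
#include <assert.h>

/* Hypothetical layout: three order-0 lists (UNMOVABLE, MOVABLE,
 * RECLAIMABLE), then one list each for order 1, 2 and 3. */
#define NR_SIM_LISTS 6

static const unsigned int sim_list_order[NR_SIM_LISTS] = { 0, 0, 0, 1, 2, 3 };

/*
 * Walk the lists round-robin, freeing one entry per non-empty list, and
 * charge 1 << order base pages against the batch for each entry freed.
 *
 * nr_entries[i]: entries currently cached on list i (updated in place)
 * freed[i]:      entries freed from list i (output)
 *
 * Returns the number of base pages freed, which may overshoot the batch.
 */
static unsigned int sim_drain(unsigned int batch,
                              unsigned int nr_entries[NR_SIM_LISTS],
                              unsigned int freed[NR_SIM_LISTS])
{
    unsigned int total = 0;
    unsigned int idx = 0;
    unsigned int empty_streak = 0;

    assert(nr_entries && freed);

    for (unsigned int i = 0; i < NR_SIM_LISTS; i++)
        freed[i] = 0;

    /* Stop once the batch is consumed or every list is empty. */
    while (total < batch && empty_streak < NR_SIM_LISTS) {
        if (nr_entries[idx] == 0) {
            empty_streak++;
        } else {
            empty_streak = 0;
            nr_entries[idx]--;
            freed[idx]++;
            total += 1u << sim_list_order[idx];
        }
        idx = (idx + 1) % NR_SIM_LISTS;
    }
    return total;
}
```

With a batch of 7 and every list populated, the walk charges
1 + 1 + 1 + 2 + 4 = 9 base pages and stops before reaching the order-3
list, which matches the situation described in the quoted text.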
> 2. Also, it seems that this logic penalizes high-order pages. Freeing
> one high-order page means freeing 1 << order pages rather than just
> one page. This logic uses round-robin to choose the target page, so
> the amount of freed pages will differ by order.
Yes this is indeed possible. The first paragraph above mentions this
problem.
> I think that it
> makes some sense because high-order pages are less important to cache
> in pcp than lower-order ones, but I'd like to know if it is intended or
> not. If intended, it deserves a comment.
>
> 3. I guess that order-0 file/anon page alloc/free is dominant in many
> workloads. If that is the case, it negates the effect of the high-order
> cache in pcp, since cached high-order pages would also be freed to the
> buddy allocator when a burst of order-0 frees happens.
Yes, this is true and I was wondering the same, but I believe this can be
enhanced later on. E.g. we can check the order when crossing the pcp->high
mark and only free the given order's portion of the batch. I just wouldn't
over-optimize at this stage.
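
One possible reading of that idea, sketched as a userspace simulation
rather than kernel code: when the high mark is crossed by a free of a
given order, drain only the lists of that same order. The list layout
and the names demo_drain_order and demo_list_order are hypothetical, and
the suggestion above could equally be implemented in other ways.

```c
#include <assert.h>

/* Hypothetical layout: three order-0 lists, then orders 1, 2 and 3. */
#define NR_DEMO_LISTS 6

static const unsigned int demo_list_order[NR_DEMO_LISTS] = { 0, 0, 0, 1, 2, 3 };

/*
 * Drain up to 'batch' base pages, but only from lists whose order matches
 * the order of the free that crossed the high mark. Lists of other orders
 * are left untouched.
 *
 * nr_entries[i]: entries currently cached on list i (updated in place)
 *
 * Returns the number of base pages freed.
 */
static unsigned int demo_drain_order(unsigned int order, unsigned int batch,
                                     unsigned int nr_entries[NR_DEMO_LISTS])
{
    unsigned int total = 0;
    int progress = 1;

    assert(nr_entries);

    /* Keep cycling over the matching lists until the batch is consumed
     * or none of them has anything left to free. */
    while (total < batch && progress) {
        progress = 0;
        for (unsigned int i = 0; i < NR_DEMO_LISTS && total < batch; i++) {
            if (demo_list_order[i] != order || nr_entries[i] == 0)
                continue;
            nr_entries[i]--;
            total += 1u << order;
            progress = 1;
        }
    }
    return total;
}
```

In this sketch a burst of order-0 frees with a batch of 7 drains seven
order-0 pages and leaves the cached order-1 to order-3 entries alone.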
--
Michal Hocko
SUSE Labs