Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

From: Vlastimil Babka
Date: Thu Oct 13 2016 - 05:23:54 EST


On 10/13/2016 10:08 AM, js1304@xxxxxxxxx wrote:
From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>

Currently, freeing page can stay longer in the buddy list if next higher
order page is in the buddy list in order to help coalescence. However,
it doesn't work for the simplest sequential free case. For example, think
about the situation that 8 consecutive pages are freed in sequential
order.

page 0: attached at the head of order 0 list
page 1: merged with page 0, attached at the head of order 1 list
page 2: attached at the tail of order 0 list
page 3: merged with page 2 and then merged with page 0, attached at
the head of order 2 list
page 4: attached at the head of order 0 list
page 5: merged with page 4, attached at the tail of order 1 list
page 6: attached at the tail of order 0 list
page 7: merged with page 6 and then merged with page 4. Lastly, merged
with page 0 and we get order 3 freepage.

With excluding page 0 case, there are three cases that freeing page is
attached at the head of buddy list in this example and if just one
corresponding ordered allocation request comes at that moment, this page
in being a high order page will be allocated and we would fail to make
order-3 freepage.

Allocation usually happens in sequential order and free also does. So, it

Are you sure this is true except after the system is freshly booted? As soon as it becomes fragmented, a stream of order-0 allocations will likely grab them randomly from all over the place and it's unlikely to recover except small orders.

would be important to detect such a situation and to give some chance
to be coalesced.

I think that simple and effective heuristic about this case is just
attaching freeing page at the tail of the buddy list unconditionally.
If freeing isn't merged during one rotation, it would be actual
fragmentation and we don't need to care about it for coalescence.

I'm not against removing this heuristic, but not without some benchmarks. The disadvantage of putting pages to tail lists is that they become cache-cold until allocated again. We should check how large that problem is.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
---
mm/page_alloc.c | 25 ++-----------------------
1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1790391..c4f7d05 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -858,29 +858,8 @@ static inline void __free_one_page(struct page *page,
done_merging:
set_page_order(page, order);

- /*
- * If this is not the largest possible page, check if the buddy
- * of the next-highest order is free. If it is, it's possible
- * that pages are being freed that will coalesce soon. In case,
- * that is happening, add the free page to the tail of the list
- * so it's less likely to be used soon and more likely to be merged
- * as a higher order page
- */
- if ((order < MAX_ORDER-2) && pfn_valid_within(page_to_pfn(buddy))) {
- struct page *higher_page, *higher_buddy;
- combined_idx = buddy_idx & page_idx;
- higher_page = page + (combined_idx - page_idx);
- buddy_idx = __find_buddy_index(combined_idx, order + 1);
- higher_buddy = higher_page + (buddy_idx - combined_idx);
- if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
- list_add_tail(&page->lru,
- &zone->free_area[order].free_list[migratetype]);
- goto out;
- }
- }
-
- list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
-out:
+ list_add_tail(&page->lru,
+ &zone->free_area[order].free_list[migratetype]);
zone->free_area[order].nr_free++;
}