Re: [RFC PATCH] mm: page_alloc: High-order per-cpu page allocator

From: Vlastimil Babka
Date: Wed Nov 23 2016 - 10:37:27 EST


On 11/21/2016 04:55 PM, Mel Gorman wrote:

...

hackbench was also tested with both sockets and pipes, and with both
processes and threads, and the results are interesting in terms of how
variability is impacted

1-socket machine -- pipes and processes
                           4.9.0-rc5             4.9.0-rc5
                             vanilla        highmark-v1r12
Amean    1       12.9637 (  0.00%)     12.9570 (  0.05%)
Amean    3       13.4770 (  0.00%)     13.4447 (  0.24%)
Amean    5       18.5333 (  0.00%)     19.0917 ( -3.01%)
Amean    7       24.5690 (  0.00%)     26.1010 ( -6.24%)
Amean    12      39.7990 (  0.00%)     40.6763 ( -2.20%)
Amean    16      56.0520 (  0.00%)     58.2530 ( -3.93%)

Here, are higher values better or worse?

Stddev   1        0.3847 (  0.00%)      0.3137 ( 18.45%)
Stddev   3        0.2652 (  0.00%)      0.3697 (-39.41%)
Stddev   5        0.5589 (  0.00%)      0.9438 (-68.88%)
Stddev   7        0.5310 (  0.00%)      0.2699 ( 49.18%)
Stddev   12       1.0780 (  0.00%)      0.3421 ( 68.26%)
Stddev   16       2.1138 (  0.00%)      1.5677 ( 25.84%)

It's not a universal win, but the differences are within the noise. What
is interesting is that for high thread counts the variability is much
reduced -- exactly when contention would be expected to be high. This is
not consistent across all machines, but it mostly applies.

The other combinations of pipes, sockets, processes and threads were also
tested, but showed nothing else of interest.

fsmark was tested with zero-sized files to continually allocate slab objects
but didn't show any differences. This can be explained by the fact that the
workload is only allocating and does not have a mix of allocs/frees that would
benefit from the caching. It was tested to ensure no major harm was done.

Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
---
include/linux/mmzone.h | 20 ++++++++-
mm/page_alloc.c | 120 +++++++++++++++++++++++++++++--------------------
2 files changed, 90 insertions(+), 50 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0f088f3a2fed..02eb24d90d70 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -255,6 +255,24 @@ enum zone_watermarks {
NR_WMARK
};

+/*
+ * One per migratetype for order-0 pages and one per high-order up to
+ * and including PAGE_ALLOC_COSTLY_ORDER. This may allow unmovable
+ * allocations to contaminate reclaimable pageblocks if high-order
+ * pages are heavily used.
+ */
+#define NR_PCP_LISTS (MIGRATE_PCPTYPES + PAGE_ALLOC_COSTLY_ORDER + 1)

Should it be "- 1" instead of "+ 1"?
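If I'm reading the mapping below right, then with MIGRATE_PCPTYPES == 3 and
PAGE_ALLOC_COSTLY_ORDER == 3 (the v4.9 values) the indexing works out as
follows (a hypothetical walk-through, not part of the patch):

	order_to_pindex(mt, 0) == mt	/* pindexes 0..2, one per migratetype */
	order_to_pindex(mt, 1) == 3
	order_to_pindex(mt, 2) == 4
	order_to_pindex(mt, 3) == 5	/* highest pindex actually used */

i.e. 6 lists are used while the "+ 1" sizes the array at 7.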

+
+static inline unsigned int pindex_to_order(unsigned int pindex)
+{
+ return pindex < MIGRATE_PCPTYPES ? 0 : pindex - MIGRATE_PCPTYPES + 1;
+}
+
+static inline unsigned int order_to_pindex(int migratetype, unsigned int order)
+{
+ return (order == 0) ? migratetype : MIGRATE_PCPTYPES - 1 + order;

Here I think that "MIGRATE_PCPTYPES + order - 1" would be easier to understand, as the array covers all the migratetypes first and then the order is shifted by one?
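To double-check that reading, here is a minimal userspace sketch of the two
helpers (with MIGRATE_PCPTYPES and PAGE_ALLOC_COSTLY_ORDER hardcoded to the
v4.9 values; illustration only, not kernel code) asserting that they invert
each other:

#include <assert.h>

/* Hardcoded for this standalone sketch; v4.9 values assumed. */
#define MIGRATE_PCPTYPES		3
#define PAGE_ALLOC_COSTLY_ORDER		3

/* Copies of the patch's helpers. */
static unsigned int pindex_to_order(unsigned int pindex)
{
	return pindex < MIGRATE_PCPTYPES ? 0 : pindex - MIGRATE_PCPTYPES + 1;
}

static unsigned int order_to_pindex(int migratetype, unsigned int order)
{
	return (order == 0) ? migratetype : MIGRATE_PCPTYPES - 1 + order;
}

int main(void)
{
	int mt;
	unsigned int order;

	/* Order-0 pages: one list per migratetype, pindex == migratetype. */
	for (mt = 0; mt < MIGRATE_PCPTYPES; mt++)
		assert(order_to_pindex(mt, 0) == (unsigned int)mt &&
		       pindex_to_order(mt) == 0);

	/* High-order pages: one shared list per order, regardless of type. */
	for (order = 1; order <= PAGE_ALLOC_COSTLY_ORDER; order++)
		assert(pindex_to_order(order_to_pindex(0, order)) == order);

	return 0;
}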

@@ -1083,10 +1083,12 @@ static bool bulkfree_pcp_prepare(struct page *page)
* pinned" detection logic.
*/
static void free_pcppages_bulk(struct zone *zone, int count,
- struct per_cpu_pages *pcp)
+ struct per_cpu_pages *pcp,
+ int migratetype)
{
- int migratetype = 0;
- int batch_free = 0;
+ unsigned int pindex = 0;

Should pindex be initialized to migratetype to match the list below?
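I.e. something like this (hypothetical, just to illustrate the suggestion):

	unsigned int pindex = migratetype;
	struct list_head *list = &pcp->lists[pindex];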

+ struct list_head *list = &pcp->lists[migratetype];
+ unsigned int nr_freed = 0;
unsigned long nr_scanned;
bool isolated_pageblocks;