Re: [PATCH net-next v2 00/10] Replace page_frag with page_frag_cache (Part-2)

From: Yunsheng Lin
Date: Fri Dec 13 2024 - 07:10:00 EST


On 2024/12/11 20:52, Yunsheng Lin wrote:
> It seems that bottleneck is still the freeing side that the above
> result might not be as meaningful as it should be.

Annotating with 'perf top', the atomic operation of put_page_testzero()
in page_frag_free() seems to account for about 70%+ of the CPU usage; it
was unexpected that the atomic operation had that much overhead :(
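
To make the asymmetry above concrete, here is a rough userspace sketch of the
bias scheme. It is not kernel code: all toy_* names and TOY_BIAS_MAX are
hypothetical, and C11 atomics stand in for the page refcount helpers. The
allocating side only touches a private, non-atomic pagecnt_bias, while every
free has to do an atomic decrement-and-test on a counter shared between CPUs,
which is the kind of operation put_page_testzero() performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_BIAS_MAX 1024	/* hypothetical batch of pre-taken references */

struct toy_page {
	atomic_uint refcount;		/* shared with the freeing side */
};

struct toy_cache {
	struct toy_page *page;
	unsigned int pagecnt_bias;	/* private to the allocating side */
};

/* Take a whole batch of references up front with a single atomic store. */
static void toy_cache_refill(struct toy_cache *nc, struct toy_page *page)
{
	atomic_store(&page->refcount, TOY_BIAS_MAX);
	nc->page = page;
	nc->pagecnt_bias = TOY_BIAS_MAX;
}

/* Alloc side: hand out one pre-taken reference, no atomic operation. */
static void toy_frag_alloc(struct toy_cache *nc)
{
	assert(nc->pagecnt_bias > 0);
	nc->pagecnt_bias--;
}

/*
 * Free side: one atomic read-modify-write per fragment on a counter that
 * may be bouncing between CPUs. Returns true when the last reference is
 * dropped, i.e. when the page itself could be freed.
 */
static bool toy_frag_free(struct toy_page *page)
{
	return atomic_fetch_sub(&page->refcount, 1) == 1;
}

/* Drop the references that were pre-taken but never handed out. */
static bool toy_cache_drain(struct toy_cache *nc)
{
	unsigned int unused = nc->pagecnt_bias;

	nc->pagecnt_bias = 0;
	return atomic_fetch_sub(&nc->page->refcount, unused) == unused;
}
```

In this toy model the alloc path costs a plain decrement per fragment, whereas
the free path costs one atomic operation per fragment, so it is at least
plausible that the freeing side dominates in a low-level test like this one.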

>
> As we can't use more than one cpu for the free side without some
> lock using a single ptr_ring, it seems something more complicated
> might need to be done in order to support more than one CPU for the
> freeing side?
>
> Before patch 1, __page_frag_alloc_align took up to 3.62% percent of
> CPU using 'perf top'.
> After patch 1, __page_frag_cache_prepare() and __page_frag_cache_commit_noref()
> took up to 4.67% + 1.01% = 5.68%.
> Having a similar result, I am not sure if the CPU usages is able tell us
> the performance degradation here as it seems to be quite large?
>

Also, using 'struct page_frag' to pass the parameters seems to cause some
observable overhead, as the testing is very low level. With the patch below,
which avoids passing 'struct page_frag', the performance difference seems to
be negligible: the CPU usage is 3.62% for __page_frag_alloc_align() before
patch 1 and 3.27% for __page_frag_cache_prepare() after patch 1.

The new refactoring avoids some overhead for the old API, but might cause
some overhead for the new API, as it is not able to skip the virt_to_page()
for the refilling and reusing cases, though those seem to be unlikely cases.
Or is there any better idea for how to do the refactoring to unify the
page_frag API?

diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 41a91df82631..b83e7655654e 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -39,8 +39,24 @@ static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)

void page_frag_cache_drain(struct page_frag_cache *nc);
void __page_frag_cache_drain(struct page *page, unsigned int count);
-void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
- gfp_t gfp_mask, unsigned int align_mask);
+void *__page_frag_cache_prepare(struct page_frag_cache *nc, unsigned int fragsz,
+ gfp_t gfp_mask, unsigned int align_mask);
+
+static inline void *__page_frag_alloc_align(struct page_frag_cache *nc,
+ unsigned int fragsz, gfp_t gfp_mask,
+ unsigned int align_mask)
+{
+ void *va;
+
+ va = __page_frag_cache_prepare(nc, fragsz, gfp_mask, align_mask);
+ if (likely(va)) {
+ va += nc->offset;
+ nc->offset += fragsz;
+ nc->pagecnt_bias--;
+ }
+
+ return va;
+}

static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask,
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 3f7a203d35c6..729309aee27a 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -90,9 +90,9 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
}
EXPORT_SYMBOL(__page_frag_cache_drain);

-void *__page_frag_alloc_align(struct page_frag_cache *nc,
- unsigned int fragsz, gfp_t gfp_mask,
- unsigned int align_mask)
+void *__page_frag_cache_prepare(struct page_frag_cache *nc,
+ unsigned int fragsz, gfp_t gfp_mask,
+ unsigned int align_mask)
{
unsigned long encoded_page = nc->encoded_page;
unsigned int size, offset;
@@ -151,12 +151,10 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
offset = 0;
}

- nc->pagecnt_bias--;
- nc->offset = offset + fragsz;
-
- return encoded_page_decode_virt(encoded_page) + offset;
+ nc->offset = offset;
+ return encoded_page_decode_virt(encoded_page);
}
-EXPORT_SYMBOL(__page_frag_alloc_align);
+EXPORT_SYMBOL(__page_frag_cache_prepare);

/*
* Frees a page fragment allocated out of either a compound or order 0 page.