Re: [PATCH net-next v1 12/12] mm: page_frag: update documentation and maintainer for page_frag

From: Alexander H Duyck
Date: Sun Apr 07 2024 - 14:14:08 EST


On Sun, 2024-04-07 at 21:08 +0800, Yunsheng Lin wrote:
> Update documentation about design, implementation and API usages
> for page_frag.
>
> Also update MAINTAINERS for page_frag. Alexander seems to be the
> orginal author for page_frag, we can add him to the MAINTAINERS
> later if we have an ack from him.
>
> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>

Again, this seems more like 2 different pathches at least. One for the
Documentation and MAINTAINERS changes, and one for the function
documentation.

> ---
> Documentation/mm/page_frags.rst | 115 ++++++++++++++++++----------
> MAINTAINERS | 10 +++
> include/linux/page_frag_cache.h | 128 ++++++++++++++++++++++++++++++++
> mm/page_frag_cache.c | 51 ++++++++++---
> 4 files changed, 256 insertions(+), 48 deletions(-)
>
> diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst
> index 503ca6cdb804..77256dfb58bf 100644
> --- a/Documentation/mm/page_frags.rst
> +++ b/Documentation/mm/page_frags.rst
> @@ -1,43 +1,80 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> ==============
> Page fragments
> ==============
>
> -A page fragment is an arbitrary-length arbitrary-offset area of memory
> -which resides within a 0 or higher order compound page. Multiple
> -fragments within that page are individually refcounted, in the page's
> -reference counter.
> -
> -The page_frag functions, page_frag_alloc and page_frag_free, provide a
> -simple allocation framework for page fragments. This is used by the
> -network stack and network device drivers to provide a backing region of
> -memory for use as either an sk_buff->head, or to be used in the "frags"
> -portion of skb_shared_info.
> -
> -In order to make use of the page fragment APIs a backing page fragment
> -cache is needed. This provides a central point for the fragment allocation
> -and tracks allows multiple calls to make use of a cached page. The
> -advantage to doing this is that multiple calls to get_page can be avoided
> -which can be expensive at allocation time. However due to the nature of
> -this caching it is required that any calls to the cache be protected by
> -either a per-cpu limitation, or a per-cpu limitation and forcing interrupts
> -to be disabled when executing the fragment allocation.
> -
> -The network stack uses two separate caches per CPU to handle fragment
> -allocation. The netdev_alloc_cache is used by callers making use of the
> -netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
> -used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The
> -main difference between these two calls is the context in which they may be
> -called. The "netdev" prefixed functions are usable in any context as these
> -functions will disable interrupts, while the "napi" prefixed functions are
> -only usable within the softirq context.
> -
> -Many network device drivers use a similar methodology for allocating page
> -fragments, but the page fragments are cached at the ring or descriptor
> -level. In order to enable these cases it is necessary to provide a generic
> -way of tearing down a page cache. For this reason __page_frag_cache_drain
> -was implemented. It allows for freeing multiple references from a single
> -page via a single call. The advantage to doing this is that it allows for
> -cleaning up the multiple references that were added to a page in order to
> -avoid calling get_page per allocation.
> -
> -Alexander Duyck, Nov 29, 2016.

What is the point of removing this just to add it to a C file further
down in the diff? Honestly I am not a fan of all the noise this is
adding to these diffs. Can we do a little less moving of lines for the
sake of moving them? All it does is pollute the git blame if you try to
figure out the origin of the lines.

> +.. kernel-doc:: mm/page_frag_cache.c
> + :doc: page_frag allocator
> +
> +Architecture overview
> +=====================
> +
> +.. code-block:: none
> +
> + +----------------------+
> + | page_frag API caller |
> + +----------------------+
> + ^
> + |
> + |
> + |
> + v
> + +----------------------------------------------+
> + | request page fragment |
> + +----------------------------------------------+
> + ^ ^
> + | |
> + | Cache empty or not enough |
> + | |
> + v |
> + +--------------------------------+ |
> + | refill cache with order 3 page | |
> + +--------------------------------+ |
> + ^ ^ |
> + | | |
> + | | Refill failed |
> + | | | Cache is enough
> + | | |
> + | v |
> + | +----------------------------------+ |
> + | | refill cache with order 0 page | |
> + | +----------------------------------+ |
> + | ^ |
> + | Refill succeed | |
> + | | Refill succeed |
> + | | |
> + v v v
> + +----------------------------------------------+
> + | allocate fragment from cache |
> + +----------------------------------------------+
> +

+1 for the simple visualization of how this works.

> +API interface
> +=============
> +As the design and implementation of page_frag API, the allocation side does not
> +allow concurrent calling, it is assumed that the caller must ensure there is not
> +concurrent alloc calling to the same page_frag_cache instance by using it's own
> +lock or rely on some lockless guarantee like NAPI softirq.
> +
> +Depending on different use cases, callers expecting to deal with va, page or
> +both va and page for them may call page_frag_alloc_va(), page_frag_alloc_pg(),
> +or page_frag_alloc() accordingly.
> +

So the new documentation is good up to here.

> +There is also a use case that need minimum memory in order for forward
> +progressing, but can do better if there is more memory available. Introduce
> +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller
> +requests the minimum memory it need and the prepare API will return the maximum
> +size of the fragment returned, caller need to report back to the page_frag core
> +how much memory it actually use by calling commit API, or not calling the commit
> +API if deciding to not use any memory.
> +

This part is as clear as mud to me. It sounds like kind of a convoluted
setup where you are having the caller have to know a fair bit about the
internal structure of the cache and it is essentially checking the
state and then performing a commit. Not a huge fan. I would almost
prefer to see something more like what we used to do with msix where
you just had a range you could request and if it can't give you at
least the minimum it fails.

I assume the patch is somewhere here in the set. Will take a look at it
later.

> +.. kernel-doc:: include/linux/page_frag_cache.h
> + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc
> + page_frag_alloc_va __page_frag_alloc_va_align
> + page_frag_alloc_va_align page_frag_alloc_va_prepare
> + page_frag_alloc_va_prepare_align page_frag_alloc_pg_prepare
> + page_frag_alloc_prepare page_frag_alloc_commit
> + page_frag_alloc_commit_noref page_frag_free_va
> +
> +.. kernel-doc:: mm/page_frag_cache.c
> + :identifiers: page_frag_cache_drain
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4745ea94d463..2f84aba59428 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c
> F: mm/readahead.c
> F: mm/truncate.c
>
> +PAGE FRAG
> +M: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> +L: linux-mm@xxxxxxxxx
> +L: netdev@xxxxxxxxxxxxxxx
> +S: Supported
> +F: Documentation/mm/page_frags.rst
> +F: include/linux/page_frag_cache.h
> +F: mm/page_frag_cache.c
> +F: mm/page_frag_test.c
> +

I would appreciate it if you could add me as I usually am having to
deal with issues people have with this anyway. You can probably just go
with:
Alexander Duyck <alexander.duyck@xxxxxxxxx>

> PAGE POOL
> M: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
> M: Ilias Apalodimas <ilias.apalodimas@xxxxxxxxxx>
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 28185969cd2c..d8edbecdd179 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -31,11 +31,23 @@ struct page_frag_cache {
> #endif
> };
>
> +/**
> + * page_frag_cache_init() - Init page_frag cache.
> + * @nc: page_frag cache from which to init
> + *
> + * Inline helper to init the page_frag cache.
> + */
> static inline void page_frag_cache_init(struct page_frag_cache *nc)
> {
> nc->va = NULL;
> }
>
> +/**
> + * page_frag_cache_is_pfmemalloc() - Check for pfmemalloc.
> + * @nc: page_frag cache from which to check
> + *
> + * Used to check if the current page in page_frag cache is pfmemalloc'ed.
> + */
> static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
> {
> return !!nc->pfmemalloc;
> @@ -46,6 +58,17 @@ void __page_frag_cache_drain(struct page *page, unsigned int count);
> void *page_frag_cache_refill(struct page_frag_cache *nc, unsigned int fragsz,
> gfp_t gfp_mask);
>
> +/**
> + * page_frag_alloc_va() - Alloc a page fragment.
> + * @nc: page_frag cache from which to allocate
> + * @fragsz: the requested fragment size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled
> + *
> + * Get a page fragment from page_frag cache.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.
> + */
> static inline void *page_frag_alloc_va(struct page_frag_cache *nc,
> unsigned int fragsz, gfp_t gfp_mask)
> {
> @@ -63,6 +86,19 @@ static inline void *page_frag_alloc_va(struct page_frag_cache *nc,
> return va + offset;
> }
>
> +/**
> + * __page_frag_alloc_va_align() - Alloc a page fragment with aligning
> + * requirement.
> + * @nc: page_frag cache from which to allocate
> + * @fragsz: the requested fragment size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled
> + * @align: the requested aligning requirement
> + *
> + * Get a page fragment from page_frag cache with aligning requirement.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.
> + */
> static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
> unsigned int fragsz,
> gfp_t gfp_mask,
> @@ -75,6 +111,19 @@ static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
> return page_frag_alloc_va(nc, fragsz, gfp_mask);
> }
>
> +/**
> + * page_frag_alloc_va_align() - Alloc a page fragment with aligning requirement.
> + * @nc: page_frag cache from which to allocate
> + * @fragsz: the requested fragment size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled
> + * @align: the requested aligning requirement
> + *
> + * WARN_ON_ONCE() checking for align and fragsz before getting a page fragment
> + * from page_frag cache with aligning requirement.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.
> + */
> static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc,
> unsigned int fragsz,
> gfp_t gfp_mask,
> @@ -86,6 +135,19 @@ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc,
> return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, align);
> }
>
> +/**
> + * page_frag_alloc_va_prepare() - Prepare allocing a page fragment.
> + * @nc: page_frag cache from which to prepare
> + * @offset: out as the offset of the page fragment
> + * @size: in as the requested size, out as the available size
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled
> + *
> + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to
> + * report the maximum size of the page fragment the caller can use.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.
> + */
> static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc,
> unsigned int *offset,
> unsigned int *size,
> @@ -108,6 +170,21 @@ static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc,
> return va + *offset;
> }
>
> +/**
> + * page_frag_alloc_va_prepare_align() - Prepare allocing a page fragment with
> + * aligning requirement.
> + * @nc: page_frag cache from which to prepare
> + * @offset: out as the offset of the page fragment
> + * @size: in as the requested size, out as the available size
> + * @align: the requested aligning requirement
> + * @gfp_mask: the allocation gfp to use when cache need to be refilled
> + *
> + * Prepare an aligned page fragment with minimum size of ‘size’, 'size' is also
> + * used to report the maximum size of the page fragment the caller can use.
> + *
> + * Return:
> + * Return va of the page fragment, otherwise return NULL.
> + */
> static inline void *page_frag_alloc_va_prepare_align(struct page_frag_cache *nc,
> unsigned int *offset,
> unsigned int *size,
> @@ -144,6 +221,19 @@ static inline void *__page_frag_alloc_pg_prepare(struct page_frag_cache *nc,
> return va;
> }
>
> +/**
> + * page_frag_alloc_pg_prepare - Prepare allocing a page fragment.
> + * @nc: page_frag cache from which to prepare
> + * @offset: out as the offset of the page fragment
> + * @size: in as the requested size, out as the available size
> + * @gfp: the allocation gfp to use when cache need to be refilled
> + *
> + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to
> + * report the maximum size of the page fragment the caller can use.
> + *
> + * Return:
> + * Return the page fragment, otherwise return NULL.
> + */
> #define page_frag_alloc_pg_prepare(nc, offset, size, gfp) \
> ({ \
> struct page *__page = NULL; \
> @@ -179,6 +269,21 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc,
> return nc_va;
> }
>
> +/**
> + * page_frag_alloc_prepare - Prepare allocing a page fragment.
> + * @nc: page_frag cache from which to prepare
> + * @offset: out as the offset of the page fragment
> + * @size: in as the requested size, out as the available size
> + * @va: out as the va of the returned page fragment
> + * @gfp: the allocation gfp to use when cache need to be refilled
> + *
> + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to
> + * report the maximum size of the page fragment. Return both 'page' and 'va' of
> + * the fragment to the caller.
> + *
> + * Return:
> + * Return the page fragment, otherwise return NULL.
> + */
> #define page_frag_alloc_prepare(nc, offset, size, va, gfp) \
> ({ \
> struct page *__page = NULL; \
> @@ -191,6 +296,14 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc,
> __page; \
> })
>
> +/**
> + * page_frag_alloc_commit - Commit allocing a page fragment.
> + * @nc: page_frag cache from which to commit
> + * @offset: offset of the page fragment
> + * @size: size of the page fragment has been used
> + *
> + * Commit the alloc preparing by passing offset and the actual used size.
> + */
> static inline void page_frag_alloc_commit(struct page_frag_cache *nc,
> unsigned int offset,
> unsigned int size)
> @@ -199,6 +312,17 @@ static inline void page_frag_alloc_commit(struct page_frag_cache *nc,
> nc->offset = offset + size;
> }
>
> +/**
> + * page_frag_alloc_commit_noref - Commit allocing a page fragment without taking
> + * page refcount.
> + * @nc: page_frag cache from which to commit
> + * @offset: offset of the page fragment
> + * @size: size of the page fragment has been used
> + *
> + * Commit the alloc preparing by passing offset and the actual used size, but
> + * not taking page refcount. Mostly used for fragmemt coaleasing case when the
> + * current fragmemt can share the same refcount with previous fragmemt.
> + */
> static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc,
> unsigned int offset,
> unsigned int size)
> @@ -206,6 +330,10 @@ static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc,
> nc->offset = offset + size;
> }
>
> +/**
> + * page_frag_free_va - Free a page fragment by va.
> + * @addr: va of page fragment to be freed
> + */
> void page_frag_free_va(void *addr);
>
> #endif
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index cbd0ed82a596..0c76ec006c22 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -1,15 +1,44 @@
> // SPDX-License-Identifier: GPL-2.0-only
> -/* Page fragment allocator
> +
> +/**
> + * DOC: page_frag allocator
> + *
> + * A page fragment is an arbitrary-length arbitrary-offset area of memory which
> + * resides within a 0 or higher order compound page. Multiple fragments within
> + * that page are individually refcounted, in the page's reference counter.
> + *
> + * The page_frag functions, page_frag_alloc* and page_frag_free*, provide a
> + * simple allocation framework for page fragments. This is used by the network
> + * stack and network device drivers to provide a backing region of memory for
> + * use as either an sk_buff->head, or to be used in the "frags" portion of
> + * skb_shared_info.
> *
> - * Page Fragment:
> - * An arbitrary-length arbitrary-offset area of memory which resides within a
> - * 0 or higher order page. Multiple fragments within that page are
> - * individually refcounted, in the page's reference counter.
> + * In order to make use of the page fragment APIs a backing page fragment cache
> + * is needed. This provides a central point for the fragment allocation and
> + * tracks allows multiple calls to make use of a cached page. The advantage to
> + * doing this is that multiple calls to get_page can be avoided which can be
> + * expensive at allocation time. However due to the nature of this caching it
> + * is required that any calls to the cache be protected by either a per-cpu
> + * limitation, or a per-cpu limitation and forcing interrupts to be disabled
> + * when executing the fragment allocation.
> *
> - * The page_frag functions provide a simple allocation framework for page
> - * fragments. This is used by the network stack and network device drivers to
> - * provide a backing region of memory for use as either an sk_buff->head, or to
> - * be used in the "frags" portion of skb_shared_info.
> + * The network stack uses two separate caches per CPU to handle fragment
> + * allocation. The netdev_alloc_cache is used by callers making use of the
> + * netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
> + * used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The
> + * main difference between these two calls is the context in which they may be
> + * called. The "netdev" prefixed functions are usable in any context as these
> + * functions will disable interrupts, while the "napi" prefixed functions are
> + * only usable within the softirq context.
> + *
> + * Many network device drivers use a similar methodology for allocating page
> + * fragments, but the page fragments are cached at the ring or descriptor
> + * level. In order to enable these cases it is necessary to provide a generic
> + * way of tearing down a page cache. For this reason __page_frag_cache_drain
> + * was implemented. It allows for freeing multiple references from a single
> + * page via a single call. The advantage to doing this is that it allows for
> + * cleaning up the multiple references that were added to a page in order to
> + * avoid calling get_page per allocation.
> */
>

Again, not a huge fan of moving this. It would be better to just leave
it where it was and add your documentation onto it.

> #include <linux/export.h>
> @@ -57,6 +86,10 @@ static bool __page_frag_cache_refill(struct page_frag_cache *nc,
> return true;
> }
>
> +/**
> + * page_frag_cache_drain - Drain the current page from page_frag cache.
> + * @nc: page_frag cache from which to drain
> + */
> void page_frag_cache_drain(struct page_frag_cache *nc)
> {
> if (!nc->va)