Re: [PATCH net-next v1 12/12] mm: page_frag: update documentation and maintainer for page_frag
From: Yunsheng Lin
Date: Mon Apr 08 2024 - 09:45:27 EST
On 2024/4/8 2:13, Alexander H Duyck wrote:
> On Sun, 2024-04-07 at 21:08 +0800, Yunsheng Lin wrote:
>> Update documentation about design, implementation and API usages
>> for page_frag.
>>
>> Also update MAINTAINERS for page_frag. Alexander seems to be the
>> orginal author for page_frag, we can add him to the MAINTAINERS
>> later if we have an ack from him.
>>
>> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>
> Again, this seems more like 2 different pathches at least. One for the
> Documentation and MAINTAINERS changes, and one for the function
> documentation.
Sure.
>
>> ---
>> Documentation/mm/page_frags.rst | 115 ++++++++++++++++++----------
>> MAINTAINERS | 10 +++
>> include/linux/page_frag_cache.h | 128 ++++++++++++++++++++++++++++++++
>> mm/page_frag_cache.c | 51 ++++++++++---
>> 4 files changed, 256 insertions(+), 48 deletions(-)
>>
>> diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst
>> index 503ca6cdb804..77256dfb58bf 100644
>> --- a/Documentation/mm/page_frags.rst
>> +++ b/Documentation/mm/page_frags.rst
>> @@ -1,43 +1,80 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> ==============
>> Page fragments
>> ==============
>>
>> -A page fragment is an arbitrary-length arbitrary-offset area of memory
>> -which resides within a 0 or higher order compound page. Multiple
>> -fragments within that page are individually refcounted, in the page's
>> -reference counter.
>> -
>> -The page_frag functions, page_frag_alloc and page_frag_free, provide a
>> -simple allocation framework for page fragments. This is used by the
>> -network stack and network device drivers to provide a backing region of
>> -memory for use as either an sk_buff->head, or to be used in the "frags"
>> -portion of skb_shared_info.
>> -
>> -In order to make use of the page fragment APIs a backing page fragment
>> -cache is needed. This provides a central point for the fragment allocation
>> -and tracks allows multiple calls to make use of a cached page. The
>> -advantage to doing this is that multiple calls to get_page can be avoided
>> -which can be expensive at allocation time. However due to the nature of
>> -this caching it is required that any calls to the cache be protected by
>> -either a per-cpu limitation, or a per-cpu limitation and forcing interrupts
>> -to be disabled when executing the fragment allocation.
>> -
>> -The network stack uses two separate caches per CPU to handle fragment
>> -allocation. The netdev_alloc_cache is used by callers making use of the
>> -netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is
>> -used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The
>> -main difference between these two calls is the context in which they may be
>> -called. The "netdev" prefixed functions are usable in any context as these
>> -functions will disable interrupts, while the "napi" prefixed functions are
>> -only usable within the softirq context.
>> -
>> -Many network device drivers use a similar methodology for allocating page
>> -fragments, but the page fragments are cached at the ring or descriptor
>> -level. In order to enable these cases it is necessary to provide a generic
>> -way of tearing down a page cache. For this reason __page_frag_cache_drain
>> -was implemented. It allows for freeing multiple references from a single
>> -page via a single call. The advantage to doing this is that it allows for
>> -cleaning up the multiple references that were added to a page in order to
>> -avoid calling get_page per allocation.
>> -
>> -Alexander Duyck, Nov 29, 2016.
>
> What is the point of removing this just to add it to a C file further
> down in the diff? Honestly I am not a fan of all the noise this is
> adding to these diffs. Can we do a little less moving of lines for the
> sake of moving them? All it does is pollute the git blame if you try to
> figure out the origin of the lines.
I was thinking about move the doc related code to file where code is related,
so that author will remember to update the doc when changing the code.
Maybe above does not matter that much?
>
>> +.. kernel-doc:: mm/page_frag_cache.c
>> + :doc: page_frag allocator
>> +
>> +Architecture overview
>> +=====================
>> +
>> +.. code-block:: none
>> +
>> + +----------------------+
>> + | page_frag API caller |
>> + +----------------------+
>> + ^
>> + |
>> + |
>> + |
>> + v
>> + +----------------------------------------------+
>> + | request page fragment |
>> + +----------------------------------------------+
>> + ^ ^
>> + | |
>> + | Cache empty or not enough |
>> + | |
>> + v |
>> + +--------------------------------+ |
>> + | refill cache with order 3 page | |
>> + +--------------------------------+ |
>> + ^ ^ |
>> + | | |
>> + | | Refill failed |
>> + | | | Cache is enough
>> + | | |
>> + | v |
>> + | +----------------------------------+ |
>> + | | refill cache with order 0 page | |
>> + | +----------------------------------+ |
>> + | ^ |
>> + | Refill succeed | |
>> + | | Refill succeed |
>> + | | |
>> + v v v
>> + +----------------------------------------------+
>> + | allocate fragment from cache |
>> + +----------------------------------------------+
>> +
>
> +1 for the simple visualization of how this works.
>
>> +API interface
>> +=============
>> +As the design and implementation of page_frag API, the allocation side does not
>> +allow concurrent calling, it is assumed that the caller must ensure there is not
>> +concurrent alloc calling to the same page_frag_cache instance by using it's own
>> +lock or rely on some lockless guarantee like NAPI softirq.
>> +
>> +Depending on different use cases, callers expecting to deal with va, page or
>> +both va and page for them may call page_frag_alloc_va(), page_frag_alloc_pg(),
>> +or page_frag_alloc() accordingly.
>> +
>
> So the new documentation is good up to here.
>
>> +There is also a use case that need minimum memory in order for forward
>> +progressing, but can do better if there is more memory available. Introduce
>> +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller
>> +requests the minimum memory it need and the prepare API will return the maximum
>> +size of the fragment returned, caller need to report back to the page_frag core
>> +how much memory it actually use by calling commit API, or not calling the commit
>> +API if deciding to not use any memory.
>> +
>
> This part is as clear as mud to me. It sounds like kind of a convoluted
> setup where you are having the caller have to know a fair bit about the
> internal structure of the cache and it is essentially checking the
> state and then performing a commit. Not a huge fan. I would almost
> prefer to see something more like what we used to do with msix where
> you just had a range you could request and if it can't give you at
> least the minimum it fails.>
> I assume the patch is somewhere here in the set. Will take a look at it
> later.
Yes, the API is introduced in patch 9 and used in patch 10.
>
>> +.. kernel-doc:: include/linux/page_frag_cache.h
>> + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc
>> + page_frag_alloc_va __page_frag_alloc_va_align
>> + page_frag_alloc_va_align page_frag_alloc_va_prepare
>> + page_frag_alloc_va_prepare_align page_frag_alloc_pg_prepare
>> + page_frag_alloc_prepare page_frag_alloc_commit
>> + page_frag_alloc_commit_noref page_frag_free_va
>> +
>> +.. kernel-doc:: mm/page_frag_cache.c
>> + :identifiers: page_frag_cache_drain
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 4745ea94d463..2f84aba59428 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c
>> F: mm/readahead.c
>> F: mm/truncate.c
>>
>> +PAGE FRAG
>> +M: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> +L: linux-mm@xxxxxxxxx
>> +L: netdev@xxxxxxxxxxxxxxx
>> +S: Supported
>> +F: Documentation/mm/page_frags.rst
>> +F: include/linux/page_frag_cache.h
>> +F: mm/page_frag_cache.c
>> +F: mm/page_frag_test.c
>> +
>
> I would appreciate it if you could add me as I usually am having to
> deal with issues people have with this anyway. You can probably just go
> with:
> Alexander Duyck <alexander.duyck@xxxxxxxxx>
Sure, good to your ack here.
>
>> PAGE POOL
>> M: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
>> M: Ilias Apalodimas <ilias.apalodimas@xxxxxxxxxx>
>> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
>> index 28185969cd2c..d8edbecdd179 100644