Re: [PATCH net-next v2 3/3] net: add netmem_t to skb_frag_t

From: Yunsheng Lin
Date: Mon Dec 18 2023 - 07:40:12 EST


On 2023/12/17 16:09, Mina Almasry wrote:
> Use netmem_t instead of page directly in skb_frag_t. Currently netmem_t
> is always a struct page underneath, but the abstraction allows efforts
> to add support for skb frags not backed by pages.
>
> There is unfortunately 1 instance where the skb_frag_t is assumed to be
> a bio_vec in kcm. For this case, add a debug assert that the skb frag is
> indeed backed by a page, and do a cast.
>
> Add skb[_frag]_fill_netmem_*() and skb_add_rx_frag_netmem() helpers so
> that the API can be used to create netmem skbs.
>
> Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx>
>

...

>
> -typedef struct bio_vec skb_frag_t;
> +typedef struct skb_frag {
> + struct netmem *bv_page;

bv_page -> bv_netmem?

> + unsigned int bv_len;
> + unsigned int bv_offset;
> +} skb_frag_t;
>
> /**
> * skb_frag_size() - Returns the size of a skb fragment
> @@ -2431,22 +2436,37 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb)
> return skb_headlen(skb) + __skb_pagelen(skb);
> }
>

...

> /**
> @@ -2462,10 +2482,10 @@ static inline void skb_len_add(struct sk_buff *skb, int delta)
> }
>
> /**
> - * __skb_fill_page_desc - initialise a paged fragment in an skb
> + * __skb_fill_netmem_desc - initialise a paged fragment in an skb
> * @skb: buffer containing fragment to be initialised
> * @i: paged fragment index to initialise
> - * @page: the page to use for this fragment
> + * @netmem: the netmem to use for this fragment
> * @off: the offset to the data with @page
> * @size: the length of the data
> *
> @@ -2474,10 +2494,13 @@ static inline void skb_len_add(struct sk_buff *skb, int delta)
> *
> * Does not take any additional reference on the fragment.
> */
> -static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> - struct page *page, int off, int size)
> +static inline void __skb_fill_netmem_desc(struct sk_buff *skb, int i,
> + struct netmem *netmem, int off,
> + int size)
> {
> - __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size);
> + struct page *page = netmem_to_page(netmem);
> +
> + __skb_fill_netmem_desc_noacc(skb_shinfo(skb), i, netmem, off, size);
>
> /* Propagate page pfmemalloc to the skb if we can. The problem is
> * that not all callers have unique ownership of the page but rely
> @@ -2485,7 +2508,21 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> */
> page = compound_head(page);
> if (page_is_pfmemalloc(page))
> - skb->pfmemalloc = true;
> + skb->pfmemalloc = true;

Is it possible to introduce netmem_is_pfmemalloc() and netmem_compound_head()
for netmem, and have some built-time testing to ensure the implementation
is the same between page_is_pfmemalloc()/compound_head() and
netmem_is_pfmemalloc()/netmem_compound_head()? So that we can avoid the
netmem_to_page() as much as possible, especially in the driver.


> +}
> +
> +static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> + struct page *page, int off, int size)
> +{
> + __skb_fill_netmem_desc(skb, i, page_to_netmem(page), off, size);
> +}
> +

...

> */
> static inline struct page *skb_frag_page(const skb_frag_t *frag)
> {
> - return frag->bv_page;
> + return netmem_to_page(frag->bv_page);

It seems we are not able to have a safe type protection for the above
function, as the driver may be able to pass a devmem frag as a param here,
and pass the returned page into the mm subsystem, and compiler is not able
to catch it when compiling.

> }
>
> /**
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 83af8aaeb893..053d220aa2f2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -845,16 +845,24 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
> }
> EXPORT_SYMBOL(__napi_alloc_skb);
>

...

> diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
> index 65d1f6755f98..5c46db045f4c 100644
> --- a/net/kcm/kcmsock.c
> +++ b/net/kcm/kcmsock.c
> @@ -636,9 +636,15 @@ static int kcm_write_msgs(struct kcm_sock *kcm)
> for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> msize += skb_shinfo(skb)->frags[i].bv_len;
>
> + /* The cast to struct bio_vec* here assumes the frags are
> + * struct page based.
> + */
> + DEBUG_NET_WARN_ON_ONCE(
> + !skb_frag_page(&skb_shinfo(skb)->frags[0]));

It seems skb_frag_page() always return non-NULL in this patch, the above
checking seems unnecessary?