Re: [PATCH net-next v16 05/13] page_pool: devmem support
From: Mina Almasry
Date: Wed Jul 10 2024 - 16:30:04 EST
On Wed, Jul 10, 2024 at 9:49 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Wed, 10 Jul 2024 00:17:38 +0000 Mina Almasry wrote:
> > @@ -68,17 +107,103 @@ static inline netmem_ref page_to_netmem(struct page *page)
> >
> > static inline int netmem_ref_count(netmem_ref netmem)
> > {
> > + /* The non-pp refcount of net_iov is always 1. On net_iov, we only
> > + * support pp refcounting which uses the pp_ref_count field.
> > + */
> > + if (netmem_is_net_iov(netmem))
> > + return 1;
> > +
> > return page_ref_count(netmem_to_page(netmem));
> > }
>
> How can this work if we had to revert the patch which made all of
> the networking stack take pp-aware refs? Maybe we should add the
> refcount, and let it be bumped, but WARN() if the net_iov is released
> with refcount other than 1? Or we need a very solid explanation why
> the conversion had to be reverted and this is fine.
>
Right, as you are aware, page refcounting is based on 2 refcounts: pp
refs and full page refs. To be honest I find the 2-ref flow confusing
and I made an effort to avoid porting this bit to net_iovs. net_iovs
just supports 1 refcount, which is the pp-ref.
My intention is that when a reference is needed on a net_iov, we
obtain the pp-ref, and when we drop a reference on a net_iov, we drop
the pp_ref. This is able to work for net_iov but not pages, because
(as you explained to me) pages can be inserted into the net stack with
full page refs. So when it comes to refcounting pages we need to be
careful which ref to obtain or drop depending on is_pp_netmem() and
skb->pp_recycle (as pp_recycle serves as a concurrency check, like you
explained).
AFAICT, since net_iovs always originate from the net stack, we can
make the simplification that they're always seeded with 1 pp-ref, and
never support non-pp-refs. This simplifies the refcounting such that:
1. net_iov are always is_pp_netmem (they are never disconnected from
the pp as they never have elevated non-pp refcount), and
2. net_iov refcounting doesn't need to check skb->pp_recycle for
refcounting, because we can be sure that the caller always has a
non-pp ref (since it's the only one supported).
Currently, as written, I just realized I did not add net_iov support
to __skb_frag_ref(). But net_iov does support skb_pp_frag_ref(). So
there is no way to increment a non-pp ref for net_iov.
If we want to add __skb_frag_ref() support for net_iov I suggest something like:
diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488a..02f7f4c7d4821 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,13 @@
*/
static inline void __skb_frag_ref(skb_frag_t *frag)
{
- get_page(skb_frag_page(frag));
+ netmem_ref netmem = skb_frag_netmem(frag);
+
+ /* netmem always uses pp-refs for refcounting. Never non-pp refs. */
+ if (!netmem_is_net_iov(netmem))
+ get_page(netmem_to_page(netmem));
+ else
+ page_pool_ref_netmem(netmem);
}
If you don't like the 1 ref simplification, I can definitely add a
second refcount as you suggest, but AFAICT the simplification is safe
due to how net_iov are originated, and maybe also because devmem usage
in the net stack is limited due to all the skb_is_readable() checks,
and it's possible that the edge cases don't reproduce. I was looking
to find a concrete bug report with devmem before taking a hammer and
adding a secondary refcount, rather than do it preemptively, but I'm
happy to look into it if you insist.
> > static inline unsigned long netmem_to_pfn(netmem_ref netmem)
> > {
> > + if (netmem_is_net_iov(netmem))
> > + return 0;
> > +
> > return page_to_pfn(netmem_to_page(netmem));
> > }
>
> Can we move this out and rename it to netmem_pfn_trace() ?
> Silently returning 0 is not generally okay, but since it's only
> for tracing we don't care.
>
Yes, I will do.
> > +static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > +{
> > + return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> > +}
> > +
> > +static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
> > +{
> > + return __netmem_clear_lsb(netmem)->pp_magic;
> > +}
> > +
> > +static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
> > +{
> > + __netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
> > +}
> > +
> > +static inline void netmem_clear_pp_magic(netmem_ref netmem)
> > +{
> > + __netmem_clear_lsb(netmem)->pp_magic = 0;
> > +}
> > +
> > +static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
> > +{
> > + return __netmem_clear_lsb(netmem)->pp;
> > +}
> > +
> > +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> > +{
> > + __netmem_clear_lsb(netmem)->pp = pool;
> > +}
>
> Why is all this stuff in the main header? It's really low level.
> Please put helpers which are only used by the core in a header
> under net/core/, like net/core/dev.h
Sorry, will do.
--
Thanks,
Mina