Re: [RFC net-next v2 1/9] net: napi: Add napi_storage

From: Joe Damato
Date: Tue Sep 10 2024 - 12:10:57 EST


On Tue, Sep 10, 2024 at 07:52:17AM -0700, Jakub Kicinski wrote:
> On Tue, 10 Sep 2024 08:13:51 +0200 Joe Damato wrote:
> > On Mon, Sep 09, 2024 at 04:40:39PM -0700, Jakub Kicinski wrote:
> > > On Sun, 8 Sep 2024 16:06:35 +0000 Joe Damato wrote:
> > > > Add a persistent NAPI storage area for NAPI configuration to the core.
> > > > Drivers opt-in to setting the storage for a NAPI by passing an index
> > > > when calling netif_napi_add_storage.
> > > >
> > > > napi_storage is allocated in alloc_netdev_mqs, freed in free_netdev
> > > > (after the NAPIs are deleted), and set to 0 when napi_enable is called.
> > >
> > > > enum {
> > > > @@ -2009,6 +2019,9 @@ enum netdev_reg_state {
> > > > * @dpll_pin: Pointer to the SyncE source pin of a DPLL subsystem,
> > > > * where the clock is recovered.
> > > > *
> > > > + * @napi_storage: An array of napi_storage structures containing per-NAPI
> > > > + * settings.
> > >
> > > FWIW you can use inline kdoc, with the size of the struct it's easier
> > > to find it. Also this doesn't need to be accessed from fastpath so you
> > > can move it down.
> >
> > OK. I figured since it was being deref'd in napi_complete_done
> > (where we previously read napi_defer_hard_irqs and
> > gro_flush_timeout) it needed to be in the fast path.
> >
> > I'll move it down for the next RFC.
>
> Hm, fair point. In my mind I expected we still add the fast path fields
> to NAPI instances. And the storage would only be there to stash that
> information for the period of time when real NAPI instances are not
> present (napi_disable() -> napi_enable() cycles).

I see, I hadn't understood that part. It sounds like you were
thinking we'd stash the values in between whereas I thought we were
reading from the struct directly (hence the implementation of the
static inline wrappers).

I don't really have a preference one way or the other.

> But looking at napi_struct, all the cachelines seem full, anyway, so we
> can as well split the info. No strong preference, feel free to keep as
> is, then. But maybe rename from napi_storage to napi_config or such?

I can give that a shot for the next RFC and see how it looks. We can
always duplicate the fields in napi_struct and napi_config and copy
them over as you suggested above.

> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 22c3f14d9287..ca90e8cab121 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -6719,6 +6719,9 @@ void napi_enable(struct napi_struct *n)
> > > > if (n->dev->threaded && n->thread)
> > > > new |= NAPIF_STATE_THREADED;
> > > > } while (!try_cmpxchg(&n->state, &val, new));
> > > > +
> > > > + if (n->napi_storage)
> > > > + memset(n->napi_storage, 0, sizeof(*n->napi_storage));
> >
> > OK, your comments below will probably make more sense to me after I
> > try implementing it, but I'll definitely have some questions.
> >
> > > And here inherit the settings and the NAPI ID from storage, then call
> > > napi_hash_add(). napi_hash_add() will need a minor diff to use the
> > > existing ID if already assigned.
> >
> > I don't think I realized we settled on the NAPI ID being persistent.
> > I'm not opposed to that, I just think I missed that part in the
> > previous conversation.
> >
> > I'll give it a shot and see what the next RFC looks like.
>
> The main reason to try to make NAPI ID persistent from the start is that
> if it works we don't have to add index to the uAPI. I don't feel
> strongly about it, if you or anyone else has arguments against / why
> it won't work.

Yea, I think not exposing the index in the uAPI is probably a good
idea? Making the NAPI IDs persistent let's us avoid that so I can
give that a shot because it's easier from the user app perspective,
IMO.

> > > And the inverse of that has to happen in napi_disable() (unhash, save
> > > settings to storage), and __netif_napi_del() (don't unhash if it has
> > > index).
> > >
> > > I think that should work?
> >
> > Only one way to find out ;)
> >
> > Separately: your comment about documenting rings to NAPIs... I am
> > not following that bit.
> >
> > Is that a thing you meant should be documented for driver writers to
> > follow to reduce churn ?
>
> Which comment?

In this message:

https://lore.kernel.org/netdev/20240903124008.4793c087@xxxxxxxxxx/

You mentioned this, which I interpreted as a thing that core needs
to solve for, but perhaps this intended as advice for drivers?

Maybe it's enough to document how rings are distributed to NAPIs?

First set of NAPIs should get allocated to the combined channels,
then for remaining rx- and tx-only NAPIs they should be interleaved
starting with rx?

Example, asymmetric config: combined + some extra tx:

combined tx
[0..#combined-1] [#combined..#combined+#tx-1]

Split rx / tx - interleave:

[0 rx0] [1 tx0] [2 rx1] [3 tx1] [4 rx2] [5 tx2] ...

This would limit the churn when changing channel counts.