Re: [net-next v5 PATCH] page_pool: handle page recycle for NUMA_NO_NODE condition

From: Saeed Mahameed
Date: Mon Dec 23 2019 - 17:10:16 EST


On Mon, 2019-12-23 at 17:52 +0100, Jesper Dangaard Brouer wrote:
> On Mon, 23 Dec 2019 09:57:00 +0200
> Ilias Apalodimas <ilias.apalodimas@xxxxxxxxxx> wrote:
>
> > Hi Jesper,
> >
> > Looking at the overall path again, I still think we need to
> > reconsider the pool->p.nid semantics.
> >
> > As I said, I like the patch and the whole functionality, and the code
> > seems fine, but here's the current situation.
> > If a user sets pool->p.nid == NUMA_NO_NODE and wants to use
> > page_pool_update_nid(), the whole behavior feels a little odd.
>
> As soon as a driver uses page_pool_update_nid(), that means it wants to
> control the NUMA placement explicitly. As soon as that happens, it is
> the driver's responsibility and choice, and the page_pool API must
> respect that (and not automatically change it behind the driver's back).
>
>
> > The first check in page_pool_update_nid() will always be true (since
> > .nid == NUMA_NO_NODE). Then we'll update this to a real nid. So we end
> > up overwriting what the user initially coded in.
> >
> > This is close to what I proposed in the previous mails on this
> > thread: always store a real nid, even if the user explicitly
> > requests NUMA_NO_NODE.
> >
> > So the semantics are still a problem. I'll stick to what we initially
> > suggested:
> > 1. We either *always* store a real nid,
> > or
> > 2. if NUMA_NO_NODE is present, ignore every other check and recycle
> > the memory blindly.
> >
>
> Hmm... I actually disagree with both 1 and 2.
>
> My semantics proposal:
> If the driver configures page_pool with NUMA_NO_NODE, then page_pool
> tries to help get the best default performance. (Which, according to
> performance measurements, is to have RX-pages belong to the NUMA node
> the RX-processing runs on.)
>

Which is the MSI-X affinity.. the pool has no knowledge of that at
initialization time.
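
For illustration only: once the driver knows the RX queue's IRQ
affinity, deriving a concrete node is trivial. A rough sketch, not
actual mlx5 code; 'irq' is assumed to be the queue's MSI-X vector:

	/* Sketch: map an RX queue's IRQ affinity to a NUMA node.
	 * Falls back to the local memory node if no affinity mask is set.
	 */
	static int rxq_numa_node(unsigned int irq)
	{
		const struct cpumask *mask = irq_get_affinity_mask(irq);

		if (mask && !cpumask_empty(mask))
			return cpu_to_node(cpumask_first(mask));

		return numa_mem_id();
	}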

> The reason I want this behavior is that during driver init/boot, it
> can easily happen that a driver allocates RX-pages from the wrong NUMA
> node. This will cause a performance slowdown that normally doesn't
> happen, because without a cache (like page_pool) RX-pages would fairly
> quickly transition over to the RX NUMA node (instead we keep recycling
> these in your case #2, where you suggest recycling blindly in case of
> NUMA_NO_NODE). IMHO page_pool should hide this border-line case from
> driver developers.
>

So, Ilias's #1 suggestion makes sense: always store a valid nid value.
The question is which value to store on initialization if the user
provided NUMA_NO_NODE? I don't think the pool is capable of choosing
the right value, so let's just use NUMA node 0.
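
Something along these lines in page_pool_init() would do it (just a
sketch of the idea, not a tested patch):

	/* Sketch: never keep NUMA_NO_NODE internally.  If the driver
	 * doesn't care, fall back to a fixed, concrete node (0 here),
	 * so the recycle path never has to special-case NUMA_NO_NODE.
	 */
	if (pool->p.nid == NUMA_NO_NODE)
		pool->p.nid = 0;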

If the developer cares, he would have picked the right affinity on
initialization, or he can just call page_pool_update_nid() when the
affinity is determined, and everything will be fine after that point.
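
I.e. something like this from the RX/NAPI side, once the affinity is
known (again only a sketch; it is essentially what the
page_pool_nid_changed() helper wraps):

	/* Sketch: NAPI poll runs on the (now final) RX CPU, so
	 * numa_mem_id() is the node we want the pool to prefer.
	 */
	if (pool->p.nid != numa_mem_id())
		page_pool_update_nid(pool, numa_mem_id());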

My 2 cents: you just can't provide perfect performance when the user
uses NUMA_NO_NODE, so just pick any default concrete node id and avoid
dealing with NUMA_NO_NODE in the pool fast path, or even the slow
path..
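
With a concrete nid always stored, the recycle-time check stays a plain
comparison (sketch):

	/* Sketch: no NUMA_NO_NODE special case needed when pool->p.nid
	 * is guaranteed to be a real node id.
	 */
	static bool page_on_pref_node(const struct page_pool *pool,
				      const struct page *page)
	{
		return page_to_nid(page) == pool->p.nid;
	}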

> --Jesper
>
>
> > On Fri, Dec 20, 2019 at 06:06:49PM +0200, Ilias Apalodimas wrote:
> > > On Fri, Dec 20, 2019 at 04:22:54PM +0100, Jesper Dangaard Brouer
> > > wrote:
> > > > On Fri, 20 Dec 2019 12:49:37 +0200
> > > > Ilias Apalodimas <ilias.apalodimas@xxxxxxxxxx> wrote:
> > > >
> > > > > On Fri, Dec 20, 2019 at 11:41:16AM +0100, Jesper Dangaard
> > > > > Brouer wrote:
> > > > > > On Fri, 20 Dec 2019 12:23:14 +0200
> > > > > > Ilias Apalodimas <ilias.apalodimas@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Hi Jesper,
> > > > > > >
> > > > > > > I like the overall approach since this moves the check out
> > > > > > > of the hot path. @Saeed, since I have no hardware to test
> > > > > > > this on, would it be possible to check that it still works
> > > > > > > fine for mlx5?
> > > > > > >
> > > > > > > [...]
> > > > > > > > +	struct ptr_ring *r = &pool->ring;
> > > > > > > > +	struct page *page;
> > > > > > > > +	int pref_nid; /* preferred NUMA node */
> > > > > > > > +
> > > > > > > > +	/* Quicker fallback, avoid locks when ring is empty */
> > > > > > > > +	if (__ptr_ring_empty(r))
> > > > > > > > +		return NULL;
> > > > > > > > +
> > > > > > > > +	/* Softirq guarantee CPU and thus NUMA node is stable. This,
> > > > > > > > +	 * assumes CPU refilling driver RX-ring will also run RX-NAPI.
> > > > > > > > +	 */
> > > > > > > > +	pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
> > > > > > >
> > > > > > > One of the use cases for this is that during the allocation
> > > > > > > we are not guaranteed to pick up the correct NUMA node.
> > > > > > > This will get automatically fixed once the driver starts
> > > > > > > recycling packets.
> > > > > > >
> > > > > > > I don't feel strongly about this, since I don't usually
> > > > > > > like hiding value changes from the user, but would it make
> > > > > > > sense to move this into __page_pool_alloc_pages_slow() and
> > > > > > > change pool->p.nid there?
> > > > > > >
> > > > > > > Since alloc_pages_node() will replace NUMA_NO_NODE with
> > > > > > > numa_mem_id() regardless, why not store the actual node in
> > > > > > > our page pool information? You can then skip this and check
> > > > > > > pool->p.nid == numa_mem_id(), regardless of what's
> > > > > > > configured.
> > > > > >
> > > > > > This single code line helps support drivers that want to
> > > > > > control the nid themselves. This is a feature that is only
> > > > > > used by mlx5, AFAIK.
> > > > > >
> > > > > > I do think it is useful to allow the driver to "control"
> > > > > > the nid, as pinning/preferring the pages to come from the
> > > > > > NUMA node that the PCIe controller hardware is installed in
> > > > > > does have benefits.
> > > > >
> > > > > Sure, you can keep the if statement as-is; it won't break
> > > > > anything. Would we want to store the actual NUMA id in
> > > > > pool->p.nid if the user selects 'NUMA_NO_NODE'?
> > > >
> > > > No. pool->p.nid should stay as NUMA_NO_NODE, because that makes it
> > > > dynamic. If someone moves an RX IRQ to another CPU on another
> > > > NUMA node, then this 'NUMA_NO_NODE' setting makes the pages
> > > > transition automatically.
> > > OK, this assumed that drivers were going to use
> > > page_pool_nid_changed(), but with the current code we don't have to
> > > force them to do that. Let's keep this as-is.
> > >
> > > I'll run a few more tests, wait in case Saeed gets a chance to test
> > > it, and then send my Reviewed-by.
>
>