Re: [PATCH 0/7 v4] Introduce a bulk order-0 page allocator with two in-tree users

From: Jesper Dangaard Brouer
Date: Wed Mar 17 2021 - 13:21:52 EST

In reply to: Alexander Lobakin: "Re: [PATCH 0/7 v4] Introduce a bulk order-0 page allocator with two in-tree users"
Next in thread: Alexander Lobakin: "Re: [PATCH 0/7 v4] Introduce a bulk order-0 page allocator with two in-tree users"

On Wed, 17 Mar 2021 16:52:32 +0000
Alexander Lobakin <alobakin@xxxxx> wrote:

> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Date: Wed, 17 Mar 2021 17:38:44 +0100
>
> > On Wed, 17 Mar 2021 16:31:07 +0000
> > Alexander Lobakin <alobakin@xxxxx> wrote:
> >
> > > From: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> > > Date: Fri, 12 Mar 2021 15:43:24 +0000
> > >
> > > Hi there,
> > >
> > > > This series is based on top of Matthew Wilcox's series "Rationalise
> > > > __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
> > > > test and are not using Andrew's tree as a baseline, I suggest using the
> > > > following git tree
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v4r2
> > >
> > > I gave this series a go on my setup, it showed a bump of 10 Mbps on
> > > UDP forwarding, but dropped TCP forwarding by almost 50 Mbps.
> > >
> > > (4 core 1.2GHz MIPS32 R2, page size of 16 Kb, Page Pool order-0
> > > allocations with MTU of 1508 bytes, linear frames via build_skb(),
> > > GRO + TSO/USO)
> >
> > What NIC driver is this?
>
> Ah, forgot to mention. It's a WIP driver, not yet mainlined.
> The NIC itself is basically on-SoC 1G chip.

Hmm, then it is really hard to check if your driver is doing something
else that could cause this.

Well, can you try lowering the page_pool bulking size? That would test
Wilcox's theory that we should bulk in smaller batches, to avoid pushing
cachelines into L2 when walking the LRU list. You might have to go as
low as bulk=8 (to match the N-way associativity of the L1 cache).

In function __page_pool_alloc_pages_slow(), adjust the variable:

  const int bulk = PP_ALLOC_CACHE_REFILL;
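
To make the trade-off concrete, here is a minimal user-space sketch of a
cache that is refilled in batches of "bulk" entries. It is not kernel
code: the toy_* names are hypothetical, and the two constants are assumed
to mirror the page_pool defaults (128 and 64), so check them against your
tree. It only counts how often the slow path is entered and says nothing
about actual cache behaviour, which is what the experiment would measure.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the page_pool constants; verify against your tree. */
#define PP_ALLOC_CACHE_SIZE   128
#define PP_ALLOC_CACHE_REFILL 64

/* Hypothetical toy model of a per-pool allocation cache. */
struct toy_pool {
	int cache[PP_ALLOC_CACHE_SIZE];	/* stand-in for cached page pointers */
	size_t count;			/* entries currently in the cache */
	unsigned long refills;		/* how often the slow path ran */
	int next_page;			/* generator for fake page ids */
};

/* Slow path: top the cache up to `bulk` fake pages in one go. */
static void toy_refill(struct toy_pool *p, size_t bulk)
{
	assert(bulk > 0 && bulk <= PP_ALLOC_CACHE_SIZE);
	while (p->count < bulk)
		p->cache[p->count++] = p->next_page++;
	p->refills++;
}

/* Fast path: pop from the cache, entering the slow path only when empty. */
static int toy_alloc(struct toy_pool *p, size_t bulk)
{
	if (p->count == 0)
		toy_refill(p, bulk);
	return p->cache[--p->count];
}

/* Allocate n fake pages and return the number of slow-path refills. */
static unsigned long toy_refills_for(size_t n, size_t bulk)
{
	struct toy_pool p = {0};

	for (size_t i = 0; i < n; i++)
		(void)toy_alloc(&p, bulk);
	return p.refills;
}
```

In this model, toy_refills_for(1024, PP_ALLOC_CACHE_REFILL) enters the
slow path 16 times and toy_refills_for(1024, 8) enters it 128 times. So
a smaller bulk means more slow-path entries, each touching fewer list
entries; whether that is a net win on your CPU is the open question.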

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
