Re: [PATCH net] net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters

From: Horatiu Vultur
Date: Fri May 26 2023 - 02:45:39 EST


The 05/25/2023 14:34, Haiyang Zhang wrote:
>
> > -----Original Message-----
> > From: Horatiu Vultur <horatiu.vultur@xxxxxxxxxxxxx>
> > Sent: Thursday, May 25, 2023 2:49 AM
> > To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > Cc: linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Dexuan Cui
> > <decui@xxxxxxxxxxxxx>; KY Srinivasan <kys@xxxxxxxxxxxxx>; Paul Rosswurm
> > <paulros@xxxxxxxxxxxxx>; olaf@xxxxxxxxx; vkuznets@xxxxxxxxxx;
> > davem@xxxxxxxxxxxxx; wei.liu@xxxxxxxxxx; edumazet@xxxxxxxxxx;
> > kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; leon@xxxxxxxxxx; Long Li
> > <longli@xxxxxxxxxxxxx>; ssengar@xxxxxxxxxxxxxxxxxxx; linux-
> > rdma@xxxxxxxxxxxxxxx; daniel@xxxxxxxxxxxxx; john.fastabend@xxxxxxxxx;
> > bpf@xxxxxxxxxxxxxxx; ast@xxxxxxxxxx; Ajay Sharma
> > <sharmaajay@xxxxxxxxxxxxx>; hawk@xxxxxxxxxx; linux-
> > kernel@xxxxxxxxxxxxxxx; stable@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH net] net: mana: Fix perf regression: remove rx_cqes,
> > tx_cqes counters
> >
> > The 05/24/2023 14:22, Haiyang Zhang wrote:
> >
> > Hi Haiyang,
> >
> > >
> > > The apc->eth_stats.rx_cqes is one per NIC (vport), and it's on the
> > > frequent and parallel code path of all queues. So, r/w into this
> > > single shared variable by many threads on different CPUs creates a
> > > lot of caching and memory overhead, hence perf regression. And, it's
> > > not accurate due to the high volume concurrent r/w.
> >
> > Do you have any numbers to show the improvement of this change?
>
> The numbers are not published. The perf regression of the previous
> patch is very significant, and this patch eliminates the regression.
>
> >
> > >
> > > Since the error path of mana_poll_rx_cq() already has warnings,
> > > keeping the counter and converting it to a per-queue variable is not
> > > necessary. So, just remove this counter from this high frequency
> > > code path.
> > >
> > > Also, remove the tx_cqes counter for the same reason. We have
> > > warnings & other counters for errors on that path, and don't need
> > > to count every normal cqe processing.
> >
> > Will you not have problems with the counter 'apc->eth_stats.tx_cqe_err'?
> > It is not in the hot path but you will have concurrent access to it.
>
> Yes, but that error happens rarely, so a shared variable is good enough. So, I
> don't change it in this patch.

OK, I understand.
Maybe this can be fixed in a different patch at a later point. Thanks.
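
For reference, one way such a follow-up could look is a per-queue counter
that is only summed when the stats are read. This is a minimal userspace
sketch, not the mana driver code: NUM_QUEUES, CACHE_LINE, struct
queue_stats, struct port_stats and both helpers are made-up names for
illustration, and it assumes each queue is updated by a single thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 4   /* hypothetical queue count */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* One counter per queue, aligned so each sits on its own cache line. */
struct queue_stats {
	_Alignas(CACHE_LINE) uint64_t cqe_err;
};

struct port_stats {
	struct queue_stats q[NUM_QUEUES];
};

/* Per-queue path: the poller for a queue writes only its own slot. */
static inline void count_cqe_err(struct port_stats *ps, size_t queue)
{
	assert(queue < NUM_QUEUES);
	ps->q[queue].cqe_err++;
}

/* Slow path (e.g. a stats dump): sum the per-queue values on demand. */
static inline uint64_t total_cqe_err(const struct port_stats *ps)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < NUM_QUEUES; i++)
		sum += ps->q[i].cqe_err;
	return sum;
}
```

Because no two queues write to the same cache line, the increments do
not contend with each other; the cost moves to the rare read side.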

Reviewed-by: Horatiu Vultur <horatiu.vultur@xxxxxxxxxxxxx>

>
> Thanks,
> - Haiyang
>

--
/Horatiu