Re: [PATCH net-next] net/core: add optional threading for backlog processing

From: Felix Fietkau
Date: Sat Mar 25 2023 - 01:43:05 EST


On 25.03.23 04:19, Jakub Kicinski wrote:
> On Fri, 24 Mar 2023 18:57:03 +0100 Felix Fietkau wrote:
>>>> It can basically be used to make RPS a bit more dynamic and
>>>> configurable, because you can assign multiple backlog threads to a set
>>>> of CPUs and selectively steer packets from specific devices / rx queues
>>>
>>> Can you give an example?
>>>
>>> With the 4 CPU example, in case 2 queues are very busy - you're trying
>>> to make sure that the RPS does not end up landing on the same CPU as
>>> the other busy queue?
>>
>> In this part I'm thinking about bigger systems where you want to have a
>> group of CPUs dedicated to dealing with network traffic without
>> assigning a fixed function (e.g. NAPI processing or RPS target) to each
>> one, allowing for more dynamic processing.

> I tried the threaded NAPI on larger systems and helped others try,
> and so far it's not been beneficial :( Even the load balancing
> improvements are not significant enough to use it, and there
> is a large risk of scheduler making the wrong decision.
>
> Hence my questioning - I'm trying to understand what you're doing
> differently.
I didn't actually run any tests on bigger systems myself, so I don't know how to tune it for those.

>>>> to them and allow the scheduler to take care of the rest.
>>>
>>> You trust the scheduler much more than I do, I think :)
>>
>> In my tests it brings down latency (both avg and p99) considerably in
>> some cases. I posted some numbers here:
>> https://lore.kernel.org/netdev/e317d5bc-cc26-8b1b-ca4b-66b5328683c4@xxxxxxxx/

> Could you provide the full configuration for this test?
> In non-threaded mode the RPS is enabled to spread over remaining
> 3 cores?
In this test I'm using threaded NAPI and backlog_threaded without any fixed core assignment.

- Felix