On Fri, 24 Mar 2023 18:35:00 +0100 Felix Fietkau wrote:
> I'm primarily testing this on routers with 2 or 4 CPUs and limited
> processing power, handling routing/NAT. RPS is typically needed to
> properly distribute the load across all available CPUs. When there is
> only a small number of flows that are pushing a lot of traffic, a
> static RPS assignment often leaves some CPUs idle, whereas others
> become a bottleneck by being fully loaded. Threaded NAPI reduces this
> a bit, but CPUs can become bottlenecked and fully loaded by a NAPI
> thread alone.
The NAPI thread becomes a bottleneck with RPS enabled?
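A minimal Python model of the static assignment described in the quote. It assumes the simplified selection path of `get_rps_cpu()` and ignores RFS, flow limits and the flow table. The CPU is picked only from the flow hash, which is scaled into the `rps_cpus` map the same way the kernel's `reciprocal_scale()` does it. The flow hashes and rates below are made up. Because every packet of a flow maps to the same CPU, two heavy flows can keep at most two of four CPUs busy, however idle the others are.

```python
"""Toy model of static RPS CPU selection (no RFS, no flow limit)."""


def reciprocal_scale(val: int, ep_ro: int) -> int:
    # Same arithmetic as the kernel helper: map a 32-bit value into [0, ep_ro).
    return ((val & 0xFFFFFFFF) * ep_ro) >> 32


def rps_target_cpu(flow_hash: int, rps_cpus: list[int]) -> int:
    # The target CPU depends only on the flow hash, never on current load.
    return rps_cpus[reciprocal_scale(flow_hash, len(rps_cpus))]


def per_cpu_load(flows: dict[int, float], rps_cpus: list[int]) -> dict[int, float]:
    # flows maps a (made-up) flow hash to its packet rate in arbitrary units.
    load = {cpu: 0.0 for cpu in rps_cpus}
    for flow_hash, rate in flows.items():
        load[rps_target_cpu(flow_hash, rps_cpus)] += rate
    return load


if __name__ == "__main__":
    cpus = [0, 1, 2, 3]
    heavy_flows = {0x1234_5678: 100.0, 0x9ABC_DEF0: 100.0}
    load = per_cpu_load(heavy_flows, cpus)
    idle = [cpu for cpu, value in load.items() if value == 0.0]
    print(load)
    print("idle CPUs:", idle)
```

With these two hashes the load lands on CPU0 and CPU2, and CPU1 and CPU3 stay idle. A hash collision would be worse still, putting both flows on a single CPU.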
> Making backlog processing threaded helps split up the processing work
> even more and distribute it onto remaining idle CPUs.
You'd want to have both threaded NAPI and threaded backlog enabled?
> It can basically be used to make RPS a bit more dynamic and
> configurable, because you can assign multiple backlog threads to a set
> of CPUs and selectively steer packets from specific devices / rx queues
Can you give an example?
With the 4 CPU example, in case 2 queues are very busy - you're trying
to make sure that the RPS does not end up landing on the same CPU as
the other busy queue?
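One way to express the question above with today's static knobs is a hypothetical helper. The function name and the `eth0` device are illustrative and do not come from the patch. It computes hex bitmasks for `/sys/class/net/<dev>/queues/rx-<n>/rps_cpus` so that RPS for each busy queue skips every CPU that already services a busy queue's interrupt. It only builds the strings; writing them to sysfs is left out. This is the static pinning under discussion, not the proposed threaded backlog.

```python
def rps_masks_avoiding_busy_queues(
    dev: str, irq_cpu: dict[int, int], num_cpus: int
) -> dict[str, str]:
    """Hypothetical helper: rps_cpus sysfs path -> hex mask for each busy rx queue.

    irq_cpu maps an rx queue index to the CPU handling that queue's IRQ/NAPI.
    Every mask excludes all of those CPUs, so RPS work for one busy queue
    never lands on the CPU already loaded by another busy queue.
    """
    busy = set(irq_cpu.values())
    mask = 0
    for cpu in range(num_cpus):
        if cpu not in busy:
            mask |= 1 << cpu  # set the bit for each spare CPU
    if mask == 0:
        raise ValueError("no spare CPUs left for RPS")
    return {
        f"/sys/class/net/{dev}/queues/rx-{q}/rps_cpus": format(mask, "x")
        for q in sorted(irq_cpu)
    }


if __name__ == "__main__":
    # 4 CPUs, busy queues rx-0 and rx-1 with IRQs on CPU0 and CPU1.
    for path, mask in rps_masks_avoiding_busy_queues("eth0", {0: 0, 1: 1}, 4).items():
        print(path, mask)
```

In the 4 CPU case with the two busy queues' interrupts on CPU0 and CPU1, both queues get mask `c` (CPU2 and CPU3). A static mask like this cannot react when the load shifts, which is the gap the threaded backlog proposal is meant to fill.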
> to them and allow the scheduler to take care of the rest.
You trust the scheduler much more than I do, I think :)