On Tue, 13 Aug 2024 21:14:40 -0400 Martin Karsten wrote:
What about NIC interrupt coalescing. defer_hard_irqs_count was supposed
to be used with NICs which either don't have IRQ coalescing or have a
broken implementation. The timeout of 200usec should be perfectly within
range of what NICs can support.
If the NIC IRQ coalescing works, instead of adding a new timeout value
we could add a new deferral control (replacing defer_hard_irqs_count)
which would always kick in after seeing prefer_busy_poll() but also
not kick in if the busy poll harvested 0 packets.
Maybe I am missing something, but I believe this would have the same
problem that we describe for gro-timeout + defer-irq. When busy poll
does not harvest packets and the application thread is idle and goes to
sleep, it would then take up to 200 us to get the next interrupt. This
considerably increases tail latencies under low load.
In order to get low latencies under low load, the NIC timeout would have to
be something like 20 us, but under high load the application thread will
be busy for longer than 20 us and the interrupt (and softirq) will come
too early and cause interference.
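The trade-off with a single NIC timeout can be written down as a toy model. This is only a sketch: the function names are hypothetical, and it assumes the coalescing timer is the only wake-up source for a sleeping application thread.

```c
#include <stdbool.h>

/* Toy model, not kernel code. All names here are hypothetical. */

/*
 * Worst-case delay (in us) before the next IRQ wakes an idle, sleeping
 * application thread, assuming the only wake-up source is a NIC
 * coalescing timer of nic_timeout_us.
 */
static unsigned int worst_case_idle_wakeup_us(unsigned int nic_timeout_us)
{
	return nic_timeout_us;
}

/*
 * True if the IRQ (and softirq) arrives while the application is still
 * busy, i.e. the timer expires before app_busy_us of processing is done.
 */
static bool irq_interrupts_app(unsigned int nic_timeout_us,
			       unsigned int app_busy_us)
{
	return nic_timeout_us < app_busy_us;
}
```

With 200 us the idle wake-up can take up to 200 us; with 20 us an application that is busy for, say, 50 us gets interrupted. No single value avoids both.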
An FSM-like diagram would go a long way in clarifying things :)
It is tempting to think of the second timeout as 0 and in fact re-enable
interrupts right away. We have tried it, but it leads to a lot of
interrupts and corresponding inefficiencies, since a system below
capacity frequently switches between busy and idle. Using a small
timeout (20 us) for modest deferral and batching when idle is a lot more
efficient.
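One possible reading of the two-timeout behaviour, as a small state machine. This is a simplified sketch, not the patch itself: the states, events, and names are assumptions, and the 200 us and 20 us values are just the numbers used in this thread.

```c
/* Illustrative FSM only. States, events and names are assumptions. */
enum irq_mode {
	IRQ_ON,      /* interrupt driven, no deferral timer armed */
	DEFER_LONG,  /* app is busy polling, long timeout armed */
	DEFER_SHORT, /* app went idle, short timeout armed */
};

enum event {
	EV_POLL_GOT_PACKETS, /* busy poll harvested at least one packet */
	EV_POLL_EMPTY,       /* busy poll harvested 0 packets */
	EV_TIMER_EXPIRED,    /* the armed deferral timer ran out */
};

#define LONG_TIMEOUT_US  200
#define SHORT_TIMEOUT_US 20

struct fsm {
	enum irq_mode mode;
	unsigned int timer_us; /* 0 means no timer armed */
};

static void fsm_step(struct fsm *f, enum event ev)
{
	switch (ev) {
	case EV_POLL_GOT_PACKETS:
		/* App keeps up with the load: keep IRQs off for a long time. */
		f->mode = DEFER_LONG;
		f->timer_us = LONG_TIMEOUT_US;
		break;
	case EV_POLL_EMPTY:
		/* App is about to sleep: defer only briefly, for batching. */
		f->mode = DEFER_SHORT;
		f->timer_us = SHORT_TIMEOUT_US;
		break;
	case EV_TIMER_EXPIRED:
		/* Simplification: expiry goes straight back to IRQ mode. */
		f->mode = IRQ_ON;
		f->timer_us = 0;
		break;
	}
}
```

Setting SHORT_TIMEOUT_US to 0 in this model corresponds to re-enabling interrupts right away, which is the variant described above as causing a lot of interrupts.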
I see. I think we are on the same page. What I was suggesting is to use
the HW timer instead of the short timer. But I suspect the NIC you're
using isn't really good at clearing IRQs before unmasking. Meaning that
when you try to reactivate HW control there's already an IRQ pending
and it fires pointlessly. That matches my experience with mlx5.
If the NIC driver was to clear the IRQ state before running the NAPI
loop, we would have no pending IRQ by the time we unmask and activate
HW IRQs.
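The ordering argument can be illustrated with a mock device. This is a sketch under stated assumptions: struct mock_nic and both functions are hypothetical, the device is assumed to latch a pending IRQ even while masked, and packets arriving during the poll are not modelled.

```c
#include <stdbool.h>

/* Mock device, not a real driver. All names are hypothetical. */
struct mock_nic {
	bool irq_pending;      /* latched IRQ state in the device */
	bool irq_masked;
	unsigned int rx_queue; /* packets waiting to be polled */
};

/* A packet arrives: queue it and latch an IRQ, even while masked. */
static void mock_rx(struct mock_nic *n)
{
	n->rx_queue++;
	n->irq_pending = true;
}

/*
 * Mask, optionally clear the latched IRQ, drain the queue, unmask.
 * Returns true if an IRQ fires at unmask time with nothing left to
 * process, i.e. a pointless interrupt.
 */
static bool mock_poll_and_unmask(struct mock_nic *n, bool clear_first)
{
	bool fired;

	n->irq_masked = true;
	if (clear_first)
		n->irq_pending = false; /* clear before the poll loop */
	n->rx_queue = 0;                /* poll loop drains everything */
	n->irq_masked = false;

	fired = n->irq_pending;         /* stale latch fires on unmask */
	n->irq_pending = false;
	return fired && n->rx_queue == 0;
}
```

In this model, calling mock_poll_and_unmask() with clear_first set to false after a mock_rx() reports a pointless IRQ, while clear_first set to true does not.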