Re: [RFC PATCH 1/4] softirq: Limit vector to a single iteration on IRQ tail

From: David Miller
Date: Sun Jan 21 2018 - 11:57:22 EST


From: Frederic Weisbecker <frederic@xxxxxxxxxx>
Date: Sun, 21 Jan 2018 17:30:09 +0100

> On Fri, Jan 19, 2018 at 01:47:27PM -0500, David Miller wrote:
>> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Date: Fri, 19 Jan 2018 10:25:03 -0800
>>
>> > On Fri, Jan 19, 2018 at 8:16 AM, David Miller <davem@xxxxxxxxxxxxx> wrote:
>> >>
>> >> So this "get requeued" condition I think will trigger always for
>> >> networking tunnel decapsulation.
>> >
>> > Hmm. Interesting and a perhaps bit discouraging.
>> >
>> > Will it always be just a _single_ level of indirection, or will double
>> > tunnels (I assume some people do that, just because the universe is
>> > out to get us) then result in this perhaps repeating several times?
>>
>> Every level of tunnel encapsulation will trigger a new softirq.
>>
>> So if you have an IP tunnel inside of an IP tunnel that will trigger
>> twice.
>
> So we may likely need to come back to a call counter based limit :-s

I'm not so sure exactly to what extent we should try to handle that,
and if so exactly how.

The only reason we do this is to control stack usage. The re-softirq
on tunnel decapsulation is functioning as a continuation of sorts.
If we are already running in the net_rx_action() softirq it would
be so much nicer and more efficient to just make sure net_rx_action()
runs the rest of the packet processing.

Right now that doesn't happen because net_rx_action() runs only one
round of NAPI polling. It doesn't re-snapshot the list and try
again before returning.

net_rx_action() has its own logic like do_softirq() does for timing
out in the middle of its work which may or may not have some further
influence upon fairness to other softirqs.

Basically, it runs a snapshot of the NAPI poll list for this CPU until
either usecs_to_jiffies(netdev_budget_usecs) jiffies have elapsed or
the list snapshot has been fully processed.
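
A minimal userspace sketch of the snapshot behaviour described above,
not the kernel code itself. Every name here (sketch_list, sketch_push,
sketch_run) is hypothetical, "time" is modelled as one unit per
processed item, and each pending item carries a made-up tunnel nesting
depth: processing an item with depth > 0 requeues its inner packet on
the live list, standing in for the re-softirq on decapsulation.

```c
#include <stddef.h>

#define SKETCH_MAX_ITEMS 16

/* Hypothetical stand-in for the per-CPU NAPI poll list. */
struct sketch_list {
	int depth[SKETCH_MAX_ITEMS];	/* remaining tunnel nesting per item */
	size_t len;
};

static void sketch_push(struct sketch_list *l, int depth)
{
	if (l->len < SKETCH_MAX_ITEMS)
		l->depth[l->len++] = depth;
}

/*
 * Process the live list under a time budget (one unit per item).
 * With resnapshot == 0 this models a single round: work requeued
 * during the round is left on the live list for a later run.
 * With resnapshot != 0 the list is snapshotted again before returning,
 * for as long as budget remains.
 * Returns the number of items still pending, i.e. whether another
 * softirq run would be needed.
 */
static size_t sketch_run(struct sketch_list *live,
			 unsigned long time_budget, int resnapshot)
{
	unsigned long elapsed = 0;

	do {
		struct sketch_list snap = *live;	/* take the snapshot */
		size_t i;

		live->len = 0;

		for (i = 0; i < snap.len && elapsed < time_budget;
		     i++, elapsed++) {
			/* Decapsulation: the inner packet is requeued. */
			if (snap.depth[i] > 0)
				sketch_push(live, snap.depth[i] - 1);
		}

		/* Out of budget: put unprocessed entries back. */
		for (; i < snap.len; i++)
			sketch_push(live, snap.depth[i]);
	} while (resnapshot && live->len > 0 && elapsed < time_budget);

	return live->len;
}
```

In this model a doubly nested tunnel (depth 2) needs three
single-round runs to drain, i.e. two re-triggers, while one
re-snapshotting run with enough budget drains it in place.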

The default netdev_budget_usecs is 2000, which if my math isn't broken
is 2 jiffies when HZ=1000. I know why we use 2000 instead of 1000,
it's to handle the case where we are invoked very close to the end
of a jiffy. That situation does happen often enough in practice to
cause performance problems.
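
The arithmetic can be checked with a small userspace sketch. Both
helpers are hypothetical stand-ins, not kernel functions: the first
mimics the round-up conversion of usecs_to_jiffies() under the
assumption that HZ divides 1000000 evenly, and the second assumes the
deadline is "start jiffy + budget" and that the run stops as soon as
the jiffy counter reaches it.

```c
/*
 * Userspace stand-in for usecs_to_jiffies(): rounds up to whole ticks.
 * Assumption: hz divides 1000000 evenly.
 */
static unsigned long sketch_usecs_to_jiffies(unsigned long usecs,
					     unsigned long hz)
{
	unsigned long usecs_per_jiffy = 1000000UL / hz;

	return (usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/*
 * Wall-clock microseconds actually available when the run starts
 * offset_usecs into the current jiffy and must stop once the jiffy
 * counter has advanced by budget_jiffies.
 */
static unsigned long sketch_window_usecs(unsigned long budget_jiffies,
					 unsigned long offset_usecs,
					 unsigned long hz)
{
	unsigned long usecs_per_jiffy = 1000000UL / hz;

	return budget_jiffies * usecs_per_jiffy - offset_usecs;
}
```

With hz = 1000, sketch_usecs_to_jiffies(2000, 1000) gives 2. Starting
999 usecs into a jiffy, a 1-jiffy budget leaves only 1 usec of real
time in this model, whereas a 2-jiffy budget still leaves 1001 usecs.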

It would seem that all of these issues are why the tendency is to deal
with measuring cost using time rather than a simpler heuristic such as
whether softirqs were retriggered during a softirq run.