RE: [PATCH 2/3] softirq: avoid spurious stalls due to need_resched()

From: David Laight
Date: Mon Mar 06 2023 - 04:14:11 EST


From: Thomas Gleixner
> Sent: 05 March 2023 20:43
...
> The point is that softirqs are just the proliferation of an at least 50
> years old OS design paradigm. Back then everything which ran in an
> interrupt handler was "important" and more or less allowed to hog the
> CPU at will.
>
> That obviously caused problems because it prevented other interrupt
> handlers from being served.
>
> This was attempted to work around in hardware by providing interrupt
> priority levels. No general purpose OS utilized that ever because there
> is no way to get this right. Not even on UP, unless you build a designed
> for the purpose "OS".
>
> Soft interrupts are not any better. They avoid the problem of stalling
> interrupts by moving the problem one level down to the scheduler.
>
> Granted they are a cute hack, but at the very end they are still evading
> the resource control mechanisms of the OS by defining their own rules:

From some measurements I've done, while softints seem like a good
idea, they are almost pointless.

What usually happens is that a hardware interrupt does some of
the required work, schedules a softint and returns.
The softint then runs immediately (at the same instruction) and
does all the rest of the work.
The work has to be done anyway, but you've added the cost of the
extra scheduling and interrupt, so overall it is slower.
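A toy user-space model of that sequence, assuming a single CPU and a
single softint vector. The names (raise_softint, irq_exit_model,
take_interrupt and so on) are hypothetical stand-ins, not the kernel's
API; the point is only the ordering: the softint runs on the way out of
the hard interrupt, before the interrupted task resumes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: one CPU, one softint vector (bit 0). */
static unsigned int softirq_pending;   /* stands in for the per-CPU pending mask */
static char trace[128];                 /* records what ran, in order */

static void log_step(const char *s)
{
    strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

static void raise_softint(unsigned int nr) { softirq_pending |= 1u << nr; }

/* Bottom half: "all the rest of the work". */
static void rx_softint(void) { log_step("soft:rest;"); }

/* Top half: acknowledge the device, do a little work, defer the rest. */
static void hard_irq_handler(void)
{
    log_step("hard:ack;");
    raise_softint(0);
}

/* Interrupt exit: pending softints run before the interrupted code resumes. */
static void irq_exit_model(void)
{
    if (softirq_pending & 1u) {
        softirq_pending &= ~1u;
        rx_softint();
    }
}

/* One interrupt arriving while some task is running. */
static void take_interrupt(void)
{
    hard_irq_handler();
    irq_exit_model();
    log_step("task:resume;");
}
```

Calling take_interrupt() leaves the trace "hard:ack;soft:rest;task:resume;":
the deferred work still completes before the task continues, so splitting
it only adds the raise/dispatch step on top of the same work.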

The massive batching up of some operations (like Ethernet
transmit clearing and rx setup, and things being freed after RCU)
doesn't help latency.
Without the batching the softint would finish sooner and cause
less of a latency 'problem' to whatever was interrupted.

Now softints do help interrupt latency, but that is only relevant
if you have critical interrupts (like pulling data out of a hardware
FIFO). Most modern hardware doesn't have anything that critical.

Now there is code that can decide to hand softint processing off to
a normal kernel thread (ksoftirqd). If that ever happens you probably
lose 'big time'.
Normal softint processing runs at a higher priority than any process
code, but the kernel thread runs at the priority of a normal user
thread: pretty much the lowest of the low.
So all this 'high priority' interrupt-related processing that
really does have to happen to keep the system running just doesn't
get scheduled.
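A minimal user-space model of that hand-off decision, loosely following
the restart check at the end of the kernel's __do_softirq(): it keeps
re-running newly raised softints until it has gone round
MAX_SOFTIRQ_RESTART (10) times, used up its time budget (about 2 ms), or
sees need_resched(), and then wakes the per-CPU ksoftirqd thread. The
struct, its fields and the model_* functions are hypothetical, and time
is simulated rather than measured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MAX_SOFTIRQ_RESTART (10) and
 * MAX_SOFTIRQ_TIME (about 2 ms); the model counts in microseconds. */
#define MODEL_MAX_RESTART 10
#define MODEL_TIME_BUDGET_US 2000UL

struct softirq_model {
    unsigned int pending;     /* bitmask of raised softints */
    unsigned int rearm_left;  /* times the handler will re-raise itself */
    unsigned long now_us;     /* simulated clock */
    unsigned long cost_us;    /* simulated cost of one handler run */
    bool need_resched;        /* some task is waiting for this CPU */
    bool thread_woken;        /* leftover work handed to the ksoftirqd-like thread */
    unsigned int runs;        /* handler invocations in softint context */
};

/* One pass over the pending work; newly arrived packets re-raise the softint. */
static void model_handler(struct softirq_model *m)
{
    m->runs++;
    m->now_us += m->cost_us;
    if (m->rearm_left > 0) {
        m->rearm_left--;
        m->pending |= 1u;
    }
}

static void model_do_softirq(struct softirq_model *m)
{
    unsigned long end = m->now_us + MODEL_TIME_BUDGET_US;
    int restart = MODEL_MAX_RESTART;

    while (m->pending) {
        m->pending = 0;                 /* clear, then run what was pending */
        model_handler(m);
        if (!m->pending)
            return;                     /* nothing re-raised: done */
        if (m->now_us < end && !m->need_resched && --restart > 0)
            continue;                   /* still within budget: go round again */
        m->thread_woken = true;         /* out of budget: defer to the thread, */
        return;                         /* leaving the bits pending for it */
    }
}
```

With a steady stream of re-raises the model hands off after ten passes,
even though only 100 us of simulated time has elapsed (10 passes at
10 us each); a single pending need_resched() hands off after the first
pass. From then on the remaining work competes with ordinary tasks.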

I think it was Eric who had problems with Ethernet packets being
dropped and changed the logic (for dropping to a thread) to make
it much less likely, but that got reverted (well, more code was
added that effectively reverted it) not long after.

Try (as I was doing) to run a test that requires you to receive ALL
of the 500000 Ethernet packets being sent to an interface every
second while also doing enough processing on the packets to
make the system (say) 90% busy (real-time UDP audio processing),
and you soon find the defaults are entirely hopeless.
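The per-packet budget in that test works out as follows, assuming (for
illustration only) that all of the packet handling lands on a single CPU:

```latex
\begin{aligned}
t_{\text{pkt}}   &= \frac{1\ \text{s}}{500\,000} = 2\ \mu\text{s} \\
t_{\text{busy}}  &= 0.9 \times 2\ \mu\text{s} = 1.8\ \mu\text{s} \\
t_{\text{slack}} &= t_{\text{pkt}} - t_{\text{busy}} = 0.2\ \mu\text{s} = 200\ \text{ns}
\end{aligned}
```

So on average only about 200 ns of idle time remains per packet, which
leaves very little room for softint work that has been pushed down to a
normal-priority thread to catch up.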

Even the interrupt 'mitigation' options on the Ethernet controller
don't actually work: packets get dropped at the low level.
(That will fail on an otherwise idle system.)

David
