On Fri, 28 Sep 2001, Andrea Arcangeli wrote:
> some comment after reading your softirq-2.4.10-A7.
>
> > - softirq handling can now be restarted N times within do_softirq(), if a
> > softirq gets reactivated while it's being handled.
>
> is this really necessary after introducing the unwakeup logic? What do
> you get if you allow at max 1 softirq pass as before?

yes, of course it's necessary. The reason is really simple: softirqs have
a natural ordering, and if we are handling softirq #2, while softirq #1
gets reactivated, nothing will process softirq #1 if we do only a single
loop. I explained this in full detail during my previous softirq patch a
few months ago. The unwakeup logic only reverts the wakeup of ksoftirqd,
it will not magically process pending softirqs! I explained all these
effects in detail when i wrote the first softirq-looping patch, and when
you originally came up with ksoftirqd.

(Even with just a single softirq activated, if we are just on the way out
of handling softirqs (the first and last time) at the syscall level, and a
hardirq comes in that activates this softirq, then there is nothing that
will process the softirq: 1) the hardirq's own handling of softirqs is
inhibited due to the softirq-atomic section within do_softirq() 2) this loop
will not process the new work since it's on the way out.)

basically the problem is that there is a big 'gap' between the activation
of softirqs, and the time when ksoftirqd starts running. There are a
number of mechanisms within the networking stack that are quite
timing-sensitive. And generally, if there is work A and work B that are
related, and we've executed work A (the hardware interrupt), then it's
almost always the best idea to execute work B as soon as possible. Any
'delaying' of work B should only be done for non-performance reasons:
eg. fairness in this case. Now it's MAX_SOFTIRQ_RESTART that balances
performance against fairness. (in most kernel subsystems we almost always
give preference to performance over fairness - without ignoring fairness
of course.)
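
As a rough illustration of the restart loop and the MAX_SOFTIRQ_RESTART limit discussed above, here is a user-space sketch in C. It is not the code from the patch: the names ending in _sketch, the four-entry softirq table, the two example handlers and the limit value of 10 are all assumptions made for the example (the mail only says "N times").

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS_SKETCH   4   /* hypothetical: small table for the example */
#define MAX_SOFTIRQ_RESTART  10  /* assumed value, the mail does not state N */

typedef void (*softirq_action_t)(void);

static uint32_t pending;                              /* bitmask of raised softirqs */
static softirq_action_t actions[NR_SOFTIRQS_SKETCH];  /* one handler per softirq */
static int runs[NR_SOFTIRQS_SKETCH];                  /* how often each handler ran */
static int ksoftirqd_wakeups;                         /* times work was deferred */
static int reraise_budget;                            /* re-raises left for the demo */

/* Mark softirq nr as pending; stands in for what a hardirq would do. */
void raise_softirq_sketch(unsigned int nr)
{
    assert(nr < NR_SOFTIRQS_SKETCH);
    pending |= 1u << nr;
}

/* Example handler for softirq #1: just counts its invocations. */
void softirq1_action(void)
{
    runs[1]++;
}

/* Example handler for softirq #2: reactivates #1 after #1's slot has passed. */
void softirq2_action(void)
{
    runs[2]++;
    if (reraise_budget > 0) {
        reraise_budget--;
        raise_softirq_sketch(1);
    }
}

/*
 * Run pending softirqs in ascending order, rescanning while new work
 * shows up, for at most max_restart passes. Returns the number of passes.
 */
int do_softirq_sketch(int max_restart)
{
    int passes = 0;

    while (pending && passes < max_restart) {
        uint32_t snapshot = pending;  /* work visible at the start of this pass */

        pending = 0;                  /* handlers may set bits again */
        passes++;
        for (unsigned int nr = 0; nr < NR_SOFTIRQS_SKETCH; nr++)
            if ((snapshot & (1u << nr)) && actions[nr])
                actions[nr]();
    }

    if (pending)                      /* limit reached: leave the rest to ksoftirqd */
        ksoftirqd_wakeups++;
    return passes;
}
```

With max_restart set to 1, the softirq #1 that was reactivated from within #2 stays pending and is left to the deferred path. With a larger limit it would be picked up by the next pass of the same call.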

there is another bad side-effect of ksoftirqd as well: if it's
relatively inactive for some time then it will 'collect' current->counter
scheduler ticks, basically boosting its performance way above that of the
intended ->nice = 19. It will then often 'suck' softirq handling to
itself, due to its more aggressive scheduling position. To combat this
effect, i've modified ksoftirqd to do:

    if (current->counter > 1)
        current->counter = 1;

(this is a tiny bit racy wrt. the timer interrupt, but it's harmless.)

the current form of softirqs was designed by Alexey and David for the
purposes of high-performance networking, as part of the 'softnet' effort.
Networking remains the biggest user of softirqs - while there are a few
cases of high-frequency tasklet uses, generally it's the network stack's
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ workload that we care about most - and tasklets.
(see the tasklet fixes in the patch.) Via TX-completion-IRQ capable cards,
there can be a constant and separate TX and RX softirq workload added.

especially under high loads, the work done in the 'later' net-softirq,
NET_RX_SOFTIRQ, can mount up, and thus the amount of pending work within
NET_TX_SOFTIRQ can mount up. Furthermore, there is a mechanism within both
the tx and rx softirq that can break out of softirq handling before all
work has been handled: if a jiffy (10 msecs) has passed, or if we have
processed more than netdev_max_backlog (default: 300) packets.
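
The two break-out conditions described above (a jiffy has passed, or the packet budget is used up) can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel's handler: the simulated clock jiffies_sketch, the packets_per_tick knob and the function name are hypothetical, and the exact boundary (stopping at exactly 300 packets rather than just past it) is a simplification.

```c
#include <assert.h>

#define NETDEV_MAX_BACKLOG_SKETCH 300  /* default quoted in the mail */

static unsigned long jiffies_sketch;   /* simulated tick counter */

/*
 * Drain packets from *backlog. packets_per_tick is a hypothetical knob:
 * the simulated clock advances by one jiffy after that many packets
 * (0 means the clock never advances). Returns the number processed.
 */
int net_rx_action_sketch(int *backlog, int packets_per_tick)
{
    unsigned long start_time = jiffies_sketch;
    int budget = NETDEV_MAX_BACKLOG_SKETCH;
    int done = 0;

    assert(backlog);

    while (*backlog > 0) {
        /* break out early: packet budget spent, or a jiffy has passed */
        if (budget <= 0 || jiffies_sketch - start_time >= 1)
            break;  /* leftover work stays queued for a later pass */

        (*backlog)--;
        budget--;
        done++;

        if (packets_per_tick > 0 && done % packets_per_tick == 0)
            jiffies_sketch++;  /* simulated timer tick */
    }
    return done;
}
```

Either condition leaves work behind in the queue, which is why a single pass through the softirqs can end with packets still pending.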

there are a number of other options i experimented with:

 - handling softirqs in schedule(), before runqueue_lock is taken, in a
   softirq- and irq- atomic way, unless ->need_resched is set. This was
   done in earlier kernels, and might be a good idea to do again =>
   especially with unwakeup(). The downside is extra cost within
   schedule().

 - tuning the amount of work within the tx/rx handlers, both increasing
   and decreasing the number of packets. Decreasing the amount of work has
   the effect of decreasing the latency of processing RX-triggered TX
   events (such as ACK), and generally handling TX/RX events more
   smoothly, but it also has the effect of increasing the cache footprint.

 - exchanging the order of tx and rx softirqs.

 - using jiffies within do_softirq() to make sure it does not execute for
   more than 10-20 msecs.

 - feeding back a 'work left' integer through the ->action functions to
   do_softirq() - which can then decide which softirq to restart.
   (basically a mini softirq scheduler.)

this latter one looked pretty powerful because it provides more information
to the generic layer - but it's something i think might be too intrusive
for 2.4. For now, the simplest and most effective method of all was the
looping.
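
To make the 'work left' feedback idea concrete, here is one possible shape for it as a user-space C sketch. The mail does not describe how the experimental version chose which softirq to restart, so everything here is an assumption: the action signature, the per-call quota, the backlog_fb table and the "restart the busiest softirq first" policy are all hypothetical.

```c
#include <assert.h>

#define NR_FB 3                     /* hypothetical number of softirqs */

static int backlog_fb[NR_FB];       /* units of queued work per softirq */

/* Hypothetical ->action: handle up to quota units, report what remains. */
int action_fb(unsigned int nr, int quota)
{
    int n = backlog_fb[nr] < quota ? backlog_fb[nr] : quota;

    backlog_fb[nr] -= n;
    return backlog_fb[nr];
}

/*
 * Run every softirq once in its natural order, then use the reported
 * 'work left' values to restart the busiest one, at most max_restart
 * times. Returns the total work still left at the end.
 */
int do_softirq_fb(int quota, int max_restart)
{
    int left[NR_FB];
    unsigned int nr;
    int total = 0;

    assert(quota > 0);

    for (nr = 0; nr < NR_FB; nr++)
        left[nr] = action_fb(nr, quota);

    while (max_restart-- > 0) {
        unsigned int busiest = 0;

        for (nr = 1; nr < NR_FB; nr++)
            if (left[nr] > left[busiest])
                busiest = nr;
        if (left[busiest] == 0)
            break;                  /* nothing reported any work left */
        left[busiest] = action_fb(busiest, quota);
    }

    for (nr = 0; nr < NR_FB; nr++)
        total += left[nr];
    return total;
}
```

Compared with plain looping, the generic layer here needs a changed ->action signature in every softirq user, which hints at why such a change could be considered intrusive.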

- i've done one more refinement to the current patch: do_softirq() now
  checks current->need_resched and it will break out of softirq processing
  if it's 1. Note that do_softirq() is a rare function which *must not* do
  '!current->need_resched': poll_idle() uses need_resched == -1 as a
  special value. (but normally irq-level code does not check
  ->need_resched so this is a special case). This allows irqs that hit
  the idle-poll task to do normal softirq processing - and not break out
  after one loop.
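
The difference between testing for exactly 1 and testing for non-zero can be shown with a tiny C sketch. The function and macro names are hypothetical and only model the three need_resched values mentioned above; this is not the check as written in the patch.

```c
/* Values of need_resched as described in the mail. */
#define RESCHED_NONE       0
#define RESCHED_WANTED     1
#define RESCHED_POLL_IDLE  (-1)   /* special value used by poll_idle() */

/* The check described above: leave the softirq loop only on exactly 1. */
int softirq_should_break(long need_resched)
{
    return need_resched == RESCHED_WANTED;
}

/* The usual "non-zero means reschedule" test, shown for contrast. */
int naive_should_break(long need_resched)
{
    return need_resched != RESCHED_NONE;
}
```

For the idle-poll value -1 the first function reports "keep going" while the second reports "break out", which is the special case the refinement is meant to handle.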

i've attached softirq-2.4.10-B2, which has your TASK_RUNNING suggestion,
Oleg's fixes and this change included.

Ingo