[patch] softirq-2.4.10-B2

From: Ingo Molnar (mingo@elte.hu)
Date: Fri Sep 28 2001 - 02:18:17 EST

On Fri, 28 Sep 2001, Andrea Arcangeli wrote:

> some comment after reading your softirq-2.4.10-A7.
> > - softirq handling can now be restarted N times within do_softirq(), if a
> > softirq gets reactivated while it's being handled.
> is this really necessary after introducing the unwakeup logic? What do
> you get if you allow at max 1 softirq pass as before?

yes, of course it's necessary. The reason is really simple: softirqs have
a natural ordering, and if we are handling softirq #2, while softirq #1
gets reactivated, nothing will process softirq #1 if we do only a single
loop. I explained this in full detail during my previous softirq patch a
few months ago. The unwakeup logic only reverts the wakeup of ksoftirqd,
it will not magically process pending softirqs! I explained all these
effects in detail when i wrote the first softirq-looping patch, and when
you originally came up with ksoftirqd.

(Even with just a single softirq activated, if we are just on the way out
of handling softirqs (the first and last time) at the syscall level, and a
hardirq comes in that activates this softirq, then there is nothing that
will process the softirq: 1) the hardirq's own handling of softirqs is
inhibited due to the softirq-atomic section within do_softirq() 2) this loop
will not process the new work since it's on the way out.)

basically the problem is that there is a big 'gap' between the activation
of softirqs, and the time when ksoftirqd starts running. There are a
number of mechanisms within the networking stack that are quite
timing-sensitive. And generally, if there is work A and work B that are
related, and we've executed work A (the hardware interrupt), then it's
almost always the best idea to execute work B as soon as possible. Any
'delaying' of work B should only be done for non-performance reasons:
eg. fairness in this case. Now it's MAX_SOFTIRQ_RESTART that balances
performance against fairness. (in most kernel subsystems we almost always
give preference to performance over fairness - without ignoring fairness
of course.)
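
to make the ordering argument concrete, here is a minimal user-space sketch
(not the code from the patch). It assumes a pending bitmask that is
snapshotted and cleared once per pass, and every name in it
(do_softirq_sketch, run_scenario, the value 3 for MAX_SOFTIRQ_RESTART) is
made up for illustration. Softirq #2's handler reactivates softirq #1, which
a single pass has already walked past:

```c
#include <assert.h>

#define NR_SOFTIRQS_SKETCH   4
#define MAX_SOFTIRQ_RESTART  3      /* assumed value, for illustration only */

typedef void (*softirq_action_t)(void);

static unsigned int pending;        /* bitmask of raised softirqs */
static softirq_action_t actions[NR_SOFTIRQS_SKETCH];
static int handled[NR_SOFTIRQS_SKETCH];
static int reactivations_left;

static void raise_softirq_sketch(int nr)
{
        assert(nr >= 0 && nr < NR_SOFTIRQS_SKETCH);
        pending |= 1u << nr;
}

/*
 * Walk the pending softirqs in index order, at most max_passes times.
 * Returns the mask that is still pending on exit, i.e. the work that
 * would be left for ksoftirqd.
 */
static unsigned int do_softirq_sketch(int max_passes)
{
        int pass, nr;

        for (pass = 0; pass < max_passes && pending; pass++) {
                unsigned int snapshot = pending;

                pending = 0;        /* handlers may raise new work */
                for (nr = 0; nr < NR_SOFTIRQS_SKETCH; nr++) {
                        if (snapshot & (1u << nr)) {
                                handled[nr]++;
                                actions[nr]();
                        }
                }
        }
        return pending;
}

static void action_noop(void)
{
}

/* stands in for a hardirq that reactivates softirq #1 while #2 runs */
static void action_reactivates_1(void)
{
        if (reactivations_left > 0) {
                reactivations_left--;
                raise_softirq_sketch(1);
        }
}

/* hypothetical test scenario: raise #1 and #2, then run the loop */
static unsigned int run_scenario(int max_passes)
{
        int nr;

        pending = 0;
        reactivations_left = 1;
        for (nr = 0; nr < NR_SOFTIRQS_SKETCH; nr++) {
                actions[nr] = action_noop;
                handled[nr] = 0;
        }
        actions[2] = action_reactivates_1;
        raise_softirq_sketch(1);
        raise_softirq_sketch(2);
        return do_softirq_sketch(max_passes);
}
```

with max_passes == 1 the reactivated softirq #1 stays pending on exit; with
a few passes allowed it is handled in the same do_softirq_sketch() call.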

there is also another bad side-effect of ksoftirqd as well: if it's
relatively inactive for some time then it will 'collect' current->counter
scheduler ticks, basically boosting its performance way above that of the
intended ->nice = 19. It will then often 'suck' softirq handling to
itself, due to its more aggressive scheduling position. To combat this
effect, i've modified ksoftirqd to do:

        if (current->counter > 1)
                current->counter = 1;

(this is a tiny bit racy wrt. the timer interrupt, but it's harmless.)

the current form of softirqs was designed by Alexey and David for the
purposes of high-performance networking, as part of the 'softnet' effort.
Networking remains the biggest user of softirqs - while there are a few
cases of high-frequency tasklet uses, generally it's the network stack's
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ workload that we care about most - and
tasklets.
(see the tasklet fixes in the patch.) Via TX-completion-IRQ capable cards,
there can be a constant and separate TX and RX softirq workload added.

especially under high loads, the work done in the 'later' net-softirq,
NET_RX_SOFTIRQ can mount up, and thus the amount of pending work within
NET_TX_SOFTIRQ can mount up. Furthermore, there is a mechanism within both
the tx and rx softirq that can break out of softirq handling before all
work has been handled: if a jiffy (10 msecs) has passed, or if we have
processed more than netdev_max_backlog (default: 300) packets.
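
the break-out rule can be sketched roughly as below. This is a simplified
user-space model, not the networking code itself: the queue is just a
counter, the clock is passed in as a function so the example can be
exercised, and all names ending in _sketch (plus fake_jiffies and
calls_until_tick) are hypothetical. Only the default of 300 comes from the
text above.

```c
#include <assert.h>

#define NETDEV_MAX_BACKLOG_SKETCH 300   /* default quoted in the text */

struct rx_queue_sketch {
        int queued;                     /* packets waiting */
};

static unsigned long fake_jiffies;      /* stand-in for jiffies */
static unsigned long calls_until_tick;  /* 0 means the clock never advances */

/* fake clock: advances by one tick when calls_until_tick counts down to 0 */
static unsigned long sketch_clock(void)
{
        if (calls_until_tick > 0 && --calls_until_tick == 0)
                fake_jiffies++;
        return fake_jiffies;
}

/*
 * Process packets until the queue is empty, the packet budget is used
 * up, or the clock has moved on since we started.  Returns the number
 * of packets processed; anything left in q->queued is deferred work.
 */
static int net_rx_action_sketch(struct rx_queue_sketch *q, int budget,
                                unsigned long (*now)(void))
{
        unsigned long start = now();
        int done = 0;

        assert(q != 0 && budget > 0);
        while (q->queued > 0) {
                if (done >= budget || now() != start)
                        break;          /* real code would re-raise the softirq */
                q->queued--;
                done++;
        }
        return done;
}

/* hypothetical helper: run one rx pass over 'queued' packets */
static int run_rx_sketch(int queued, unsigned long tick_after_calls)
{
        struct rx_queue_sketch q = { queued };

        fake_jiffies = 0;
        calls_until_tick = tick_after_calls;
        return net_rx_action_sketch(&q, NETDEV_MAX_BACKLOG_SKETCH,
                                    sketch_clock);
}
```

either limit leaves work behind in the queue, which is the backlog that
then has to wait for the next softirq run.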

there are a number of other options i experimented with:

 - handling softirqs in schedule(), before runqueue_lock is taken, in a
   softirq- and irq- atomic way, unless ->need_resched is set. This was
   done in earlier kernels, and might be a good idea to do again =>
   especially with unwakeup(). The downside is extra cost within

 - tuning the amount of work within the tx/rx handlers, both increasing
   and decreasing the amount of packets. Decreasing the amount of work has
   the effect of decreasing the latency of processing RX-triggered TX
   events (such as ACK), and generally handling TX/RX events more
   smoothly, but it also has the effect of increasing the cache footprint.

 - exchanging the order of tx and rx softirqs.

 - using jiffies within do_softirq() to make sure it does not execute for
   more than 10-20 msecs.

 - feeding back a 'work left' integer through the ->action functions to
   do_softirq() - which can then decide which softirq to restart.
   (basically a mini softirq scheduler.)
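
a rough sketch of what such a 'work left' feedback could look like, in
user-space C. None of this is in the patch: the int-returning action type,
the "restart whichever softirq reports the most work left" policy, and the
CHUNK and backlog numbers are all assumptions chosen only to illustrate the
idea.

```c
#include <assert.h>

#define NR_SOFTIRQS_SKETCH 3
#define CHUNK              4    /* assumed work units handled per call */

/* hypothetical ->action signature: returns the units of work left */
typedef int (*action_fn)(void);

static int backlog[NR_SOFTIRQS_SKETCH];

static int drain(int nr)
{
        int n = backlog[nr] < CHUNK ? backlog[nr] : CHUNK;

        backlog[nr] -= n;
        return backlog[nr];
}

static int action0(void) { return drain(0); }
static int action1(void) { return drain(1); }
static int action2(void) { return drain(2); }

static action_fn actions[NR_SOFTIRQS_SKETCH] = { action0, action1, action2 };

/*
 * One normal pass over 'pending', then up to max_restart extra calls,
 * each given to the softirq that reported the most work left.
 * Returns the total work still left on exit.
 */
static int do_softirq_feedback_sketch(unsigned int pending, int max_restart)
{
        int left[NR_SOFTIRQS_SKETCH] = { 0 };
        int nr, total = 0;

        assert(max_restart >= 0);
        for (nr = 0; nr < NR_SOFTIRQS_SKETCH; nr++)
                if (pending & (1u << nr))
                        left[nr] = actions[nr]();

        while (max_restart-- > 0) {
                int best = -1;

                for (nr = 0; nr < NR_SOFTIRQS_SKETCH; nr++)
                        if (left[nr] > 0 && (best < 0 || left[nr] > left[best]))
                                best = nr;
                if (best < 0)
                        break;          /* nothing left to restart */
                left[best] = actions[best]();
        }

        for (nr = 0; nr < NR_SOFTIRQS_SKETCH; nr++)
                total += left[nr];
        return total;
}

/* hypothetical scenario: backlogs of 2, 10 and 5 units, all pending */
static int run_feedback_scenario(int max_restart)
{
        backlog[0] = 2;
        backlog[1] = 10;
        backlog[2] = 5;
        return do_softirq_feedback_sketch(0x7, max_restart);
}
```

the point of the return value is that the generic loop no longer restarts
blindly: it can spend its limited restarts where the backlog actually is.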

this latter one looked pretty powerful because it provides more information
to the generic layer - but it's something i think might be too intrusive
for 2.4. For now, the simplest and most effective method of all was the

- i've done one more refinement to the current patch: do_softirq() now
  checks current->need_resched and it will break out of softirq processing
  if it's 1. Note that do_softirq() is a rare function which *must not* do
  '!current->need_resched': poll_idle() uses need_resched == -1 as a
  special value. (but normally irq-level code does not check
  ->need_resched so this is a special case). This allows irqs that hit
  the idle-poll task to do normal softirq processing - and not break out
  after one loop.
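
the difference between the two tests can be shown with a small user-space
sketch. The function and its parameters are hypothetical; only the -1
marker used by poll_idle() and the "break out if it's 1" rule come from the
text above.

```c
#include <assert.h>

#define POLL_IDLE_MARKER (-1)   /* value poll_idle() uses, per the text */

/*
 * Count how many softirq passes run when 'wanted' passes' worth of work
 * is pending.  naive != 0 models a '!need_resched' style test, which
 * treats any non-zero value as "reschedule requested".
 */
static int softirq_passes_sketch(int need_resched, int wanted, int naive)
{
        int passes = 0;

        assert(wanted >= 0);
        while (passes < wanted) {
                passes++;       /* one pass over the pending softirqs */
                if (naive ? need_resched != 0 : need_resched == 1)
                        break;  /* give the CPU back to the scheduler */
        }
        return passes;
}
```

with need_resched == -1 the naive test stops after a single pass, while the
'== 1' test lets all the pending passes run.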

i've attached the softirq-2.4.10-B2 that has your TASK_RUNNING suggestion,
Oleg's fixes and this change included.


