[PATCH] lost softirq, 2.6.24-rc7

From: Frank Rowand
Date: Tue Jan 15 2008 - 17:16:57 EST


From: Frank Rowand <frank.rowand@xxxxxxxxxxx>

(Ingo, there is a question for you after the description, just before the
patch.)

When running an interrupt and network intensive stress test with PREEMPT_RT
enabled, the target system stopped processing received network packets.
skbs from received packets were being queued by net_rx_action(), but the
NET_RX_SOFTIRQ softirq was never running to remove the skbs from the queue.
Since the target system's root file system is NFS-mounted, the system was
then effectively hung.

A pseudocode description of how this state was reached follows.
Each level of indentation represents a function call from the previous line.


ethernet driver irq handler receives packet
netif_rx()
queues skb (qlen == 1), raises NET_RX_SOFTIRQ

on return from irq
___do_softirq() [ 1 ]
Reset the pending bitmask
net_rx_action()
dequeues skb (qlen == 0)
jiffies incremented, so
break out of processing
and raise NET_RX_SOFTIRQ
(but don't deassert NAPI_STATE_SCHED)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ksoftirqd thread runs
process TIMER_SOFTIRQ
process RCU_SOFTIRQ
<< ksoftirqd sleeps >>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

___do_softirq() [ 2 ]
Reset the pending bitmask
finds NET_RX_SOFTIRQ raised but already running
<< ___do_softirq() [ 2 ] completes >>

<< ___do_softirq() [ 1 ] resumes >>
the pending bitmask is empty, so NET_RX_SOFTIRQ is lost



Since NET_RX_SOFTIRQ was lost, net_rx_action() is never called, so
NAPI_STATE_SCHED is never deasserted.

When netif_rx() is called for subsequent packets, it queues the skb but
does not raise NET_RX_SOFTIRQ because it believes that NET_RX_SOFTIRQ
is active since NAPI_STATE_SCHED is set.
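The decision netif_rx() makes here can be modeled in a few lines of
user-space C (a simplified sketch, not the kernel code; model_netif_rx()
and its arguments are illustrative stand-ins for the backlog queue and the
NAPI_STATE_SCHED bit):

```c
#include <assert.h>

/* Simplified user-space model of netif_rx()'s decision (illustrative,
 * not the kernel code): NET_RX_SOFTIRQ is raised only when the model's
 * NAPI_STATE_SCHED flag is newly set.  Returns 1 if the softirq would
 * be raised for this packet, 0 otherwise. */
static int model_netif_rx(int *napi_sched, int *qlen)
{
    (*qlen)++;                /* queue the skb */
    if (!*napi_sched) {       /* __napi_schedule() path */
        *napi_sched = 1;
        return 1;             /* raise NET_RX_SOFTIRQ */
    }
    /* NAPI_STATE_SCHED already set: assume a poll is pending and raise
     * nothing -- this is the assumption the lost softirq violates. */
    return 0;
}
```

Once NAPI_STATE_SCHED is stuck set, every later packet takes the second
path: the queue grows but the softirq is never raised again.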

THE PROBLEM:
The softirq was lost when "___do_softirq() [ 2 ]" reset the softirq pending
bit for NET_RX_SOFTIRQ.
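The lost update can be reproduced with a small user-space model (a sketch
only; `pending` and `running` stand in for local_softirq_pending() and the
per-cpu softirq_running mask, and the function names are illustrative):

```c
#include <assert.h>

/* Simplified user-space model of the lost-softirq race (not kernel
 * code).  NET_RX models the NET_RX_SOFTIRQ bit. */
#define NET_RX 0x08

static unsigned int pending;   /* models local_softirq_pending() */
static unsigned int running;   /* models per-cpu softirq_running */

/* One pass of ___do_softirq() as modeled here: snapshot and clear the
 * pending mask, then skip any softirq that is already running.
 * Returns the bits actually processed. */
static unsigned int do_softirq_pass(void)
{
    unsigned int snap = pending;
    pending = 0;                          /* set_softirq_pending(0) */
    unsigned int runnable = snap & ~running;
    running |= runnable;
    /* ... handlers would run here ... */
    running &= ~runnable;
    return runnable;
}

/* Reproduce the interleaving from the trace: context [ 1 ] is mid-way
 * through NET_RX (running bit set) when net_rx_action() re-raises it,
 * and context [ 2 ] then runs a full pass. */
static int race_loses_net_rx(void)
{
    pending = 0;
    running = NET_RX;      /* context [ 1 ] is processing NET_RX */
    pending |= NET_RX;     /* net_rx_action() re-raises it */
    unsigned int done = do_softirq_pass();   /* context [ 2 ] */
    /* NET_RX was neither processed nor left pending: it is lost. */
    return done == 0 && (pending & NET_RX) == 0;
}
```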


The above sequence was captured by the following trace:

napi  pending  softirq_  napi
state bitmask  running   qlen  trace location
----- -------  --------  ----  ----------------------------------

  0     000       0       0    ___do_softirq - exit

  0     000       0       0    netif_rx
  1     000       0       0    __napi_schedule

  1     008       0       1    ___do_softirq - entry or restart
  1     000       8       1    net_rx_action
  1     000       8       1    process_backlog
  1     10a       8       0    net_rx_action - softnet_break

  1     10a       8       0    ksoftirqd - before while
  1     108       8       0    ksoftirqd - after while
  1     108       8       0    ksoftirqd - before while
  1     008       8       0    ksoftirqd - after while

  1     008       8       0    ___do_softirq - entry or restart
  1     000       8       0    ___do_softirq - find already running
  1     000       8       0    ___do_softirq - before or_softirq_pending()
  1     000       8       0    ___do_softirq - after or_softirq_pending()
  1     000       8       0    ___do_softirq - exit

  1     000       0       0    ___do_softirq - before or_softirq_pending()
  1     000       0       0    ___do_softirq - after or_softirq_pending()
  1     000       0       0    ___do_softirq - exit

  1     000       0       0    netif_rx

  1     102       0       1    netif_rx
  1     102       0       1    netif_rx - qlen > 0


The proposed fix is for ___do_softirq() to re-raise any pending softirq
that it cleared but did not process because that softirq was already
running. "Already running" means another context had previously saved
the value of local_softirq_pending() and zeroed it out, and the same
softirq was subsequently raised again, potentially after that other
context completed processing the softirq.
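The effect of the fix can be sketched with a small user-space model
(illustrative only; `pending` and `running` stand in for
local_softirq_pending() and the per-cpu softirq_running mask, and the
function names are hypothetical):

```c
#include <assert.h>

#define NET_RX 0x08

static unsigned int pending;   /* models local_softirq_pending() */
static unsigned int running;   /* models per-cpu softirq_running */

/* One pass of ___do_softirq() with the patch applied: bits skipped
 * because the softirq is already running are OR-ed back into the
 * pending mask, as or_softirq_pending(skipped) does, so they are
 * retried on a later pass instead of being discarded. */
static unsigned int do_softirq_pass_fixed(void)
{
    unsigned int snap = pending;
    pending = 0;                          /* set_softirq_pending(0) */
    unsigned int skipped = snap & running;
    unsigned int runnable = snap & ~running;
    running |= runnable;
    /* ... handlers would run here ... */
    running &= ~runnable;
    pending |= skipped;                   /* or_softirq_pending(skipped) */
    return runnable;
}

/* The racy case: the bit now survives for a later pass. */
static int fix_preserves_net_rx(void)
{
    pending = NET_RX;
    running = NET_RX;
    do_softirq_pass_fixed();
    return (pending & NET_RX) != 0;
}

/* The normal case is unchanged: nothing running, so the bit is
 * processed and the pending mask ends up empty. */
static int normal_path_processes(void)
{
    pending = NET_RX;
    running = 0;
    unsigned int done = do_softirq_pass_fixed();
    return done == NET_RX && pending == 0;
}
```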

This patch might be related to some of the problems reported in the lkml
thread "2.6.20->2.6.21 - networking dies after random time", which began
16 June 2007. The reports in that thread did not have the specific details
that I need to determine whether this is the same problem that they
experienced. (The signature of the problem is that the napi state is 1,
napi qlen > 0, and NET_RX_SOFTIRQ is never running.)

Ingo,

One concern I have is that the attached patch might cause a softirq to be
processed twice. Is it always safe to invoke a softirq one extra time?

If this patch is not an acceptable fix for the problem, then I can also
supply a workaround in the NET_RX_SOFTIRQ handler that avoids this scenario.

Signed-off-by: Frank Rowand <frank.rowand@xxxxxxxxxxx>
---
kernel/softirq.c | 9 5 + 4 - 0 !
1 files changed, 5 insertions(+), 4 deletions(-)

Index: linux-2.6.24-rc7/kernel/softirq.c
===================================================================
--- linux-2.6.24-rc7.orig/kernel/softirq.c
+++ linux-2.6.24-rc7/kernel/softirq.c
@@ -261,7 +261,7 @@ static DEFINE_PER_CPU(u32, softirq_runni
static void ___do_softirq(const int same_prio_only)
{
int max_restart = MAX_SOFTIRQ_RESTART, max_loops = MAX_SOFTIRQ_RESTART;
- __u32 pending, available_mask, same_prio_skipped;
+ __u32 pending, available_mask, skipped;
struct softirq_action *h;
struct task_struct *tsk;
int cpu, softirq;
@@ -273,7 +273,7 @@ static void ___do_softirq(const int same
restart:
available_mask = -1;
softirq = 0;
- same_prio_skipped = 0;
+ skipped = 0;
/* Reset the pending bitmask before enabling irqs */
set_softirq_pending(0);

@@ -295,7 +295,7 @@ restart:
tsk = __get_cpu_var(ksoftirqd)[softirq].tsk;
if (tsk && tsk->normal_prio !=
current->normal_prio) {
- same_prio_skipped |= softirq_mask;
+ skipped |= softirq_mask;
available_mask &= ~softirq_mask;
goto next;
}
@@ -305,6 +305,7 @@ restart:
* Is this softirq already being processed?
*/
if (per_cpu(softirq_running, cpu) & softirq_mask) {
+ skipped |= softirq_mask;
available_mask &= ~softirq_mask;
goto next;
}
@@ -328,7 +329,7 @@ next:
pending >>= 1;
} while (pending);

- or_softirq_pending(same_prio_skipped);
+ or_softirq_pending(skipped);
pending = local_softirq_pending();
if (pending & available_mask) {
if (--max_restart)


