NMI lockup in fib_sync_down

From: Phil Oester
Date: Wed Apr 27 2005 - 16:27:05 EST


[I tried a couple of times to send this to netdev@xxxxxxxxxxx, but
it never seems to show up]

I am trying to update a gateway from 2.6.10 to 2.6.11, and I keep
getting an NMI lockup:

NMI Watchdog detected LOCKUP on CPU1, eip c026a8d4, registers:
CPU: 1
EIP: 0060:[<c026a8d4>] Not tainted VLI
EFLAGS: 00001482 (2.6.11)
EIP is at fib_sync_down+0x74/0x140
eax: c11c42f0 ebx: 00001920 ecx: c11c42f8 edx: c11c42f8
esi: f69327fc edi: f742d034 ebp: f69327ec esp: c032ada8
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c032a000 task=c191a530)
Stack: f7c07934 c1919334 c1919349 f75c1020 f7b94c80 c024440c 00000004 00000304
00000000 c02281ee b6a02450 c11c4241 b6a02174 b6a02450 c11c4241 b6a02174
00000003 00000001 f69327ec c11c42f8 00000002 00000003 00000001 f69327b4
Call Trace:
[<c024440c>] ip_finish_output2+0xac/0x1b0
[<c02281ee>] skb_checksum+0x4e/0x2a0
[<c026f863>] tcp_packet+0xe3/0x430
[<c0244360>] ip_finish_output2+0x0/0x1b0
[<c0240920>] ip_forward_finish+0x0/0x50
[<c026d682>] __ip_conntrack_find+0x12/0xe0
[<c026df2e>] ip_conntrack_in+0xde/0x290
[<c02360d2>] nf_iterate+0x72/0xb0
[<c023f510>] ip_rcv_finish+0x0/0x240
[<c023f510>] ip_rcv_finish+0x0/0x240
[<c0236388>] nf_hook_slow+0x68/0xf0
[<c023f510>] ip_rcv_finish+0x0/0x240
[<c023f2f2>] ip_rcv+0x3c2/0x480
[<c023f510>] ip_rcv_finish+0x0/0x240
[<c0226d02>] alloc_skb+0x32/0xd0
[<c022cd5a>] netif_receive_skb+0x13a/0x1a0
[<c01f50ae>] e1000_clean_rx_irq+0x16e/0x4c0
[<c010ff37>] try_to_wake_up+0x237/0x260
[<c01f4e6e>] e1000_clean_tx_irq+0x14e/0x220
[<c01f4c72>] e1000_clean+0x42/0xf0
[<c022cf5f>] net_rx_action+0x7f/0x110
[<c0119a16>] __do_softirq+0xb6/0xd0
[<c01047aa>] do_softirq+0x4a/0x60
=======================
[<c010469d>] do_IRQ+0x4d/0x70
[<c0102d92>] common_interrupt+0x1a/0x20
[<c01004e3>] default_idle+0x23/0x30
[<c010058f>] cpu_idle+0x5f/0x70
Code: 31 ca 48 21 c2 8b 0c 96 85 c9 74 2b 8d 74 26 00 8d bc 27 00 00 00 00 8b 11 0f 18 02 90 8d 41 f8 39 58 24 0f 84 b8 00 00 00 85 d2 <89> d1 75 e8 90 8d b4 26 00 00 00 00 85 ff 74 47 c7 04 24 00 00
console shuts up ...
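
For reference, "fib_sync_down+0x74/0x140" means offset 0x74 into a
function that is 0x140 bytes long, so the function's start address can
be recovered from the EIP. The helpers below are only a sketch of that
arithmetic; the names func_start and rebase are made up for
illustration, and carrying an offset over to a different build assumes
the function body is laid out identically in both builds, which is not
guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of a function, given an address inside it and the
 * "+0x.." offset printed next to the symbol name in the oops. */
static uint32_t func_start(uint32_t addr, uint32_t offset)
{
    assert(addr >= offset);   /* an offset cannot exceed the address */
    return addr - offset;
}

/* Address of the same offset inside another build of the function.
 * Assumption: the code layout is unchanged between the two builds. */
static uint32_t rebase(uint32_t other_start, uint32_t offset)
{
    return other_start + offset;
}
```

With the values from this report, func_start(0xc026a8d4, 0x74) gives
0xc026a860 for the running kernel. The disassembly further down labels
c026a860 as fib_sync_down+0x80, so func_start(0xc026a860, 0x80) gives
0xc026a7e0 for the rebuilt image, and rebase(0xc026a7e0, 0x74) gives
0xc026a854 as the corresponding address there.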

Disassembly of fib_sync_down shows it is actually locking up in the
inlined prefetch function (the addresses do not match the EIP shown
in the trace above, as I rebuilt vmlinux with debugging enabled):

unsigned int hash = fib_laddr_hashfn(local);
struct hlist_head *head = &fib_info_laddrhash[hash];
struct hlist_node *node;
struct fib_info *fi;

hlist_for_each_entry(fi, node, head, fib_lhash) {
c026a82e: 8b 0c 96 mov (%esi,%edx,4),%ecx
c026a831: 85 c9 test %ecx,%ecx
c026a833: 74 2b je c026a860 <fib_sync_down+0x80>
c026a835: 8d 74 26 00 lea 0x0(%esi),%esi
c026a839: 8d bc 27 00 00 00 00 lea 0x0(%edi),%edi
However we don't do prefetches for pre XP Athlons currently
That should be fixed. */
#define ARCH_HAS_PREFETCH
extern inline void prefetch(const void *x)
{
c026a840: 8b 11 mov (%ecx),%edx
c026a842: 8d 74 26 00 lea 0x0(%esi),%esi <===== die
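
To make the shape of that loop easier to see, here is a simplified
userspace sketch of a list walk that prefetches the next node on every
iteration, as hlist_for_each_entry does. It is not the kernel code:
struct node and bounded_walk are hypothetical names, __builtin_prefetch
(a GCC/Clang builtin) stands in for the kernel's prefetch(), and the
step limit exists only so that the sketch can report a chain that never
reaches NULL instead of spinning on it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct hlist_node. */
struct node {
    struct node *next;
};

/* Walk the chain, prefetching the next node before moving to it.
 * Returns the number of nodes visited, or -1 if NULL was not reached
 * within max_steps nodes (for example because the chain loops). */
static long bounded_walk(const struct node *head, long max_steps)
{
    const struct node *pos;
    long steps = 0;

    assert(max_steps > 0);
    for (pos = head; pos != NULL; pos = pos->next) {
        __builtin_prefetch(pos->next);   /* a hint only, never faults */
        if (++steps > max_steps)
            return -1;                   /* chain did not terminate */
    }
    return steps;
}
```

A loop of this shape can only spin forever if the chain never ends, the
simplest case being a node whose next pointer refers to itself. In the
register dump above, ecx and edx both hold c11c42f8. If the Code: bytes
around the marked instruction are read as "mov (%ecx),%edx ... test
%edx,%edx; mov %edx,%ecx; jne", that would be consistent with such a
self-linked node, but this is an inference from the dump, not something
the trace proves.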


Reverting to 2.6.10 makes the problem go away. Ideas?

Phil
