BUG: spinlock lockup 2.6.18

From: Jamie Iles
Date: Fri May 09 2008 - 07:37:39 EST


Hi,

I am trying to debug an ARM926 system running a 2.6.18 kernel with a custom Ethernet driver. When we load the CPU close to 100%, we see a spinlock lockup bug. Two functions in the driver code take the spinlock that the error is reported against - emac_tx() and emac_tx_interrupt_work(). emac_tx() is the driver's hard_start_xmit implementation (called from dev_hard_start_xmit()) and emac_tx_interrupt_work() is a workqueue handler that frees transmitted socket buffers.

Both emac_tx() and emac_tx_interrupt_work() do a spin_lock_irqsave() at the beginning and a spin_unlock_irqrestore() at the end on the same spinlock. I cannot find any code path that returns without unlocking the spinlock.

In emac_tx_interrupt_work(), we free transmitted socket buffers with dev_kfree_skb() - this is not in an interrupt context so I think this is the correct flavour to use. As the trace below shows, this eventually calls local_bh_enable_ip() (via destroy_conntrack() and _write_unlock_bh()) and then do_softirq().

Why are softirqs being run when we have explicitly disabled interrupts with spin_lock_irqsave()? We appear to get a NET_RX_SOFTIRQ softirq, which ends up processing the network stack backlog and calling emac_tx(), which then attempts to take the spinlock_t that is already held in emac_tx_interrupt_work(). Is this likely to be a kernel bug?
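
The sequence described above can be modelled in user space. The following is only a toy sketch, not the driver or kernel code: every model_* name is a hypothetical stand-in, and it assumes (as the trace suggests) that re-enabling bottom halves runs pending softirqs based on softirq nesting alone, without looking at whether interrupts are disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel state; none of this is real kernel API. */
struct model {
    bool tx_lock_held;       /* the driver's non-recursive TX spinlock */
    bool irqs_disabled;      /* effect of spin_lock_irqsave() */
    int  softirq_count;      /* bottom-half disable nesting depth */
    bool rx_softirq_pending; /* a NET_RX softirq is waiting to run */
    bool lockup_detected;    /* set where a real spinlock would spin forever */
};

/* Take the TX lock; returns false instead of spinning if already held. */
static bool model_lock(struct model *m)
{
    if (m->tx_lock_held) {
        m->lockup_detected = true;
        return false;
    }
    m->tx_lock_held = true;
    m->irqs_disabled = true;
    return true;
}

static void model_unlock(struct model *m)
{
    m->tx_lock_held = false;
    m->irqs_disabled = false;
}

/* Models emac_tx(): lock, queue the frame, unlock. */
static void model_emac_tx(struct model *m)
{
    if (!model_lock(m))
        return;
    model_unlock(m);
}

static void model_bh_disable(struct model *m)
{
    m->softirq_count++;
}

/* Assumption: only softirq nesting is checked here, not irqs_disabled. */
static void model_bh_enable(struct model *m)
{
    m->softirq_count--;
    if (m->softirq_count == 0 && m->rx_softirq_pending) {
        m->rx_softirq_pending = false;
        model_emac_tx(m); /* backlog processing forwards a packet */
    }
}

/* Models freeing an skb whose cleanup takes and drops a _bh lock. */
static void model_free_skb(struct model *m)
{
    model_bh_disable(m);
    model_bh_enable(m);
}

/* Pattern from the trace: the skb is freed while the TX lock is held. */
static bool model_tx_work_free_under_lock(struct model *m)
{
    model_lock(m);
    model_free_skb(m);
    model_unlock(m);
    return m->lockup_detected;
}

/* Possible alternative: detach under the lock, free after unlocking. */
static bool model_tx_work_free_after_unlock(struct model *m)
{
    model_lock(m);
    /* the buffer would be unlinked from the TX ring here */
    model_unlock(m);
    model_free_skb(m);
    return m->lockup_detected;
}
```

With rx_softirq_pending set, model_tx_work_free_under_lock() reports the re-entrant lock attempt, while model_tx_work_free_after_unlock() does not. Whether deferring the free like this is appropriate for the real driver is an open question; the model only illustrates the ordering.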

Thanks,

Jamie

BUG: spinlock lockup on CPU#0, EMAC TX/423, c1d76d68
[<c001f998>] (dump_stack+0x0/0x14) from [<c00e3400>] (_raw_spin_lock+0x9c/0x178)
[<c00e3364>] (_raw_spin_lock+0x0/0x178) from [<c0194f10>] (_spin_lock_irqsave+0x34/0x3c)
[<c0194edc>] (_spin_lock_irqsave+0x0/0x3c) from [<bf05fba4>] (emac_tx+0x2c/0x144 [firecracker_emac])
 r4 = C1D24C00
[<bf05fb78>] (emac_tx+0x0/0x144 [firecracker_emac]) from [<c012d4a8>] (dev_hard_start_xmit+0x110/0x2b0)
[<c012d398>] (dev_hard_start_xmit+0x0/0x2b0) from [<c013b844>] (__qdisc_run+0xa4/0x1f4)
[<c013b7a0>] (__qdisc_run+0x0/0x1f4) from [<c012fb7c>] (dev_queue_xmit+0x174/0x294)
 r8 = C1D4C000 r7 = C1D24D18 r6 = C1D24C00 r5 = C0841BC0 r4 = 00000000
[<c012fa08>] (dev_queue_xmit+0x0/0x294) from [<c0152194>] (ip_output+0x174/0x2c0)
 r7 = C0437D80 r6 = C12EEB20 r5 = C0841BC0 r4 = C12EEB34
[<c0152020>] (ip_output+0x0/0x2c0) from [<c014d7f4>] (ip_forward+0x214/0x314)
 r7 = C1524800 r6 = C0841BF8 r5 = C0437D80 r4 = C0841BC0
[<c014d5e0>] (ip_forward+0x0/0x314) from [<c014bc08>] (ip_rcv+0x2f0/0x5b4)
 r6 = C021C440 r5 = C0841BC0 r4 = C1D6E810
[<c014b918>] (ip_rcv+0x0/0x5b4) from [<c012d10c>] (netif_receive_skb+0x2a4/0x3f4)
 r7 = 00000008 r6 = C021A9D4 r5 = C0841BC0 r4 = C021A9BC
[<c012ce68>] (netif_receive_skb+0x0/0x3f4) from [<c012efa8>] (process_backlog+0xa4/0x188)
[<c012ef04>] (process_backlog+0x0/0x188) from [<c012f120>] (net_rx_action+0x94/0x174)
[<c012f08c>] (net_rx_action+0x0/0x174) from [<c003a900>] (__do_softirq+0x70/0xe0)
[<c003a890>] (__do_softirq+0x0/0xe0) from [<c003a9c4>] (do_softirq+0x54/0x60)
[<c003a970>] (do_softirq+0x0/0x60) from [<c003ad3c>] (local_bh_enable_ip+0x68/0x70)
 r4 = C1D4C000
[<c003acd4>] (local_bh_enable_ip+0x0/0x70) from [<c019520c>] (_write_unlock_bh+0x34/0x38)
 r4 = BF0C10DC
[<c01951d8>] (_write_unlock_bh+0x0/0x38) from [<bf0c10dc>] (ip_nat_cleanup_conntrack+0x54/0x64 [ip_nat])
 r4 = C1BCBD68
[<bf0c1088>] (ip_nat_cleanup_conntrack+0x0/0x64 [ip_nat]) from [<bf0b5460>] (destroy_conntrack+0x6c/0x148 [ip_conntrack])
 r4 = C1BCBD68
[<bf0b53f4>] (destroy_conntrack+0x0/0x148 [ip_conntrack]) from [<c012772c>] (__kfree_skb+0xf0/0x164)
 r4 = C4A751C0
[<c012763c>] (__kfree_skb+0x0/0x164) from [<c01277c4>] (kfree_skb+0x24/0x50)
 r5 = FFD01230 r4 = C1D2E310
[<c01277a0>] (kfree_skb+0x0/0x50) from [<bf05e480>] (emac_free_skb_frag+0x30/0x5c [firecracker_emac])
[<bf05e450>] (emac_free_skb_frag+0x0/0x5c [firecracker_emac]) from [<bf05f130>] (emac_tx_interrupt_work+0x78/0xfc [firecracker_emac])
 r5 = C1D24C00 r4 = C1D76D40
[<bf05f0b8>] (emac_tx_interrupt_work+0x0/0xfc [firecracker_emac]) from [<c0046f24>] (run_workqueue+0xa0/0x124)
 r8 = BF05F0B8 r7 = C1D24C00 r6 = C1D76EC0 r5 = C1D24F2C r4 = C1D24F28
[<c0046e84>] (run_workqueue+0x0/0x124) from [<c0047868>] (worker_thread+0x130/0x14c)
[<c0047738>] (worker_thread+0x0/0x14c) from [<c004ad64>] (kthread+0xe0/0x114)
[<c004ac84>] (kthread+0x0/0x114) from [<c0037e5c>] (do_exit+0x0/0x9e0)
 r7 = 00000000 r6 = 00000000 r5 = 00000000 r4 = 00000000

Attachment: firecracker_emac.c.gz
Description: GNU Zip compressed data