FYI: BUG: deadlock workqueues + OOM

From: Linas Vepstas
Date: Fri Jan 21 2011 - 17:37:52 EST


I've been working on a new arch (patches to be submitted "real soon now")
and have started seeing a deadlock in the workqueues. This email is "FYI",
as I don't have much in the way of good evidence yet, but it seems like an
arch-indep bug so I thought I'd report it :-)

kernel: linux-2.6.37-rc8
system: 768K RAM, 4-way cpu, rootfs on NFS, no local block storage, no swap.

scenario: run a "mempig" that occasionally triggers the OOM killer, while
also running a pthread-create bomb (like a fork bomb, but for threads; each
thread returns immediately).
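
For anyone wanting to try a similar load, a minimal userspace sketch of the
two load generators might look like the following. This is not the actual
test program used for this report: the function names (spawn_threads,
hog_memory), the counts and the chunk size are all assumptions made for
illustration, and both loops are deliberately bounded so the sketch is safe
to run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Thread body: return immediately, as in the scenario described above. */
static void *noop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Hypothetical helper: try to create `count` detached threads that exit
 * at once. Returns the number of successful pthread_create() calls.
 */
static int spawn_threads(int count)
{
    pthread_attr_t attr;
    int created = 0;

    assert(count >= 0);
    if (pthread_attr_init(&attr) != 0)
        return 0;
    /* Detached, so nobody has to join the threads afterwards. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    for (int i = 0; i < count; i++) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, noop_thread, NULL) == 0)
            created++;
    }
    pthread_attr_destroy(&attr);
    return created;
}

/*
 * Hypothetical helper: allocate up to `max_chunks` chunks of `chunk_size`
 * bytes and write to each one. Returns the number of chunks obtained.
 * Everything is freed before returning; a real mempig would hold on to
 * the memory and keep growing.
 */
static size_t hog_memory(size_t max_chunks, size_t chunk_size)
{
    void **chunks = calloc(max_chunks, sizeof(*chunks));
    size_t got = 0;

    if (chunks == NULL)
        return 0;
    while (got < max_chunks) {
        void *p = malloc(chunk_size);
        if (p == NULL)
            break;
        memset(p, 0xA5, chunk_size); /* write to the chunk, not just reserve it */
        chunks[got++] = p;
    }
    for (size_t i = 0; i < got; i++)
        free(chunks[i]);
    free(chunks);
    return got;
}
```

A stress run along the lines of the scenario would call both helpers in
unbounded loops from separate processes, with hog_memory() sized to exceed
available RAM; the bounds here are placeholders only.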

Deadlock: two CPUs in the idle loop, two CPUs spinning on a spinlock in
kernel/workqueue.c, interrupts disabled. A pair of "typical" stack
traces below:

c0288484 _raw_spin_lock_irqsave
c0041224 __queue_work -- kernel/workqueue.c
             spin_lock_irqsave(&gcwq->lock, flags);
c0041500 queue_work_on
c025d7d0 xprt_force_disconnect
xs_tcp_write_space
tcp_fin
svc_drop
tcp_rcv_established
... etc.

c02883c4 _raw_spin_lock_irq
c0042d80 start_flush_work -- kernel/workqueue.c
c00435d0 flush_work
c01aed54 n_tty_read
c01a99cc tty_read
... etc.

The precise stack traces vary, but they always end up with one CPU
in start_flush_work() and the other in __queue_work().

I was wondering if this reminds anyone of anything. I'll provide
more if/when I narrow it down.

--linas