Re: 2.2.9-ac2 locks solid

George (greerga@nidhogg.ham.muohio.edu)
Sun, 6 Jun 1999 23:26:42 -0400 (EDT)


On Sun, 6 Jun 1999, Matthew G. Marsh wrote:

>Even weirder - Several times it locked and then "magically" came back.
>During the lock time it would not ping nor was its MAC address seen on the
>network. No magic sysreq keys no nothing.

According to a pair of Oops' I just forced, it is caused by a deadlock
in do_try_to_free_pages.

kswapd does:

>>EIP: c0112547 <smp_flush_tlb+57/90>
Trace: c01278a5 <try_to_swap_out+1b5/1f0>
Trace: c01279f4 <swap_out_vma+114/16c>
Trace: c0127a93 <swap_out_process+47/7c>
Trace: c0127b9e <swap_out+d6/100>
Trace: c0127c4e <do_try_to_free_pages+86/bc>
Trace: c0127d73 <kswapd+6b/fc>
Trace: c0106000 <empty_zero_page+1000/1004>
Trace: c0107399 <kernel_thread+2d/40>

Which holds the kernel lock and spins waiting for the other processor to
acknowledge the TLB IPI. However, the other processor (in SMP) does:

>>EIP: c0196b87 <stext_lock+f67/3e60>
Trace: c0127e37 <try_to_free_pages+33/44>
Trace: c0128683 <__get_free_pages+6b/1d0>
Trace: c010b183 <handle_IRQ_event+5f/94>
Trace: c0126818 <kmem_cache_grow+114/3b8>
Trace: c0113c63 <do_edge_ioapic_IRQ+7f/b0>
Trace: c0126c1a <kmem_cache_alloc+fa/170>
Trace: c01156a0 <send_sig_info+1c0/2f8>
Trace: c0115a8e <kill_something_info+10a/11c>

Which spins on the kernel lock in do_try_to_free_pages() also. Thus, CPU
#1 has the lock and wants the TLB IPI serviced but CPU #2 needs the lock to
continue handling its current interrupt before it can handle the TLB IPI.

Deadlock.

I have forwarded details of the oops to Andrea, SCT, Alan, and Ingo, the
people I had contacted with questions prior to generating the oops and
finding the above pattern.

Of course, I could be totally wrong, but those traces above look mighty
incriminating. :)

-George Greer
(looking forward to a 'stuck on TLB IPI wait' free kernel)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/