Re: 2.6.17-rc5-mm1

From: Ingo Molnar
Date: Wed May 31 2006 - 10:02:00 EST



* Michal Piotrowski <michal.k.k.piotrowski@xxxxxxxxx> wrote:

> but these two bugs look similar (both were previously reported). Both
> appear while starting the avahi daemon.
>
> http://www.stardust.webpages.pl/files/mm/2.6.17-rc5-mm1/dmesg_1
> http://www.stardust.webpages.pl/files/mm/2.6.17-rc5-mm1/dmesg_2
>
> http://www.stardust.webpages.pl/files/mm/2.6.17-rc5-mm1/latency_trace_1.bz2
> http://www.stardust.webpages.pl/files/mm/2.6.17-rc5-mm1/latency_trace_2.bz2

thanks - these traces made it really easy to spot the problem! The
problem seems to be caused by a pagefault:

<...>-1 0D..1 10648us : check_chain_key (__lockdep_acquire)
<...>-1 0D..1 10649us+: _raw_spin_lock (_spin_lock_irqsave)
<...>-1 0D..1 10651us : do_page_fault (error_code)
<...>-1 0D..1 10652us : trace_hardirqs_off (ret_from_exception)
<...>-1 0D..1 10653us : trace_hardirqs_on (restore_nocheck)
<...>-1 0D..1 10654us : mark_held_locks (trace_hardirqs_on)
<...>-1 0D..1 10654us : mark_lock (mark_held_locks)
<...>-1 0D..1 10655us : save_trace (mark_lock)
<...>-1 0D..1 10656us : save_stack_trace (save_trace)
<...>-1 0D..1 10658us : print_usage_bug (mark_lock)

i think the pagefault occurred with irqs disabled, and the irq-flags
tracing code on the entry.S return-to-exception-site path mistakenly
marked irqs as enabled - causing the mismatch and lockdep's confusion.
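
To illustrate the suspected mismatch, here is a minimal user-space sketch.
It is not the entry.S code: the struct, the function names and the
"checked" variant are all hypothetical, and the only real detail assumed is
that the interrupt-enable bit in x86 EFLAGS is 0x200. It models a return
path that always reports "irqs on" to the tracer, versus one that first
looks at the saved EFLAGS of the interrupted context.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <stdbool.h>

/* Interrupt-enable (IF) bit in the x86 EFLAGS register. */
#define EFLAGS_IF 0x200UL

/* Hypothetical model: the real CPU flag vs. what the tracer believes. */
struct irq_model {
	bool hw_irqs_on;     /* actual state of the interrupt flag */
	bool traced_irqs_on; /* state recorded by the tracing hooks */
};

/* Exception entry: model the handler as running with irqs off. */
static void model_exception_entry(struct irq_model *m)
{
	m->hw_irqs_on = false;
	m->traced_irqs_on = false;
}

/* Suspected buggy path: always tell the tracer "irqs on" on return. */
static void model_return_unconditional(struct irq_model *m,
				       unsigned long saved_eflags)
{
	m->traced_irqs_on = true;
	/* The return itself restores the real flag from the saved EFLAGS. */
	m->hw_irqs_on = (saved_eflags & EFLAGS_IF) != 0;
}

/* Checked path: tell the tracer "irqs on" only if the saved IF bit is set. */
static void model_return_checked(struct irq_model *m,
				 unsigned long saved_eflags)
{
	bool on = (saved_eflags & EFLAGS_IF) != 0;

	m->traced_irqs_on = on;
	m->hw_irqs_on = on;
}

/* Returns true if the tracer's view matches the hardware after a fault. */
static bool model_fault_is_consistent(unsigned long saved_eflags, bool checked)
{
	struct irq_model m = { false, false };

	model_exception_entry(&m);
	if (checked)
		model_return_checked(&m, saved_eflags);
	else
		model_return_unconditional(&m, saved_eflags);

	return m.hw_irqs_on == m.traced_irqs_on;
}
```

In this model, a fault taken with irqs disabled (saved EFLAGS without the IF
bit) leaves the tracer out of sync only on the unconditional path, which is
the kind of mismatch lockdep would then complain about.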

if it's easy to reproduce it once more, could you apply the patch below?
That will add a trace entry showing which address faulted and at what EIP.
Please also upload vmlinux.bz2, because the EIP will be a raw hex number
and i'll have to look it up. (or if it's too big, please disassemble it
via 'objdump -d vmlinux' and upload the ~100 lines around the EIP shown in
the new trace entry, which will appear next to the do_page_fault trace
entry near the end of the latency_trace output)

Ingo

Index: linux/arch/i386/mm/fault.c
===================================================================
--- linux.orig/arch/i386/mm/fault.c
+++ linux/arch/i386/mm/fault.c
@@ -337,6 +338,7 @@ fastcall void __kprobes do_page_fault(st

 	/* get the address */
 	address = read_cr2();
+	trace_special(regs->eip, address, error_code);

 	tsk = current;
