Re: next-20090107: WARNING: at kernel/sched.c:4435 sub_preempt_count

From: Ingo Molnar
Date: Mon Jan 26 2009 - 10:09:35 EST



* Alexey Zaytsev <alexey.zaytsev@xxxxxxxxx> wrote:

> On Mon, Jan 26, 2009 at 17:43, Ingo Molnar <mingo@xxxxxxx> wrote:
> >
> > * Alexey Zaytsev <alexey.zaytsev@xxxxxxxxx> wrote:
> >
> >> On Wed, Jan 14, 2009 at 05:00, Nick Piggin <npiggin@xxxxxxx> wrote:
> >> > On Sun, Jan 11, 2009 at 03:49:45AM +0100, Ingo Molnar wrote:
> >> >>
> >> >> * Alexey Zaytsev <alexey.zaytsev@xxxxxxxxx> wrote:
> >> >>
> >> >> > One more instance of http://marc.info/?l=linux-kernel&m=123134586202636&w=2
> >> >> > Added Ingo Molnar to CC.
> >> >>
> >> >> added Nick on Cc:. Nick, it's about:
> >> >>
> >> >> > commit 7317d7b87edb41a9135e30be1ec3f7ef817c53dd
> >> >> > Author: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
> >> >> > Date: Tue Sep 30 20:50:27 2008 +1000
> >> >> >
> >> >> > sched: improve preempt debugging
> >> >>
> >> >> causing a seemingly spurious warning.
> >> >
> >> > I don't know how it is spurious... Presumably the sequence _would_ have
> >> > caused preempt count to go negative if the bkl were not held...
> >> >
> >> > __do_softirq does a __local_bh_disable on entry, and it seems like the
> >> > _local_bh_enable on exit is what causes this warning. So something is
> >> > unbalanced somehow. Or is it some weird thing we do in early boot that
> >> > I am missing?
> >> >
> >> > Can you put in some printks around these functions in early boot to
> >> > get an idea of what preempt_count is doing?
> >> >
> >>
> >> Hi again.
> >>
> >> Finally got to debug this. The preempt count on the first __do_softirq
> >> entry ever is 0, as it is set in irq_ctx_init(). The interrupted swapper
> >> thread happens to be in the kernel_locked() state at that moment, hence
> >> the warning.
> >>
> >> I don't understand why the softirq preempt count is initialized to 0.
> >> Shouldn't it be SOFTIRQ_OFFSET instead?
> >
> > hm, indeed. So this triggers on irqstacks, if an irq happens to hit
> > the first time a softirq executes (ever)? After that point the
> > preempt_count in the irq-stack ought to stay elevated.
>
> No, this happens on the first softirq, which is run after an irq. An irq
> interrupts the swapper thread while it is holding the BKL. It is
> executed on the hard irq stack, and the corresponding
> thread_info.preempt_count is set correctly by irq_ctx_init(), so nothing
> happens. After the hard IRQ is over, a softirq is run on the soft irq
> stack, but irq_ctx_init() set its preempt_count to zero. So after the
> first softirq is over, sub_preempt_count() discovers that the preempt
> count is going back to zero while the BKL is held (by the interrupted
> thread), and refuses to decrease the count. So the softirq preempt_count
> stays at SOFTIRQ_OFFSET, which is now correct, and no further warnings
> are triggered.

yeah. So we need to fix the initial softirq-stack preempt_count value.

Ingo
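
For readers following along, the sequence discussed above can be
reproduced in a small user-space model. This is only a sketch, not
kernel code: every sim_* name is hypothetical, SOFTIRQ_OFFSET is assumed
to be 0x100, and the underflow check is modelled on the behaviour
described in this thread (refuse to drop the count to zero while the
interrupted task holds the BKL).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the real constant lives in the kernel headers. */
#define SOFTIRQ_OFFSET 0x100

/* Hypothetical stand-in for the softirq stack's thread_info state. */
struct sim_ctx {
	int preempt_count;   /* per-stack preempt count */
	bool kernel_locked;  /* interrupted task holds the BKL */
	int warnings;        /* number of times the debug check fired */
};

static void sim_add_preempt_count(struct sim_ctx *c, int val)
{
	c->preempt_count += val;
}

static void sim_sub_preempt_count(struct sim_ctx *c, int val)
{
	/*
	 * Modelled underflow check: with the BKL held, the count is not
	 * allowed to reach zero. On failure, warn and leave it untouched.
	 */
	if (val > c->preempt_count - (c->kernel_locked ? 1 : 0)) {
		c->warnings++;
		return;
	}
	c->preempt_count -= val;
}

/* One softirq pass: disable on entry, enable on exit. */
static void sim_do_softirq(struct sim_ctx *c)
{
	sim_add_preempt_count(c, SOFTIRQ_OFFSET);  /* models __local_bh_disable */
	/* softirq handlers would run here */
	sim_sub_preempt_count(c, SOFTIRQ_OFFSET);  /* models _local_bh_enable */
}

/* Run several passes from a given initial count; return warnings seen. */
static int sim_run(int initial_count, bool bkl_held, int rounds)
{
	struct sim_ctx c = { initial_count, bkl_held, 0 };

	assert(rounds >= 0);
	for (int i = 0; i < rounds; i++)
		sim_do_softirq(&c);
	return c.warnings;
}
```

In this model, sim_run(0, true, 3) reports a single warning (on the
first pass only, after which the count stays at SOFTIRQ_OFFSET), while
sim_run(SOFTIRQ_OFFSET, true, 3) reports none, which is consistent with
initialising the softirq-stack count to SOFTIRQ_OFFSET.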