Re: BUG during shutdown - bisected to commit e2912009

From: Xiaotian Feng
Date: Wed Jan 06 2010 - 22:24:42 EST


On 01/07/2010 11:20 AM, Marc Dionne wrote:
On Wed, Jan 6, 2010 at 10:07 PM, Marc Dionne<marc.c.dionne@xxxxxxxxx> wrote:
On Wed, Jan 6, 2010 at 9:51 PM, Xiaotian Feng<dfeng@xxxxxxxxxx> wrote:
On 01/07/2010 08:44 AM, Marc Dionne wrote:

On Wed, Jan 6, 2010 at 4:42 AM, Xiaotian Feng<dfeng@xxxxxxxxxx> wrote:

On 01/06/2010 06:58 AM, Marc Dionne wrote:

On Tue, Jan 5, 2010 at 5:18 AM, Xiaotian Feng<dfeng@xxxxxxxxxx> wrote:

This is output by the sound module, but it will not affect clockevents.
Could you please try the following patch and let me know the output before
the BUG_ON happens? That way we can gather more information on the BUG_ON.
Thank you.

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..7c945e8 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -260,6 +260,9 @@ void clockevents_notify(unsigned long reason, void *arg)
 		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
 			if (cpumask_test_cpu(cpu, dev->cpumask) &&
 			    cpumask_weight(dev->cpumask) == 1) {
+				if (dev->mode != CLOCK_EVT_MODE_UNUSED)
+					printk("invalid dev %s mode %d on cpu %d\n", dev->name,
+						dev->mode, cpu);
 				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
 				list_del(&dev->list);

I don't get anything on screen from the printk. Is there a trick
needed to get printk output at that stage of shutdown? I tried
inserting an mdelay() before the BUG, which delayed the BUG output
but still didn't print the invalid dev message.
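
One possible reason for a silent printk at this stage is console loglevel filtering. The sketch below is a userspace model of that rule, not kernel code: the numeric levels match the kernel's KERN_* prefixes, while `reaches_console()` and the choice of warning as the level for an unprefixed printk are assumptions made for illustration.

```c
#include <stdbool.h>

/* Numeric values match the kernel's KERN_* levels; lower is more severe. */
enum loglevel {
	LOGLEVEL_EMERG = 0,
	LOGLEVEL_ALERT = 1,
	LOGLEVEL_CRIT = 2,
	LOGLEVEL_ERR = 3,
	LOGLEVEL_WARNING = 4,
	LOGLEVEL_NOTICE = 5,
	LOGLEVEL_INFO = 6,
	LOGLEVEL_DEBUG = 7,
};

/* Assumption: a printk() without a KERN_* prefix is logged as a warning. */
#define DEFAULT_MESSAGE_LOGLEVEL LOGLEVEL_WARNING

/*
 * Hypothetical helper: a message is shown on the console only when its
 * level is numerically below the current console loglevel.
 */
static bool reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

Under these assumptions, a console loglevel of 4 (as set by a "quiet" boot, for example) hides an unprefixed message, while a KERN_CRIT message still gets through.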

Did you notice this BUG while doing suspend/resume?

Does the BUG still appear if we change the BUG_ON line to
BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED && dev->mode != CLOCK_EVT_MODE_SHUTDOWN)?

I only see the BUG on halt - reboot works normally and suspend
actually freezes and doesn't suspend, but that's perhaps unrelated.

I managed to get your suggested printk to work by adding KERN_CRIT
(otherwise I got no output), and the offending dev is:
"hpet", mode 3 (CLOCK_EVT_MODE_ONESHOT?), cpu 4.
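
To check the guess about mode 3, the numeric value can be decoded against the mode enum. This is a minimal sketch that assumes the layout of enum clock_event_mode in include/linux/clockchips.h in kernels of that period; `clock_event_mode_name()` is a hypothetical helper and not a kernel function.

```c
/* Assumed to mirror enum clock_event_mode in kernels of that period. */
enum clock_event_mode {
	CLOCK_EVT_MODE_UNUSED = 0,
	CLOCK_EVT_MODE_SHUTDOWN,
	CLOCK_EVT_MODE_PERIODIC,
	CLOCK_EVT_MODE_ONESHOT,
	CLOCK_EVT_MODE_RESUME,
};

/* Hypothetical helper: map the number printed by the debug patch to a name. */
static const char *clock_event_mode_name(int mode)
{
	switch (mode) {
	case CLOCK_EVT_MODE_UNUSED:
		return "CLOCK_EVT_MODE_UNUSED";
	case CLOCK_EVT_MODE_SHUTDOWN:
		return "CLOCK_EVT_MODE_SHUTDOWN";
	case CLOCK_EVT_MODE_PERIODIC:
		return "CLOCK_EVT_MODE_PERIODIC";
	case CLOCK_EVT_MODE_ONESHOT:
		return "CLOCK_EVT_MODE_ONESHOT";
	case CLOCK_EVT_MODE_RESUME:
		return "CLOCK_EVT_MODE_RESUME";
	default:
		return "unknown";
	}
}
```

With that layout, mode 3 is indeed CLOCK_EVT_MODE_ONESHOT, so relaxing the BUG_ON to also accept CLOCK_EVT_MODE_SHUTDOWN would still have tripped on this device.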

It looks like the kernel is trying to remove the broadcast device. Could you
please try the following patch?

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..d7395fd 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -259,7 +259,8 @@ void clockevents_notify(unsigned long reason, void *arg)
 		cpu = *((int *)arg);
 		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
 			if (cpumask_test_cpu(cpu, dev->cpumask) &&
-			    cpumask_weight(dev->cpumask) == 1) {
+			    cpumask_weight(dev->cpumask) == 1 &&
+			    !tick_is_broadcast_device(dev)) {
 				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
 				list_del(&dev->list);
 			}
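
The effect of the extra condition can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_dev`, `broadcast_dev`, `model_is_broadcast_device()`, `mask_weight()` and `model_cpu_dead()` are hypothetical stand-ins for the kernel's clock event device, tick_is_broadcast_device(), cpumask_weight() and the CPU-dead path, and the cpumask is reduced to a plain bitmask.

```c
#include <stdbool.h>
#include <stddef.h>

enum clock_event_mode {
	CLOCK_EVT_MODE_UNUSED = 0,
	CLOCK_EVT_MODE_SHUTDOWN,
	CLOCK_EVT_MODE_PERIODIC,
	CLOCK_EVT_MODE_ONESHOT,
	CLOCK_EVT_MODE_RESUME,
};

/* Simplified stand-in for a clock event device. */
struct model_dev {
	const char *name;
	enum clock_event_mode mode;
	unsigned long cpumask;	/* bit n set means the device serves CPU n */
	bool removed;		/* stands in for list_del() */
};

/* Stand-in for the device currently used for tick broadcast. */
static const struct model_dev *broadcast_dev;

static bool model_is_broadcast_device(const struct model_dev *dev)
{
	return dev != NULL && dev == broadcast_dev;
}

/* Count the set bits in the mask. */
static int mask_weight(unsigned long mask)
{
	int n = 0;

	while (mask) {
		n += (int)(mask & 1UL);
		mask >>= 1;
	}
	return n;
}

/*
 * Model of the CPU-dead path with the patch applied. Instead of calling
 * BUG_ON(), it returns how many devices would have triggered it.
 */
static int model_cpu_dead(struct model_dev *devs, size_t ndevs, int cpu)
{
	int bugs = 0;

	for (size_t i = 0; i < ndevs; i++) {
		struct model_dev *dev = &devs[i];

		if (dev->removed)
			continue;
		if ((dev->cpumask & (1UL << cpu)) &&
		    mask_weight(dev->cpumask) == 1 &&
		    !model_is_broadcast_device(dev)) {
			if (dev->mode != CLOCK_EVT_MODE_UNUSED) {
				bugs++;
				continue;
			}
			dev->removed = true;
		}
	}
	return bugs;
}
```

In this model, an "hpet" device in ONESHOT mode bound to CPU 4 is skipped when it is the broadcast device, and is counted as a would-be BUG when it is not, which matches the behaviour reported in this thread.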

That works - no problem shutting down with that patch applied.

And after doing a bit more testing, it turns out that applying this
patch also makes suspend/resume work normally again, so it looks like
the hang on suspend was also related to this.

Thanks for testing. The patch has been sent upstream :-)

Thanks,
Marc

