Re: [PATCH v3 2/2] Make hard lockup detection use timestamps

From: Don Zickus
Date: Fri Jul 29 2011 - 16:55:52 EST


On Thu, Jul 28, 2011 at 05:16:00PM -0700, ZAK Magnus wrote:
> No news?
>
> I've been testing and looking into issues and I realized dump_stack()
> calls touch_nmi_watchdog(). That wrecks what the patch is trying to do
> so I'm changing it to save the trace and print it later after the
> stall has completed. This would also resolve some other things you
> were saying weren't so good. Hopefully the logic is similar enough
> that some things you may have learned still apply.

So yeah, the act of printing was resetting the softlockup counter and
delaying the lockup report forever. Meanwhile, rcu has its own stall
detector, which was going off after a minute or two.

Once I routed the printk to trace_printk and disabled dump_stack,
everything started working as expected.

Now the question is how to printk a message without shooting ourselves in
the foot by resetting the hard/soft lockup watchdogs.

I'll have to think about how to do that. If you can come up with any
ideas let me know.

We almost need a quiet dump_stack that dumps to a buffer instead of the
console. But I am not sure that is worth the effort.

Hmm.

Cheers,
Don