Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009

From: john stultz-lkml
Date: Mon Jan 05 2009 - 21:21:47 EST


On Fri, Jan 2, 2009 at 4:21 PM, Chris Adams <cmadams@xxxxxxxxxx> wrote:
> Once upon a time, Linas Vepstas <linasvepstas@xxxxxxxxx> said:
>> Below follows a summary of the reported crashes. I'm ignoring the
>> zillions of "mine didn't crash" reports, or the "you're a paranoid
>> conspiracy theorist, it's random chance" reports.
>
> I have reproduced this and got a stack trace (this is with Fedora 8 and
> kernel kernel-2.6.26.6-49.fc8.x86_64):
>
[snip]
> Basically (to my untrained eye), the leap second code is called from the
> timer interrupt handler, which holds xtime_lock. The leap second code
> does a printk to notify about the leap second. The printk code tries to
> wake up klogd (I assume to prioritize kernel messages), and (under some
> conditions), the scheduler attempts to get the current time, which tries
> to get xtime_lock => deadlock.

This analysis looks correct to me.
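
For illustration only, here is a rough userspace sketch of the call chain Chris describes. It is not kernel code. It assumes xtime_lock behaves like a sequence lock (an odd sequence count means a writer is active), and every name in it (model_seqlock, model_printk, model_leap_second_tick, and so on) is hypothetical. The reader gives up after a bounded number of spins so the example terminates; a real reader on the same CPU would simply spin forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a seqlock: an odd count means a writer is active. */
struct model_seqlock {
	unsigned int seq;
};

static void model_write_begin(struct model_seqlock *l)
{
	l->seq++;			/* count becomes odd: write in progress */
}

static void model_write_end(struct model_seqlock *l)
{
	assert(l->seq & 1u);		/* must pair with model_write_begin() */
	l->seq++;			/* count becomes even again */
}

/*
 * Reader side: wait for the writer to finish. Returns false if the
 * writer never finished within max_spins, which models the hang.
 */
static bool model_read_time(const struct model_seqlock *l, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if ((l->seq & 1u) == 0)
			return true;
	}
	return false;
}

/* Models printk -> wake klogd -> scheduler asks for the current time. */
static bool model_printk(const struct model_seqlock *l, const char *msg)
{
	bool ok = model_read_time(l, 1000);

	printf("%s%s\n", msg, ok ? "" : " [reader would spin forever]");
	return ok;
}

/* Models the timer interrupt path: the write side is held across the printk. */
static bool model_leap_second_tick(struct model_seqlock *l)
{
	model_write_begin(l);
	bool ok = model_printk(l, "leap second inserted");
	model_write_end(l);
	return ok;
}
```

Calling model_leap_second_tick() on a zero-initialized lock returns false, because the nested read can never complete while the same context still holds the write side. The same model_printk() call made outside the write section succeeds.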

Grrrr. This has bitten us a few times since the "no printk while holding
the xtime lock" restriction was added.

Thomas: Do you think this warrants adding a check to the printk path
to make sure the xtime lock isn't held? This way we can at least get a
warning when someone accidentally adds a printk, or calls a function
that does, while holding the xtime_lock.
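
To make the idea concrete, a minimal userspace sketch of such a check might look like the following. It is only an illustration under stated assumptions: the lock is modeled as a plain flag, and all names (model_xtime_lock, checked_printk, model_violations) are hypothetical, not existing kernel interfaces. A real kernel check would also need a warning path that does not itself depend on the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: a flag recording whether the lock is held. */
static bool model_xtime_lock_held;
static int model_violations;	/* number of printk calls made under the lock */

static void model_xtime_lock(void)
{
	model_xtime_lock_held = true;
}

static void model_xtime_unlock(void)
{
	assert(model_xtime_lock_held);	/* unlock must follow a lock */
	model_xtime_lock_held = false;
}

/*
 * printk-like wrapper: refuses to take the normal path while the lock
 * is held and records the violation instead. Returns 0 if the message
 * was printed normally, -1 otherwise.
 */
static int checked_printk(const char *msg)
{
	if (model_xtime_lock_held) {
		model_violations++;
		fprintf(stderr, "WARNING: printk with xtime_lock held: %s\n", msg);
		return -1;
	}
	return printf("%s\n", msg) < 0 ? -1 : 0;
}
```

With this in place, a printk added by accident inside the locked region shows up as a warning and a nonzero violation count on the first test run, rather than as a hang that only appears under particular scheduler conditions.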

thanks
-john