Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009

From: Linas Vepstas
Date: Mon Jan 05 2009 - 23:35:45 EST


2009/1/5 john stultz-lkml <johnstul.lkml@xxxxxxxxx>:
> On Fri, Jan 2, 2009 at 4:21 PM, Chris Adams <cmadams@xxxxxxxxxx> wrote:
>> Basically (to my untrained eye), the leap second code is called from the
>> timer interrupt handler, which holds xtime_lock. The leap second code
>> does a printk to notify about the leap second. The printk code tries to
>> wake up klogd (I assume to prioritize kernel messages), and (under some
>> conditions), the scheduler attempts to get the current time, which tries
>> to get xtime_lock => deadlock.
>
> This analysis looks correct to me.
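
For anyone following along, the chain described above can be sketched as a small userspace model. This is not kernel code: every name in it (model_lock, model_printk, the two leap_second_* functions) is hypothetical, and the lock is a plain flag standing in for a non-recursive xtime_lock. Where the real system would spin forever, the model just returns false. The second function shows one possible way around the problem (note the event under the lock, log after dropping it); it is an illustration, not a claim about what the actual fix looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
static bool xtime_lock_held;   /* stands in for a non-recursive xtime_lock */

/* Try to take the lock; false means the real caller would spin forever. */
static bool model_lock(void)
{
    if (xtime_lock_held)
        return false;
    xtime_lock_held = true;
    return true;
}

static void model_unlock(void)
{
    assert(xtime_lock_held);
    xtime_lock_held = false;
}

/* Models: printk -> wake klogd -> scheduler reads the time -> needs the lock. */
static bool model_printk(const char *msg)
{
    if (!model_lock())
        return false;          /* lock already held on this path: deadlock */
    model_unlock();
    puts(msg);
    return true;
}

/* Timer interrupt path as described in the thread: logs while holding the lock. */
static bool leap_second_logs_under_lock(void)
{
    bool ok;

    if (!model_lock())
        return false;
    ok = model_printk("Clock: inserting leap second");
    model_unlock();
    return ok;
}

/* Alternative ordering: record the event under the lock, log after unlocking. */
static bool leap_second_logs_after_unlock(void)
{
    bool leap_pending;

    if (!model_lock())
        return false;
    leap_pending = true;       /* state change only, no logging here */
    model_unlock();
    return leap_pending ? model_printk("Clock: inserting leap second") : true;
}
```

Calling leap_second_logs_under_lock() reports the would-be deadlock by returning false, while leap_second_logs_after_unlock() prints the message and returns true.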
>
> Grrrr. This has bitten us a few times since the "no printk while holding
> the xtime lock" restriction was added.
>
> Thomas: Do you think this warrants adding a check to the printk path
> to make sure the xtime lock isn't held?

No.

> This way we can at least get a
> warning when someone accidentally adds a printk, or calls a function
> that does one, while holding the xtime_lock.
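
To make the proposal concrete, here is a rough userspace sketch of what such a check might look like. It is only a model under loud assumptions: xtime_lock_held is a plain flag, checked_printk and lock_warnings are made-up names, and the warning goes to stderr because a real in-kernel warning would itself have to avoid the printk path it is guarding.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: these names are hypothetical, not kernel APIs. */
static bool xtime_lock_held;   /* stands in for "xtime_lock is held" */
static int  lock_warnings;     /* number of violations caught so far */

/*
 * Logging entry point with the proposed check: if the caller still holds
 * the lock, count a warning and bail out instead of continuing into the
 * path that would try to take the lock again.
 */
static void checked_printk(const char *msg)
{
    if (xtime_lock_held) {
        lock_warnings++;
        fprintf(stderr, "WARNING: printk with xtime_lock held: %s\n", msg);
        return;
    }
    puts(msg);
}
```

With the flag set, checked_printk() bumps lock_warnings and skips the message; with it clear, the message is printed as usual.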

This seems like a basic mistake that should be avoidable
with code review. I'm sort of surprised to even see it; anyone
even vaguely familiar with that code would spot it quickly.
Heh. Take that with a grain of salt -- it's not like I never make
mistakes ;-/

I mean, how many more times can the mistake be made?
I'm arguing it's gonna be zero.

--linas