Re: [PATCH 3/3] clocksource: fix can not set tsc as clocksourcebug

From: Thomas Gleixner
Date: Thu Jul 04 2013 - 09:04:30 EST


On Thu, 4 Jul 2013, Thomas Gleixner wrote:
> On Thu, 4 Jul 2013, Alex Shi wrote:
> > commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2
> > clocksource: Always verify highres capability
> >
> > This commit will reject a clock to be system clocksource if it has no
> > CLOCK_SOURCE_VALID_FOR_HRES flags. Then the tsc to be rejected as
>
> No. It rejects the clocksource if the system is in oneshot mode and
> the clocksource does not support HIGHRES.
>
> So at boot time, the tick device mode is PERIODIC and the clocksource
> is set to jiffies.
>
> In clocksource_done_booting() we select the best clocksource from the
> registered list. We are still in PERIODIC mode, so the selection in
> clocksource_find_best() should grab TSC whether the HIGHRES_VALID flag
> is set or not. The relevant check in find_best() is:
>
> if (oneshot && !(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES))
> continue;
>
> And oneshot should be definitely false at that point. So we don't care
> about the HRES flag at all.
>
> So if we select TSC from clocksource_done_booting() that prevents the
> system from switching into highres mode as long as the VALID_FOR_HRES
> flag is not set by the watchdog. If it gets set then
> tick_clock_notify() tells the tick code to reevaluate.
>
> So now the question is why you are observing that HPET is selected in
> the first place.
>
> Can you add instrumentation to the code and provide the data please? I
> try to reproduce myself. What's the environment you're using?

Ok. Figured it out.

Boot.
start tsc revalidation()
register(hpet)
register(pmtimer);

clocksource_done_booting()
select HPET

switch to oneshot

tsc_revalidation()
register(TSC)
clocksource_select()
keep HPET because TSC is not yet validated by the watchdog

watchdog validates TSC, but that does not make TSC the system
clocksource.

So, the old code was actually installing TSC over HPET even when TSC
was not yet qualified for HRES. Kinda worked :)

So yes, we need somthing like your patch to fix that.

> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 9d6c333..3ad9f29 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -308,6 +308,8 @@ static void clocksource_watchdog(unsigned long data)
> > * transition into high-res mode:
> > */
> > tick_clock_notify();
> > + if (finished_booting)
> > + schedule_work(&watchdog_work);
>
> This only makes sense, when the clocksource which gets the FLAG set
> has the highest rating, but was not selected because the system was in
> ONESHOT mode already.
>
> And I can't see why that should suddenly happen.

Now I can :)

> > static int clocksource_watchdog_kthread(void *data)
> > {
> > struct clocksource *cs, *tmp;
> > @@ -412,11 +415,14 @@ static int clocksource_watchdog_kthread(void *data)
> >
> > mutex_lock(&clocksource_mutex);
> > spin_lock_irqsave(&watchdog_lock, flags);
> > - list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
> > + list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
> > if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
> > list_del_init(&cs->wd_list);
> > list_add(&cs->wd_list, &unstable);
> > }
> > + if (cs->flags & CLOCK_SOURCE_VALID_FOR_HRES)
> > + clocksource_select();
>
> Unlikely, but if we have 3 watched clocksources which have the HRES
> bit set, you'd call 3 times clocksource_select().

Also the reselect must be called outside the watchdog_lock region.

Thanks,

tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/