Re: [PATCH] sched: Support current clocksource handling infallback sched_clock().

From: john stultz
Date: Tue May 26 2009 - 20:22:29 EST


On Wed, 2009-05-27 at 08:44 +0900, Paul Mundt wrote:
> On Tue, May 26, 2009 at 04:25:03PM -0700, john stultz wrote:
> > On Wed, 2009-05-27 at 08:08 +0900, Paul Mundt wrote:
> > > On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > > > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > > > The definition of "rating" from the kerneldoc does not
> > > > > > seem to imply that, it's a subjective measure AFAICT.
> > > >
> > > > Right, there is no rating threshold defined, which allows to deduce
> > > > that. The TSC on x86 which might be unreliable, but usable as
> > > > sched_clock has an initial rating of 300 which can be changed later
> > > > on to 0 when the TSC is unusable as a time of day source. In that
> > > > case clock is replaced by HPET which has a rating > 100 but is
> > > > definitely not a good choice for sched_clock
> > > >
> > > > > > Else you might want an additional criteria, like
> > > > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > > > (however you do that the best way)
> > > > > > so you don't pick something
> > > > > > that isn't substantially faster than the jiffy counter atleast?
> > > >
> > > > What we can do is add another flag to the clocksource e.g.
> > > > CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> > > > rating.
> > > >
> > > Ok, so based on this and John's locking concerns, how about something
> > > like this? It doesn't handle the wrapping cases, but I wonder if we
> > > really want to add that amount of logic to sched_clock() in the first
> > > place. Clocksources that wrap frequently could either leave the flag
> > > unset, or do something similar to the TSC code where the cyc2ns shift is
> > > used. If this is something we want to handle generically, then I'll have
> > > a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> >
> >
> > Yea. So this is a little better. There's still a few other issues to
> > consider:
> >
> > 1) What if a clocksource is registered that has the _SCHED_CLOCK bit
> > set, but is not selected for timekeeping due it being unstable like the
> > TSC?
> >
> See, this is what I thought the rating information was useful for, as the
> rating is subsequently dropped if it is not usable. But perhaps it makes
> more sense to just clear the bit at the same time that the rating is
> lowered once it turns out to be unstable.

Yes, if we're dropping a clocksource we should also drop the bit. That
shouldn't be a problem.

The point I was making, is that multiple clocksources may be registered
at one time (TSC, ACPI_PM, etc). But only one is being managed by the
timekeeping code (clock). So there may be the case where the
sched_clock() is different then the timekeeping clock (which is common
on x86).

So I suspect we need a special hook that grabs the best _SCHED_CLOCK
clocksource (as computed at clocksource registration time) and provides
it to the generic sched_clock() interface.


> > 2) Conditionally returning jiffies if the lock is held seems troubling.
> > Might get some crazy values that way.
> >
> What would you recommend instead? We do not want to spin here, and if we
> are in the middle of changing clocksources and returning jiffies anyways,
> then this same issue pops up in the current sched_clock() implementation
> regardless of whether we are testing for lock contention or not.
> Likewise, even if we were to spin, the same situation exists if the new
> clocksource does not have the _SCHED_CLOCK bit set and we have to fall
> back on jiffies anyways, doesn't it?
>
> Put another way, and unless I'm missing something obvious, if we ignore
> my changes to sched_clock(), how is your concern not applicable to case
> where we are changing clocksources and using generic sched_clock() as it
> is today?

Well, Thomas' point that locking isn't necessary, as sched_clock()
doesn't have to be correct, is probably right.

So, I think a get_sched_clocksource() interface would be ideal (if we
want to get academic at a later date, the pointer could be atomically
updated, and we'd keep it valid for some time via an rcu like method).

Additionally, you can set the jiffies clocksource as a _SCHED_CLOCK
clocksource and drop the jiffies fallback code completely.

thanks
-john


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/