Re: [PATCH 08/10] -mm clocksource: cleanup on -mm

From: Daniel Walker
Date: Fri Aug 04 2006 - 19:14:35 EST

On Fri, 2006-08-04 at 15:16 -0700, john stultz wrote:

> > I imagine the users of the interface would be compartmentalized. Taking
> > sched_clock as an example the output is only compared to itself and not
> > to output from other interfaces.
> Agreed on both points. Although I suspect this point will need to be
> made explicit.

Yeah, that's a good idea.

> > So if you correctly implement a clocksource structure for
> > your hardware you will at least expose a usable sched_clock() and
> > generic timeofday. Then if we add more users of the interface then more
> > functionality is exposed.
> Well, this point might need some work. sched_clock has quite a different
> correctness/performance tradeoff when compared against timeofday. If one
> correctly implements a clocksource for something like the ACPI PM, I
> doubt they'd want to use it for sched_clock (due to its ~1us access
> latency). Additionally, since sched_clock doesn't require (for its
> original purpose, at least) the TSC synchronization that is essential
> for timekeeping, how will sched_clock determine which clocksource to use
> on a system were the TSC is unsyched and marked bad?

sched_clock would use the highest rated clock in the system, and if that
becomes unstable it uses jiffies. That could mean using the acpi_pm if
it's the highest rated clock. Some code would have the be added to force
sched_clock to use the tsc.

I did some hackbench runs using the tsc vs. acpi_pm, and there was only
minimal differences (within the margin of error). It could be different
with other pm timers, but that was the result on mine. I also did some
tests of tsc vs. pit which showed some extensive differences. It added
10 seconds to "hackbench 80" . So I'm not entirely convinced that
acpi_pm is totally inappropriate as a fall back, in the case of
un-synced TSC.

I wish I had an HPET to test.

> > Another instances of this is when instrumentation is needing a of fast
> > low level timestamp. In the past to accomplish this one would need a per
> > arch change to read a clock, then potentially duplicate a shift and mult
> > type computation in order to covert to nanosecond. One good example of
> > this is latency tracing in the -rt tree. I can imagine some good and
> > valid instrumentation having a long road of acceptable because the time
> > stamping portion would need to flow through several different arch and
> > potentially board maintainers.
> This sounds reasonable, but also I'd question if sched_clock or
> get_cycles would be appropriate here. Further, if the mult/shift cost is
> acceptable, why not just use the timeofday as the cost will be similar.

get_cycles() isn't implemented on all arches. sched_clock() sometimes
returns jiffies converted to nanosecond depending on the arch (it does
this sometimes on i386 even). Also sched_clock() has the disadvantage of
converting to nanosecond each time it runs, which isn't always ideal.
get_cycles(), if it's implemented, doesn't come with a standard way to
find a) the clock it accesses b) the frequency of the clock.

So they both have disadvantages over the clocksource interface.

> > I've also imagined that some usage of jiffies could be converted to use
> > this interface if it was appropriate. Since jiffies is hooked to the
> > tick, and the tick is getting more and more irregular, a clocksource
> > might be a relatively good replacement.
> Hmmm. That'd be a harder sell for me. Probably would want those users to
> move to the timeofday, or alternatively, drive jiffies off of the
> timekeeping code rather then the interrupt handler to ensure it stays
> synced (something I'm plotting once the timekeeping code settles down).

It's case by case. I wouldn't say all jiffies uses could use timeofday
calls, and I wouldn't say they could all use a clocksource. I'd imagine
some could be converted to a clocksource though.

I'd be interested to see any jiffies changes you make.

> > > I do feel making the abstraction clean and generic is a good thing just
> > > for code readability (and I very much appreciate your work here!), but
> > > I'm not really sure that the need for clocksource access outside the
> > > timekeeping subsystem has been well expressed. Do you have some other
> > > examples other then sched_clock that might show further uses for this
> > > abstraction?
> >
> > I've converted latency tracing to an earlier version of the API , but I
> > don't have any other examples prepared. I think it's important to get
> > the API settled before I start converting anything else.
> Again, I think your patch set looks good for the most part (its just the
> last few bits I worry about). I'm very much interested to see where you
> go with this, as I feel sched_clock (on i386 atleast) needs some love
> and attention and I'm excited to see new uses for the clocksource
> abstraction. However, I do want to make sure that we think the use cases
> out to avoid over-engineering the wrong bits.

Your questions are certainly appropriate, and I appreciate the review.
There's not to many other responses so far.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at