Re: [PATCH 08/10] -mm clocksource: cleanup on -mm

From: Daniel Walker
Date: Fri Aug 04 2006 - 17:08:57 EST


On Fri, 2006-08-04 at 12:53 -0700, john stultz wrote:
>
> Hmmmm. Yea, some additional discussion here would probably be needed
>
> At the moment, I'd prefer to keep the clocksource_adjust bits with the
> timekeeping code, however I'd also prefer to remove the timekeeping
> specific fields (cycle_last, cycle_interval, xtime_nsec, xtime_interval,
> error) from the clocksource structure and instead keep them in a
> timekeeping specific structure (which may also point to a clocksource).
>
> This would keep a clean separation between the clocksource's abstraction
> that keeps as little state as possible and the timekeeping code's
> internal state. However the point you bring up above is an interesting
> issue: Do all users of the generic clocksource structure want the
> clocksource to be NTP adjusted?

Since the output from the clocksource is a low-level timestamp, I don't
think its users would want it to be NTP adjusted. It would also be
a little odd, since the NTP adjustment would be attached only to a
single clock.

> If we allow for non-ntp adjusted access to the clocksources, we may have
> consistency issues between users comparing say sched_clock() and
> clock_gettime() intervals. Further, if those users do want NTP adjusted
> counters, why aren't they just using the timekeeping subsystem?

I imagine the users of the interface would be compartmentalized. Taking
sched_clock as an example, the output is only compared to itself and not
to output from other interfaces.

> This does put some question as to what exactly would be the uses of the
> clocksource structure outside of the timekeeping realm. Sure,
> sched_clock() is a reasonable example, although since sched_clock has
> such specific latency needs (we probably shouldn't go touching off-chip
> hardware on every sched_clock call) and can be careful to avoid TSC skew
> unlike the timekeeping code, its selection algorithm is going to be very
> arch specific. So I'm not sure its really an ideal use of the
> clocksource interface (as its not too difficult to just keep sched_clock
> arch specific).

Part of the reason to have a generic sched_clock() (and the generic
clocksource interface in general) is to eliminate the inefficiency of
duplicating shift and mult functionality in each arch (and on ARM it's
per board). So if you correctly implement a clocksource structure for
your hardware, you will at least expose a usable sched_clock() and
generic timeofday. As we add more users of the interface, more
functionality is exposed.
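
To make the "shift and mult" point concrete, here is a minimal userspace
sketch of the conversion that each arch would otherwise duplicate. The
struct and function names (sketch_clocksource, sketch_cyc2ns,
sketch_delta_ns) are hypothetical and only illustrate the idea; they are
not the in-tree API, and the real structure carries more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a clocksource description. */
struct sketch_clocksource {
	uint64_t mask;   /* bitmask covering the width of the hardware counter */
	uint32_t mult;   /* cycle-to-nanosecond multiplier */
	uint32_t shift;  /* cycle-to-nanosecond divisor, as a power of two */
};

/*
 * Convert a cycle count to nanoseconds: ns = (cycles * mult) >> shift.
 * Note: the multiplication can overflow 64 bits for very large cycle
 * counts, so callers are assumed to pass reasonably small deltas.
 */
static inline uint64_t sketch_cyc2ns(const struct sketch_clocksource *cs,
				     uint64_t cycles)
{
	assert(cs->shift < 64);
	return (cycles * cs->mult) >> cs->shift;
}

/*
 * Nanoseconds elapsed between two raw counter reads. Masking the
 * difference handles a single wraparound of a counter narrower than
 * 64 bits.
 */
static inline uint64_t sketch_delta_ns(const struct sketch_clocksource *cs,
				       uint64_t now, uint64_t last)
{
	return sketch_cyc2ns(cs, (now - last) & cs->mask);
}
```

As an illustration with assumed numbers: for a 1 MHz, 32-bit counter one
could pick shift = 10 and mult = 1000 << 10, so that each cycle converts
to 1000 ns. Once the hardware is described this way, every user of the
interface shares the same conversion code.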

Another instance of this is when instrumentation needs a fast,
low-level timestamp. In the past, accomplishing this required a per-arch
change to read a clock, and then potentially duplicating a shift and mult
type computation in order to convert to nanoseconds. One good example of
this is latency tracing in the -rt tree. I can imagine some good and
valid instrumentation having a long road to acceptance because the
timestamping portion would need to flow through several different arch and
potentially board maintainers.

I've also imagined that some uses of jiffies could be converted to use
this interface where appropriate. Since jiffies is hooked to the
tick, and the tick is getting more and more irregular, a clocksource
might be a relatively good replacement.

> I do feel making the abstraction clean and generic is a good thing just
> for code readability (and I very much appreciate your work here!), but
> I'm not really sure that the need for clocksource access outside the
> timekeeping subsystem has been well expressed. Do you have some other
> examples other then sched_clock that might show further uses for this
> abstraction?

I've converted latency tracing to an earlier version of the API, but I
don't have any other examples prepared. I think it's important to get
the API settled before I start converting anything else.

Daniel
