Re: [PATCH v2] clocksource: document some basic timekeeping concepts

From: Peter Zijlstra
Date: Tue Jun 24 2014 - 06:38:02 EST


On Tue, Jun 24, 2014 at 10:51:12AM +0200, Linus Walleij wrote:
> +Clock events
> +------------
> +
> +Clock events are conceptually orthogonal to clock sources. The same hardware
> +and register range may be used for the clock event, but it is essentially
> +a different thing. The hardware driving clock events have to be able to
> +fire interrupts, so as to trigger events on the system timeline. On a SMP
> +system, it is ideal (and custom) to have one such event driving timer per

customary?

> +CPU core, so that each core can trigger events independently of any other
> +core.
> +
> +You will notice that the clock event device code is based on the same basic
> +idea about translating counters to nanoseconds using mult and shift
> +arithmetics, and you find the same family of helper functions again for
> +assigning these values. The clock event driver does not need a 'mask'
> +attribute however: the system will not try to plan events beyond the time
> +horizon of the clock event.
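
A small userspace sketch of the mult/shift scaling might help readers
here. The helper names (calc_mult, scale) are made up for illustration
and are not the kernel API; the rounding follows the usual
"(to << shift) / from" idea. Note that for a clock event the conversion
normally runs from nanoseconds to device ticks, the reverse of what a
clock source does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute a multiplier so that
 *   (value * mult) >> shift  ~=  value * to / from
 * rounded to nearest.
 */
static uint32_t calc_mult(uint32_t from, uint32_t to, uint32_t shift)
{
	uint64_t tmp = (uint64_t)to << shift;

	tmp += from / 2;
	return (uint32_t)(tmp / from);
}

/* Hypothetical helper: apply the scaling with a multiply and a shift. */
static uint64_t scale(uint64_t value, uint32_t mult, uint32_t shift)
{
	return (value * mult) >> shift;
}
```

For a 1 MHz counter, calc_mult(1000000, 1000000000, 10) gives the
cycles-to-nanoseconds multiplier (clock source direction), while
calc_mult(1000000000, 1000000, 32) gives the nanoseconds-to-ticks
multiplier (clock event direction). The latter truncates, so a
requested delta can come out one tick short.
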
> +
> +
> +sched_clock()
> +-------------
> +
> +In addition to the clock sources and clock events there is a special weak
> +function in the kernel called sched_clock(). This function shall return the
> +number of nanoseconds since the system was started.

Strictly speaking, the scheduler doesn't care about the 0 offset; but as
you mention below, printk() uses this time and people tend to notice and
complain if it's not 0 at boot.

> An architecture may or
> +may not provide an implementation of sched_clock() on its own. If a local
> +implementation is not provided, the system jiffy counter will be used as
> +sched_clock().
> +
> +As the name suggests, sched_clock() is used for scheduling the system,
> +determining the absolute timeslice for a certain process in the CFS scheduler
> +for example. It is also used for printk timestamps when you have selected to
> +include time information in printk for things like bootcharts.
> +
> +Compared to clock sources, sched_clock() has to be very fast: it is called
> +much more often, especially by the scheduler. If you have to do trade-offs
> +between accuracy compared to the clock source, you may sacrifice accuracy
> +for speed in sched_clock(). It however require some of the same basic
> +characteristics as the clock source, i.e. it has to be monotonic.

We can deal with the occasional weirdness; but yes, we very much prefer
a strictly monotonic clock.

> +The sched_clock() function may wrap only on unsigned long long boundaries,
> +i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
> +after circa 585 years. (For most practical systems this means "never".)
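
The arithmetic behind that figure, as a quick userspace check. The
function names are invented for this example, and a year is taken as
365 days.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Whole seconds until a nanosecond counter of the given width wraps. */
static uint64_t wrap_seconds(unsigned int bits)
{
	uint64_t max_ns = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);

	return max_ns / NSEC_PER_SEC;
}

/* Whole 365-day years until the same counter wraps. */
static uint64_t wrap_years(unsigned int bits)
{
	return wrap_seconds(bits) / (365ULL * 24 * 60 * 60);
}
```

wrap_years(64) comes out just under 585, whereas wrap_seconds(32) is
only about four seconds, which is why a narrow nanosecond value is not
good enough.
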
> +
> +If an architecture does not provide its own implementation of this function,
> +it will fall back to using jiffies, making its maximum resolution 1/HZ of the
> +jiffy frequency for the architecture. This will affect scheduling accuracy
> +and will likely show up in system benchmarks.
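
To make the resolution point concrete: the jiffies fallback amounts to
multiplying the jiffy count by the length of one jiffy in nanoseconds.
The sketch below is a userspace stand-in with made-up names, taking the
jiffy count and HZ as plain parameters.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical stand-in for a jiffies-based sched_clock(): the result
 * only ever advances in steps of one jiffy, i.e. NSEC_PER_SEC / hz.
 */
static uint64_t jiffies_sched_clock(uint64_t jiffies_since_boot, unsigned int hz)
{
	return jiffies_since_boot * (NSEC_PER_SEC / hz);
}
```

With HZ=100 the step is 10 ms, so any two events inside the same jiffy
get the same timestamp.
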
> +
> +The clock driving sched_clock() may stop or reset to zero during system
> +suspend/sleep. This does not matter to the function it serves of scheduling
> +events on the system. However it may result in interesting timestamps in
> +printk().

Right, on x86 we explicitly save/restore the offset to compensate for
this.
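
Roughly, the idea is to remember what the clock reported before suspend
and fold the difference into an offset on resume. The following is a
simplified userspace illustration with invented names, not the actual
x86 code.

```c
#include <stdint.h>

/* Hypothetical state; unsigned arithmetic is used so wraparound is defined. */
struct sc_state {
	uint64_t offset_ns;	/* added to the raw clock on every read */
	uint64_t saved_ns;	/* value reported just before suspend */
};

static uint64_t sc_read(const struct sc_state *s, uint64_t raw_ns)
{
	return raw_ns + s->offset_ns;
}

/* Record the last value handed out before the hardware stops or resets. */
static void sc_suspend(struct sc_state *s, uint64_t raw_ns)
{
	s->saved_ns = sc_read(s, raw_ns);
}

/* Choose the offset so the first read after resume continues from saved_ns. */
static void sc_resume(struct sc_state *s, uint64_t raw_ns)
{
	s->offset_ns = s->saved_ns - raw_ns;
}
```

If the raw clock reads 5000 at suspend and restarts from 0, a later raw
reading of 100 is reported as 5100, so timestamps keep increasing.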

> +The sched_clock() function should be callable in any context, IRQ- and
> +NMI-safe and return a sane value in any context.
> +
> +Some architectures may have a limited set of time sources and lack a nice
> +counter to derive a 64-bit nanosecond value, so for example on the ARM
> +architecture, special helper functions have been created to provide a
> +sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
> +same counter that is also used as clock source is used for this purpose.
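
For readers unfamiliar with the trick, a minimal sketch of how a narrow
counter can be extended to 64-bit nanoseconds: keep an epoch (raw count
plus accumulated nanoseconds) and add the masked delta on each read.
The struct and function names are hypothetical, and the sketch assumes
something refreshes the epoch at least once per counter wrap.

```c
#include <stdint.h>

/* Hypothetical state for a 16- or 32-bit free-running counter. */
struct narrow_clock {
	uint64_t epoch_ns;	/* nanoseconds accumulated at the last update */
	uint32_t epoch_cyc;	/* raw counter value at the last update */
	uint32_t mask;		/* e.g. 0xffff for a 16-bit counter */
	uint32_t mult;		/* cycles -> ns multiplier */
	uint32_t shift;		/* cycles -> ns shift */
};

/* Masking the delta makes a single counter wrap since the epoch harmless. */
static uint64_t narrow_clock_read(const struct narrow_clock *nc, uint32_t cyc)
{
	uint32_t delta = (cyc - nc->epoch_cyc) & nc->mask;

	return nc->epoch_ns + (((uint64_t)delta * nc->mult) >> nc->shift);
}

/* Must run at least once per wrap period, e.g. from a periodic timer. */
static void narrow_clock_update(struct narrow_clock *nc, uint32_t cyc)
{
	nc->epoch_ns = narrow_clock_read(nc, cyc);
	nc->epoch_cyc = cyc & nc->mask;
}
```

With a 16-bit counter at 1 MHz (mult = 1000, shift = 0), an epoch at
raw value 0xfff0 and a read at 0x0010 yields a delta of 32 cycles, i.e.
32000 ns, even though the raw counter wrapped in between.
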
> +
> +On SMP systems, it is crucial for performance that sched_clock() can be called
> +independently on each CPU without any synchronization performance hits.
> +Some hardware (such as the x86 TSC) will cause the sched_clock() function to
> +drift between the CPUs on the system. The kernel can work around this by
> +enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
> +that makes sched_clock() different from the ordinary clock source.


Other than that, this version does look good.

Thanks for doing this.
