[PATCH] clocksource: document some basic concepts
From: Linus Walleij
Date: Mon Nov 15 2010 - 05:35:35 EST
This adds some documentation about clock sources and the weak
sched_clock() function that answers questions that repeatedly
arise on the mailing lists.
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Nicolas Pitre <nico@xxxxxxxxxxx>
Cc: Colin Cross <ccross@xxxxxxxxxx>
Cc: John Stultz <johnstul@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Rabin Vincent <rabin.vincent@xxxxxxxxxxxxxx>
Signed-off-by: Linus Walleij <linus.walleij@xxxxxxxxxxxxxx>
---
Documentation/timers/00-INDEX | 2 +
Documentation/timers/clocksource.txt | 106 ++++++++++++++++++++++++++++++++++
2 files changed, 108 insertions(+), 0 deletions(-)
create mode 100644 Documentation/timers/clocksource.txt
diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX
index a9248da..fb88065 100644
--- a/Documentation/timers/00-INDEX
+++ b/Documentation/timers/00-INDEX
@@ -1,5 +1,7 @@
00-INDEX
- this file
+clocksource.txt
+ - Clock sources and sched_clock() notes
highres.txt
- High resolution timers and dynamic ticks design notes
hpet.txt
diff --git a/Documentation/timers/clocksource.txt b/Documentation/timers/clocksource.txt
new file mode 100644
index 0000000..cf4ab9e
--- /dev/null
+++ b/Documentation/timers/clocksource.txt
@@ -0,0 +1,106 @@
+Clock sources and sched_clock()
+-------------------------------
+
+If you grep through the kernel source you will find a number of architecture-
+specific implementations of clock sources and several likewise architecture-
+specific overrides of the sched_clock() function.
+
+To provide timekeeping for your platform, the clock source provides
+the basic timeline, whereas clock events shoot interrupts at certain points
+on this timeline, providing facilities such as high-resolution timers.
+sched_clock() is used for scheduling and timestamping.
+
+
+Clock sources
+-------------
+
+The purpose of the clock source is to provide a timeline for the system that
+tells you where you are in time. For example, issuing the command 'date' on
+a Linux system will eventually read the clock source to determine exactly
+what time it is.
+
+Typically the clock source is a monotonic, atomic counter which provides
+n bits that count from 0 to (2^n-1) and then wrap around to 0 and start over.
+
+The clock source shall have as high resolution as possible, and shall be as
+stable and correct as possible as compared to a real-world wall clock. It
+should not move unpredictably back and forth in time or miss a few cycles
+here and there.
+
+It must be immune to the kind of effects that occur in hardware where e.g.
+the counter register is read in two phases on the bus, lowest 16 bits first
+and the higher 16 bits in a second bus cycle, with the counter bits
+potentially being updated in between, leading to the risk of very strange
+values from the counter.
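+
+As a rough illustration, one common way for a driver to guard against such a
+torn read is to read the high half, the low half, and the high half again,
+and to retry if the high half changed. The sketch below simulates this in
+user space; struct example_counter and the example_read_*() helpers are
+hypothetical stand-ins for real register accesses, not kernel APIs.
+
```c
#include <stdint.h>

/* Hypothetical simulated 32-bit counter that ticks on every bus access. */
struct example_counter {
	uint32_t value;
};

/* Read the low 16 bits; the counter advances by one after the access. */
static uint16_t example_read_lo(struct example_counter *c)
{
	uint16_t lo = (uint16_t)(c->value & 0xffff);

	c->value++;
	return lo;
}

/* Read the high 16 bits; the counter advances by one after the access. */
static uint16_t example_read_hi(struct example_counter *c)
{
	uint16_t hi = (uint16_t)(c->value >> 16);

	c->value++;
	return hi;
}

/* Assemble a consistent 32-bit value: retry if the high half moved. */
static uint32_t example_read_counter(struct example_counter *c)
{
	uint16_t hi, lo, hi2;

	do {
		hi = example_read_hi(c);
		lo = example_read_lo(c);
		hi2 = example_read_hi(c);
	} while (hi != hi2);

	return ((uint32_t)hi << 16) | lo;
}
```
+
+Starting from 0x0001ffff, a naive two-phase read could pair the old high half
+with the new low half; the retry loop discards that sample instead.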
+
+When the wall-clock accuracy of the clock source isn't satisfactory, there
+are various quirks and layers in the timekeeping code for e.g. synchronizing
+the user-visible time to RTC clocks in the system or against networked time
+servers using NTP, but all they do is basically update an offset against
+the clock source, which provides the fundamental timeline for the system.
+These measures do not affect the clock source per se.
+
+The clock source struct shall provide means to translate the provided counter
+into a rough nanosecond value as an unsigned long long (unsigned 64 bit) number.
+Since this operation may be invoked very often, doing this in a strict
+mathematical sense is not desirable: instead the number is taken as close as
+possible to a nanosecond value using only the arithmetic operations
+multiply and shift, so in clocksource_cyc2ns() you find:
+
+    ns ~= (clocksource * mult) >> shift
+
+You will find a number of helper functions in the clock source code intended
+to aid in providing these mult and shift values, such as
+clocksource_khz2mult() and clocksource_hz2mult(), which help determine the
+mult factor from a fixed shift, and clocksource_calc_mult_shift() and
+clocksource_register_hz(), which help assign both shift and mult
+factors using the frequency of the clock source and desirable minimum idle
+time as the only input. In the past, the timekeeping authors would come up with
+these values by hand, which is why you will sometimes find hard-coded shift
+and mult values in the code.
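+
+To make the mult and shift arithmetic concrete, here is a minimal user-space
+sketch, assuming a fixed shift chosen by the caller. The example_*() names
+and EXAMPLE_NSEC_PER_SEC are hypothetical and only mirror the idea behind
+clocksource_hz2mult() and clocksource_cyc2ns(); they are not the kernel
+implementations.
+
```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/*
 * Derive mult for a fixed shift so that
 *   ns ~= (cycles * mult) >> shift
 * i.e. mult ~= (10^9 << shift) / hz, rounded to nearest.
 */
static uint32_t example_hz2mult(uint32_t hz, uint32_t shift)
{
	uint64_t tmp;

	assert(hz != 0 && shift < 32);
	tmp = EXAMPLE_NSEC_PER_SEC << shift;	/* fits: 10^9 < 2^30 */
	tmp += hz / 2;				/* round to nearest */
	tmp /= hz;
	assert(tmp <= UINT32_MAX);		/* shift too large for this hz */
	return (uint32_t)tmp;
}

/* Convert a cycle count to rough nanoseconds with one multiply and shift. */
static uint64_t example_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```
+
+For a 100 MHz counter and a shift of 24 this gives a mult of 167772160, so
+100 cycles translate to 1000 ns. A larger shift gives better precision but
+makes the multiplication overflow 64 bits after fewer cycles.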
+
+Since a 32-bit counter at say 100 MHz will wrap around to zero after some 43
+seconds, the code handling the clock source will have to compensate for this.
+That is why the clock source struct also contains a 'mask'
+member telling how many bits of the source are valid. This way the timekeeping
+code knows when the counter will wrap around and can insert the necessary
+compensation code on both sides of the wrap point so that the system timeline
+remains monotonic. Note that the clocksource_cyc2ns() function will not
+compensate for wrap-arounds: it will return the rough number of nanoseconds
+since the last wrap-around.
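+
+A minimal sketch of how a mask can absorb a wrap, assuming the counter is
+sampled at least once per wrap period: subtract the previous reading from
+the current one and mask the result to the valid bits. The example_*() names
+below are hypothetical and only illustrate the arithmetic.
+
```c
#include <assert.h>
#include <stdint.h>

/* Cycles elapsed between two reads of a masked counter, across one wrap. */
static uint64_t example_cycle_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/* Whole seconds until a counter with the given width wraps (truncated). */
static uint64_t example_wrap_seconds(unsigned int bits, uint64_t hz)
{
	assert(bits < 64 && hz != 0);
	return (1ULL << bits) / hz;
}
```
+
+With a 32-bit mask, a previous reading of 0xfffffff0 and a current reading of
+0x00000010 yield a delta of 0x20 cycles even though the raw counter went
+"backwards". For 32 bits at 100 MHz the second helper returns 42, the
+truncated form of the roughly 43 seconds mentioned above.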
+
+You will notice that the clock event device code is based on the same basic
+idea about translating counters to nanoseconds using mult and shift
+arithmetic, and you find the same family of helper functions again for
+assigning these values. The clock event driver does not need a 'mask'
+attribute however: the system will not try to plan events beyond the time
+horizon of the clock event.
+
+
+sched_clock()
+-------------
+
+In addition to the clock sources and clock events there is a special weak
+function in the kernel called sched_clock(). This function shall return the
+number of nanoseconds since the system was started. An architecture may or
+may not provide an implementation of sched_clock() of its own.
+
+As the name suggests, sched_clock() is used for scheduling the system,
+determining the absolute timeslice for a certain process in the CFS scheduler
+for example. It is also used for printk timestamps when you have chosen to
+include time information in printk for things like bootcharts.
+
+Compared to clock sources, sched_clock() has to be very fast: it is called
+much more often, especially by the scheduler. If you have to trade off
+accuracy against speed, you may sacrifice accuracy compared to the clock
+source for speed in sched_clock(). It does however require the same basic
+characteristics as the clock source, i.e. it has to be monotonic.
+
+The sched_clock() function may wrap only on unsigned long long boundaries,
+i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
+after circa 585 years. (For most practical systems this means "never".)
+
+If an architecture does not provide its own implementation of this function,
+it will fall back to using jiffies, making its maximum resolution one jiffy,
+i.e. 1/HZ seconds, for the architecture. This will affect scheduling accuracy
+and will likely show up in system benchmarks.
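+
+To see why a jiffies-based fallback is coarse, consider this user-space
+sketch. It assumes the tick rate is passed in as a parameter; the function
+name and EXAMPLE_NSEC_PER_SEC are hypothetical and not the kernel's own
+definitions.
+
```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Nanoseconds represented by a tick count when each tick lasts 1/hz s. */
static uint64_t example_jiffies_to_ns(uint64_t jiffies, unsigned int hz)
{
	assert(hz != 0);
	return jiffies * (EXAMPLE_NSEC_PER_SEC / hz);
}
```
+
+With a tick rate of 100, consecutive return values differ by 10000000 ns, so
+any interval shorter than 10 ms is invisible to a clock built this way.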
--
1.6.3.3