Re: TSC to Mono-raw Drift

From: John Stultz
Date: Fri Oct 19 2018 - 14:39:28 EST


On Fri, Oct 19, 2018 at 11:34 AM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Fri, Oct 19, 2018 at 8:25 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> Christopher,
>>
>> Please Cc LKML on such issues in the future.
>>
>> On Mon, 15 Oct 2018, Christopher Hall wrote:
>>
>> Leaving context around for new readers:
>>
>>> Problem Statement:
>>>
>>> The TSC clocksource mult/shift values are derived from CPUID[15H], but the
>>> monotonic raw clock value is not equal to TSC in nominal nanoseconds, i.e.
>>> the timekeeping code is not accurately transforming TSC ticks to nominal
>>> nanoseconds based on CPUID[15H].
>>>
>>> The included code calculates the drift between nominal TSC nanoseconds and
>>> the monotonic raw clock.
>>>
>>> Background:
>>>
>>> Starting with 6th generation Intel CPUs, the TSC is "phase locked" to the
>>> Always Running Timer (ART). The relation between TSC and ART is read from
>>> CPUID[15H]. Details of the TSC-ART relation are in the "Invariant
>>> Timekeeping" section of the SDM.
>>>
>>> CPUID[15H].ECX returns the nominal frequency of ART (or crystal frequency).
>>> CPU feature TSC_KNOWN_FREQ indicates that tsc_khz (tsc.c) is derived from
>>> CPUID[15H]. The calculation is in tsc.c:native_calibrate_tsc().
>>>
>>> When the TSC clocksource is selected, the timekeeping code uses mult/shift
>>> values to transform TSC into nanoseconds. The mult/shift value is determined
>>> using tsc_khz.
>>>
>>> Example Output:
>>>
>>> Running for 3 seconds trial 1
>>> Scaled TSC delta: 3000328845
>>> Monotonic raw delta: 3000329117
>>> Ran for 3 seconds with 272 ns skew
>>>
>>> Running for 3 seconds trial 2
>>> Scaled TSC delta: 3000295209
>>> Monotonic raw delta: 3000295482
>>> Ran for 3 seconds with 273 ns skew
>>>
>>> Running for 3 seconds trial 3
>>> Scaled TSC delta: 3000262870
>>> Monotonic raw delta: 3000263142
>>> Ran for 3 seconds with 272 ns skew
>>>
>>> Running for 300 seconds trial 4
>>> Scaled TSC delta: 300000281725
>>> Monotonic raw delta: 300000308905
>>> Ran for 300 seconds with 27180 ns skew
>>>
>>> The skew between tsc and monotonic raw is about 91 PPB.
>>>
>>> System Information:
>>>
>>> CPU model string: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
>>> Kernel version tested: 4.14.71-rt44
>>> NOTE: The skew seems to be insensitive to kernel version after
>>> introduction of TSC_KNOWN_FREQ capability
>>>
>>> From CPUID[15H]:
>>> Time Stamp Counter/Core Crystal Clock Information (0x15):
>>> TSC/clock ratio = 276/2
>>> nominal core crystal clock = 24000000 Hz (table lookup)
>>>
>>> TSC kHz used to calculate mult/shift value: 3312000
>
> So, just to understand, you're saying the problem is that we calculate a
> tsc_khz value before calculating the mult/shift, and that intermediate
> step is losing some precision?
>
> Or is the cause from something else?
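
On the first theory, a quick sanity check suggests the kHz step is
probably not the culprit on this particular box. A minimal sketch,
assuming tsc_khz is computed as crystal_khz * numerator / denominator
in integer arithmetic (my recollection of native_calibrate_tsc(), not
copied from it), using the CPUID[15H] values quoted above. The helper
name is made up for illustration:

```python
from fractions import Fraction


def tsc_khz_from_cpuid15(crystal_hz, numerator, denominator):
    """Hypothetical helper: integer-kHz TSC frequency from CPUID[15H] values."""
    crystal_khz = crystal_hz // 1000  # truncating division, as integer code would do
    return crystal_khz * numerator // denominator


# Values from the report: 24 MHz crystal, TSC/crystal ratio 276/2.
crystal_hz, num, den = 24_000_000, 276, 2

tsc_khz = tsc_khz_from_cpuid15(crystal_hz, num, den)

# Exact nominal TSC frequency in Hz, kept as a fraction so nothing is rounded.
exact_hz = Fraction(crystal_hz * num, den)

# Frequency lost by going through integer kHz (zero means the step is exact).
lost_hz = exact_hz - tsc_khz * 1000

print(tsc_khz, lost_hz)
```

For 24 MHz and 276/2 this comes out to 3312000 kHz with nothing
truncated, which matches the value in the report. Other crystal
frequencies or ratios could of course behave differently.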

The other potential cause here might just be that when we calculate
the mult/shift pair, we select a shift small enough to keep the
multiplication from overflowing over a long time interval. So there
is likely always some granularity error in converting to a mult/shift
pair.
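
To put a rough number on that, here is a small sketch modeled on how I
understand clocks_calc_mult_shift() to pick the pair. It is not a copy
of the kernel code. The maxsec value (a 600 second cap, scaled by 1000
because the frequency is in kHz) is my assumption about what the
clocksource registration path passes in, and it may differ between
kernel versions:

```python
from fractions import Fraction


def calc_mult_shift(from_freq, to_freq, maxsec):
    """Pick (mult, shift) so that ns ~= (cycles * mult) >> shift.

    Sketch modeled on clocks_calc_mult_shift(); not a copy of kernel code.
    """
    # Work out how many bits mult may use so that cycles * mult still
    # fits in 64 bits for intervals of up to maxsec.
    sftacc = 32
    tmp = (maxsec * from_freq) >> 32
    while tmp:
        tmp >>= 1
        sftacc -= 1

    # Take the largest shift whose rounded mult fits in sftacc bits.
    for shift in range(32, 0, -1):
        mult = ((to_freq << shift) + from_freq // 2) // from_freq
        if (mult >> sftacc) == 0:
            break
    return mult, shift


TSC_KHZ = 3312000          # from the report
NSEC_PER_MSEC = 1_000_000  # the "to" rate when "from" is given in kHz
MAXSEC = 600 * 1000        # ASSUMPTION: 600 s cap, scaled by 1000 for kHz units

mult, shift = calc_mult_shift(TSC_KHZ, NSEC_PER_MSEC, MAXSEC)

# Compare the rounded integer mult against the exact (fractional) value.
ideal = Fraction(NSEC_PER_MSEC << shift, TSC_KHZ)
error_ppb = float((mult - ideal) / ideal * 10**9)

print(mult, shift, round(error_ppb, 1))
```

If I have the inputs right, this gives mult = 5065585 and shift = 24,
with the rounded mult about 90.6 PPB above the ideal value. That is the
same sign and roughly the same size as the 91 PPB in the report, but
since maxsec is my assumption, treat it as a hint rather than a
diagnosis.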

thanks
-john