Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing monotonic raw conversion

From: Liang, Kan
Date: Mon Mar 13 2023 - 17:20:03 EST




On 2023-03-11 12:55 a.m., John Stultz wrote:
> On Thu, Mar 9, 2023 at 8:56 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>> On 2023-03-08 8:17 p.m., John Stultz wrote:
>>> So I spent a little bit of time today adding some trace_printks to the
>>> timekeeping code so I could record the actual TSC and timestamps being
>>> calculated from CLOCK_MONOTONIC_RAW.
>>>
>>> I did catch one error in the test code, which unfortunately I'm to blame for:
>>> mid = start + (delta +(delta/2))/2; //round-closest
>>>
>>> That should be
>>> mid = start + (delta +(2/2))/2 //round-closest
>>> or more simply
>>> mid = start + (delta +1)/2; //round-closest
>>>
>>> Generalized rounding should be: (value + (DIV/2))/DIV), but I'm
>>> guessing with two as the divisor, my brain mixed it up and typed
>>> "delta". My apologies!
>>>
>>> With that fix, I'm seeing closer to ~500ns of error in the
>>> interpolation, just using the userland sampling. Now, I've also
>>> disabled vsyscalls for this (otherwise I wouldn't be able to
>>> trace_printk), so the error likely would be higher than with
>>> vsyscalls.
>>>
>>> Now, part of the error is that:
>>> start= rdtsc();
>>> clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
>>> end = rdtsc();
>>>
>>> Ends up looking like
>>> start= rdtsc();
>>> clock_gettime() {
>>> now = rdtsc();
>>> delta = now - last;
>>> ns = (delta * mult) >> shift
>>> [~midpoint~]
>>> ts->nsec = base_ns + ns;
>>> ts->sec = base_sec;
>>> normalize_ts(ts)
>>> }
>>> end = rdtsc();
>>>
>>> And so by taking the mid-point we're always a little skewed from where
>>> the tsc was actually read. Looking at the data for my case the tsc
>>> read seems to be ~12% in, so you could instead try:
>>>
>>> delta = end - start;
>>> p12 = start + ((delta * 12) + (100/2))/100;
>>>
>>> With that adjustment, I'm seeing error around ~40ns.
>>>
>>> Mind giving that a try?
>>
>> I tried both the new mid and p12. The error becomes even larger.
>>
>> With new mid (start + (delta +1)/2), the error is now ~3800ns
>> With p12 adjustment, the error is ~6700ns.
>>
>>
>> Here is how I run the test.
>> $./time
>> $perf record -e cycles:upp --clockid monotonic_raw $some_workaround
>> $./time
>>
>> Here are some raw data.
>>
>> For the first ./time,
>> start: 961886196018
>> end: 961886215603
>> MONO_RAW: 341485848531
>>
>> For the second ./time,
>> start: 986870117783
>> end: 986870136152
>> MONO_RAW: 351495432044
>>
>> Here is the time generated from one PEBS record.
>> TSC: 968210217271
>> PEBS_MONO_RAW (calculated via kernel conversion information): 344019503072
>>
>> Using new mid (start + (delta +1)/2), the guessed PEBS_MONO_RAW is
>> 344019506897. The error is 3825ns.
>> Using p12 adjustment, the guessed PEBS_MONO_RAW is 344019509831.
>> The error is 6759ns
>
> Huh. I dunno. That seems wild that the error increased.
>
> Just in case something is going astray with the PEBS_MONO_RAW logic,
> can you apply the hack patch I was using to display the MONOTONIC_RAW
> values the kernel calculates?
> https://github.com/johnstultz-work/linux-dev/commit/8d7896b078965b059ea5e8cc21841580557f6df6
>
> It uses trace_printk, so you'll have to cat /sys/kernel/tracing/trace
> to get the output.
>


$ ./time_3
start: 7358368893806 end: 7358368902944 delta: 9138
MONO_RAW: 2899739790738
MID: 7358368898375
P12: 7358368894903
$ sudo cat /sys/kernel/tracing/trace | grep time_3
time_3-1443 [002] ..... 2899.858936: ktime_get_raw_ts64:
JDB: timekeeping_get_delta cycle_now: 7358368897679
time_3-1443 [002] ..... 2899.858937: ktime_get_raw_ts64:
JDB: ktime_get_raw_ts64: 2899739790738

The error between MID and cycle_now is -696ns
The error between P12 and cycle_now is 2776ns

The time_3.c is attached.

Thanks,
Kan#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <errno.h>

static inline unsigned long rdtsc ()
{
unsigned long var;
unsigned int hi, lo;

asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
var = ((unsigned long long int) hi << 32) | lo;

return var;
}

typedef unsigned long long u64;

int main()
{
struct timespec ts;
u64 start, end, delta, mid, p12;
do {
start= rdtsc();
clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
end = rdtsc();
delta = end-start;
} while (delta > 20000); // make sure the reads were not preempted

printf("start: %llu end: %llu delta: %llu\n", start, end, delta);
printf("MONO_RAW: %llu\n", (u64)ts.tv_sec * 1000000000 + ts.tv_nsec);
mid = start + (delta + 1)/2; //round-closest
printf("MID: %llu\n", mid);
p12 = start + ((delta * 12) + (100/2))/100;
printf("P12: %llu\n", p12);
}