Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing monotonic raw conversion
From: John Stultz
Date: Wed Mar 08 2023 - 20:18:14 EST
On Wed, Mar 8, 2023 at 10:44 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> On 2023-02-17 6:11 p.m., John Stultz wrote:
> > On Tue, Feb 14, 2023 at 12:38 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> >> On 2023-02-14 3:11 p.m., John Stultz wrote:
> >>> On Tue, Feb 14, 2023 at 9:00 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> >>>> On 2023-02-14 9:51 a.m., Liang, Kan wrote:
> >>>>> If I understand correctly, the idea is to let the user space tool run
> >>>>> the above interpoloation algorithm several times to 'guess' the atomic
> >>>>> mapping. Using the mapping information to covert the TSC from the PEBS
> >>>>> record. Is my understanding correct?
> >>>>>
> >>>>> If so, to be honest, I doubt we can get the accuracy we want.
> >>>>>
> >>>>
> >>>> I implemented a simple test to evaluate the error.
> >>>
> >>> Very cool!
> >>>
> >>>> I collected TSC -> CLOCK_MONOTONIC_RAW mapping using the above algorithm
> >>>> at the start and end of perf cmd.
> >>>> MONO_RAW TSC
> >>>> start 89553516545645 223619715214239
> >>>> end 89562251233830 223641517000376
> >>>>
> >>>> Here is what I get via mult/shift conversion from this patch.
> >>>> MONO_RAW TSC
> >>>> PEBS 89555942691466 223625770878571
> >>>>
> >>>> Then I use the time information from start and end to create a linear
> >>>> function and 'guess' the MONO_RAW of PEBS from the TSC. I get
> >>>> 89555942692721.
> >>>> There is a 1255 ns difference.
> >>>> I tried several different PEBS records. The error is ~1000ns.
> >>>> I think it should be an observable error.
> >>>
> >>> Interesting. That's a good bit higher than I'd expect as I'd expect a
> >>> clock_gettime() call to take ~ double digit nanoseconds range on
> >>> average, so the error should be within that.
> >>>
> >>> Can you share your logic?
> >>>
> >>
> >> I run the algorithm right before and after the perf command as below.
> >> (The source code of time is attached.)
> >>
> >> $./time
> >> $perf record -e cycles:upp --clockid monotonic_raw $some_workaround
> >> $./time
> >>
> >> The time will dump both MONO_RAW and TSC. That's where "start" and "end"
> >> from.
> >> The perf command print out both TSC and converted MONO_RAW (using the
> >> mul/shift from this patch series). That's where "PEBS" value from.
> >>
> >> Than I use the below formula to calculate the guessed MONO_RAW of PEBS TSC.
> >> Guessed_MONO_RAW = (PEBS_TSC - start_TSC) / (end_TSC - start_TSC) *
> >> (end_MONO_RAW - start_MONO_RAW) + start_MONO_RAW.
> >>
> >> The guessed_MONO_RAW is 89555942692721.
> >> The PEBS_MONO_RAW is 89555942691466.
> >> The difference is 1255.
> >>
> >> Is the calculation correct?
> >
> > Thanks for sharing it. The equation you have there looks ok at a high
> > level for the values you captured (there's small tweaks like doing the
> > mult before the div to make sure you don't hit integer precision
> > issues, but I didn't see that with your results).
> >
> > I've got a todo to try to see how the calculation changes if we do
> > provide atomic TSC/RAW stamps, here but I got a little busy with other
> > work and haven't gotten to it.
> > So my apologies, but I'll try to get back to this soon.
> >
>
> Have you got a chance to try the idea?
>
> I just want to check whether the userspace interpolation approach works.
> Should I prepare V3 and go back to the kernel solution?
Oh, my apologies. I had some other work come up and this fell off my plate.
So I spent a little bit of time today adding some trace_printks to the
timekeeping code so I could record the actual TSC and timestamps being
calculated from CLOCK_MONOTONIC_RAW.
I did catch one error in the test code, which unfortunately I'm to blame for:
mid = start + (delta +(delta/2))/2; //round-closest
That should be
mid = start + (delta +(2/2))/2 //round-closest
or more simply
mid = start + (delta +1)/2; //round-closest
Generalized rounding should be: (value + (DIV/2))/DIV), but I'm
guessing with two as the divisor, my brain mixed it up and typed
"delta". My apologies!
With that fix, I'm seeing closer to ~500ns of error in the
interpolation, just using the userland sampling. Now, I've also
disabled vsyscalls for this (otherwise I wouldn't be able to
trace_printk), so the error likely would be higher than with
vsyscalls.
Now, part of the error is that:
start= rdtsc();
clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
end = rdtsc();
Ends up looking like
start= rdtsc();
clock_gettime() {
now = rdtsc();
delta = now - last;
ns = (delta * mult) >> shift
[~midpoint~]
ts->nsec = base_ns + ns;
ts->sec = base_sec;
normalize_ts(ts)
}
end = rdtsc();
And so by taking the mid-point we're always a little skewed from where
the tsc was actually read. Looking at the data for my case the tsc
read seems to be ~12% in, so you could instead try:
delta = end - start;
p12 = start + ((delta * 12) + (100/2))/100;
With that adjustment, I'm seeing error around ~40ns.
Mind giving that a try?
Now, if you had two snapshots of MONOTONIC_RAW + the TSC value used to
calculate it(maybe the driver access this via a special internal
timekeeping interface), in my testing interpolating will give you
sub-ns error. So I think this is workable without exposing quite so
much to userland.
thanks
-john