Re: [PATCH] perf sched: kill time stamp discrepancy between script and latency

From: Namhyung Kim
Date: Sat Oct 01 2016 - 01:16:14 EST

Next message: Vinod Koul: "Re: [PATCH] dmaengine: omap-dma: Enable burst and data pack for SG"
Previous message: Namhyung Kim: "Re: [PATCH RESEND] perf diff: Introduce the new rules of colored printing of delta."
Next in thread: Joonwoo Park: "Re: [PATCH] perf sched: kill time stamp discrepancy between script and latency"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Joonwoo,

On Wed, Sep 28, 2016 at 07:25:26PM -0700, Joonwoo Park wrote:
> Perf sched latency is handy to find out the maximum sched latency and
> the time stamp of the event. After running sched latency, if a found
> latency looks suspicious it's quite reasonable to run perf script
> subsequently and search with the time stamp given by perf sched latency
> to continue further debugging. However, at present, it's possible the
> time stamp given by perf sched latency cannot be found in the trace
> output by perf script because perf sched latency converts the event
> time from ns to ms as double float and prints it with printf which
> does banker's rounding as opposed to perf script doesn't.
>
> For example:
>
> 0x750ff0 [0x80]: event: 9
> <snip>
> 2 1858303049520 0x750ff0 [0x80]: PERF_RECORD_SAMPLE(IP, 0x1): 15281/15281: 0xffffffff8162a63a period: 1 addr: 0
> ... thread: hackbench:15281
> <snip>
>
> $ perf sched -i perf.data latency | grep hackbench
> hackbench:(401) + 3539.283 ms | 23347 | avg: 7.286 ms | max: 829.998 ms | max at: 1858.303050 s
>
> $ perf script -i perf.data | grep "1858\.303050"
>
> $ perf script -i perf.data | grep "1858\.303049"
> hackbench 15281 [002] 1858.303049: sched:sched_switch: prev_comm=hackbench prev_pid=15281 prev_prio=120 prev_state=D ==> next_comm=hackbench next_pid=15603 next_prio=120
>
> Fix perf latency to print out time stamp without rounding to avoid such
> discrepancy.
>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Joonwoo Park <joonwoop@xxxxxxxxxxxxxx>
> ---
>
> I was tempted to get rid of all u64 to double casting in the function
> output_lat_thread but didn't because there is no data loss as of
> today. Double float gives at least 15 significant decimal digits
> precision while the function requires only 14 significant digits precision.
>
> $ python -c "print(len(str(int(0xffffffffffffffff / 1e6))))"
> 14
>
> tools/lib/traceevent/event-parse.h | 1 +
> tools/perf/builtin-sched.c | 12 ++++++++++--
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
> index 9ffde37..f42703c 100644
> --- a/tools/lib/traceevent/event-parse.h
> +++ b/tools/lib/traceevent/event-parse.h
> @@ -174,6 +174,7 @@ struct pevent_plugin_option {
>
> #define NSECS_PER_SEC 1000000000ULL
> #define NSECS_PER_USEC 1000ULL
> +#define MSECS_PER_SEC 1000ULL
>
> enum format_flags {
> FIELD_IS_ARRAY = 1,
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index afa0576..e5cf51a 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -1190,6 +1190,7 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_
> int i;
> int ret;
> u64 avg;
> + u64 max_lat_at_sec, max_lat_at_msec;

Isn't it usec rathen than msec? :)

>
> if (!work_list->nb_atoms)
> return;
> @@ -1212,11 +1213,18 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_
>
> avg = work_list->total_lat / work_list->nb_atoms;
>
> - printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %13.6f s\n",
> + /*
> + * Avoid round up with printf to prevent event time discrepency
> + * between sched script and latency.
> + */
> + max_lat_at_sec = work_list->max_lat_at / NSECS_PER_SEC;
> + max_lat_at_msec = (work_list->max_lat_at -
> + max_lat_at_sec * NSECS_PER_SEC) / MSECS_PER_SEC;
> + printf("+%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %6lu.%06lu s\n",

Maybe you'd better to be in sync with the script code:

if (PRINT_FIELD(TIME)) {
nsecs = sample->time;
secs = nsecs / NSECS_PER_SEC;
nsecs -= secs * NSECS_PER_SEC;
usecs = nsecs / NSECS_PER_USEC;
if (nanosecs)
printf("%5lu.%09llu: ", secs, nsecs);
else
printf("%5lu.%06lu: ", secs, usecs);
}

Thanks,
Namhyung

> (double)work_list->total_runtime / 1e6,
> work_list->nb_atoms, (double)avg / 1e6,
> (double)work_list->max_lat / 1e6,
> - (double)work_list->max_lat_at / 1e9);
> + max_lat_at_sec, max_lat_at_msec);
> }
>
> static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
>

Next message: Vinod Koul: "Re: [PATCH] dmaengine: omap-dma: Enable burst and data pack for SG"
Previous message: Namhyung Kim: "Re: [PATCH RESEND] perf diff: Introduce the new rules of colored printing of delta."
Next in thread: Joonwoo Park: "Re: [PATCH] perf sched: kill time stamp discrepancy between script and latency"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]