Re: rtla osnoise hist: average duration is always zero
From: Daniel Bristot de Oliveira
Date: Wed Dec 28 2022 - 10:25:33 EST
Hi Andreas,
On 12/24/22 13:39, Andreas Ziegler wrote:
> -- Observed in, but not limited to, Linux 6.1.1
Since original commit... The best way to report this is adding
a Fixes: tag. For example:
Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")
> rtla osnoise hist always outputs '0' as average duration value. Example:
>
> # rtla osnoise hist -P F:1 -c 0-1 -r 900000 -d 1M -b 1 -E 5000 -T 1
> # RTLA osnoise histogram
> # Time unit is microseconds (us)
> # Duration: 0 00:01:00
> ...
> count: 5629 1364
> min: 1 1
> avg: 0 0
> max: 2955 56
>
> This is due to sum_sample in osnoise_hist_update_multiple() being calculated as the sum (duration), not as sum (duration * count).
Yeah, that is correct. It works on timerlat hist because timerlat hist collects
each trace event. osnoise hist uses in-kernel histograms, so we need to multiply
the value with the count. This is a leftover from the development phase, as I started
using tracing and then moved to histograms (which is better).
> Rounding, instead of truncating, of the average value would be cool.
I thought: the values were already rounded up, so it might be rounding too much.
But we are in user space. It is just easier to add a two digits precision
to the value, no?
> The following patch would solve the issue described above:
>
>
> Sampled duration must be weighted by observed quantity, to arrive at a
> correct average duration value.
>
> Fix calculation of total duration by summing (duration * count).
> Introduce rounding for calculation of final value.
>
> Signed-off-by: Andreas Ziegler <br015@xxxxxxxxxx>
>
> --- a/tools/tracing/rtla/src/osnoise_hist.c
> +++ b/tools/tracing/rtla/src/osnoise_hist.c
> @@ -121,6 +121,7 @@
> {
> struct osnoise_hist_params *params = tool->params;
> struct osnoise_hist_data *data = tool->data;
> + unsigned long long total_duration;
> int entries = data->entries;
> int bucket;
> int *hist;
> @@ -131,10 +132,12 @@
> if (data->bucket_size)
> bucket = duration / data->bucket_size;
>
> + total_duration = duration * count;
> +
> hist = data->hist[cpu].samples;
> data->hist[cpu].count += count;
> update_min(&data->hist[cpu].min_sample, &duration);
> - update_sum(&data->hist[cpu].sum_sample, &duration);
> + update_sum(&data->hist[cpu].sum_sample, &total_duration);
How about re-seding a patch with the code above, adding the:
Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")
and...
> update_max(&data->hist[cpu].max_sample, &duration);
>
> if (bucket < entries)
> @@ -333,7 +336,7 @@
>
> if (data->hist[cpu].count)
> trace_seq_printf(trace->seq, "%9llu ",
> - data->hist[cpu].sum_sample / data->hist[cpu].count);
> + (data->hist[cpu].sum_sample + data->hist[cpu].count / 2) / data->hist[cpu].count);
another patch with this part, adding two digits precision?
> else
> trace_seq_printf(trace->seq, " - ");
> }
>
Thanks!
-- Daniel
> Kind regards,
> Andreas