Re: [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period
From: Liang, Kan
Date: Tue Aug 27 2024 - 12:43:47 EST
On 2024-08-21 9:42 a.m., Luo Gengkun wrote:
> Perf events have the notion of sampling frequency which is implemented in
> software by dynamically adjusting the counter period so that samples occur
> at approximately the target frequency. Period adjustment is done in 2
> places:
> - when the counter overflows (and a sample is recorded)
> - each timer tick, when the event is active
> The latter case is slightly flawed because it assumes that the time since
> the last timer-tick period adjustment is 1 tick, whereas the event may not
> have been active (e.g. for a task that is sleeping).
>
Do you have a real-world example that demonstrates how bad it is when the
algorithm doesn't take sleep into account?
I'm not sure that introducing this complexity in the critical path is
worth it.
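To make the question concrete, here is a rough userspace sketch of the
arithmetic involved. It is not the kernel code: model_period() is a
hypothetical helper that only models the ideal period calculation (the
kernel additionally guards against overflow and low-pass filters the
result), and the 4 ms tick assumes HZ=250. It is plain arithmetic, not a
measurement.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumption for illustration: HZ=250, so one tick is 4 ms. */
#define MODEL_TICK_NSEC 4000000ULL

/*
 * Hypothetical helper: the period that would give sample_freq samples
 * per second if `count` events were observed over `nsec` nanoseconds.
 * Simplified; no overflow handling and no filtering.
 */
uint64_t model_period(uint64_t count, uint64_t nsec, uint64_t sample_freq)
{
	assert(nsec > 0 && sample_freq > 0);
	return (count * NSEC_PER_SEC) / (nsec * sample_freq);
}
```

With 4,000,000 events counted and a 1000 Hz target, assuming one tick
elapsed (MODEL_TICK_NSEC) gives a period of 1,000,000, whereas ten ticks
of elapsed time gives 100,000. Whether a gap of that size shows up in
practice is exactly what a real-world example would need to demonstrate.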
> Fix by using jiffies to determine the elapsed time in that case.
>
> Signed-off-by: Luo Gengkun <luogengkun@xxxxxxxxxxxxxxx>
> ---
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 11 ++++++++---
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1a8942277dda..d29b7cf971a1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
> * State for freq target events, see __perf_event_overflow() and
> * perf_adjust_freq_unthr_context().
> */
> + u64 freq_tick_stamp;
> u64 freq_time_stamp;
> u64 freq_count_stamp;
> #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a9395bbfd4aa..86e80e3ef6ac 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -55,6 +55,7 @@
> #include <linux/pgtable.h>
> #include <linux/buildid.h>
> #include <linux/task_work.h>
> +#include <linux/jiffies.h>
>
> #include "internal.h"
>
> @@ -4120,7 +4121,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
> {
> struct perf_event *event;
> struct hw_perf_event *hwc;
> - u64 now, period = TICK_NSEC;
> + u64 now, period, tick_stamp;
> s64 delta;
>
> list_for_each_entry(event, event_list, active_list) {
> @@ -4148,6 +4149,10 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
> */
> event->pmu->stop(event, PERF_EF_UPDATE);
>
> + tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
It seems the time only needs to be read once, before the loop, rather
than once for each event.
Also, there is already perf_clock(). It would be better to use that for
consistency.
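Something along these lines, shown as a userspace sketch rather than a
patch. struct model_event and model_adjust_events() are hypothetical
stand-ins for the per-event state and the loop, and `now` stands for a
single clock read done by the caller before iterating. The first-tick
case is detected here from a zero stamp, which is an assumption of this
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-event state touched by the patch. */
struct model_event {
	uint64_t freq_tick_stamp;	/* time of previous tick adjustment, 0 if none */
	uint64_t last_period;		/* elapsed time used for the adjustment, 0 if skipped */
};

/*
 * `now` is read once by the caller and reused for every event.
 * Returns the number of events for which an adjustment was made.
 */
size_t model_adjust_events(struct model_event *events, size_t n, uint64_t now)
{
	size_t adjusted = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_event *ev = &events[i];
		uint64_t prev = ev->freq_tick_stamp;

		ev->freq_tick_stamp = now;
		ev->last_period = 0;

		/* First tick for this event: no previous stamp, so skip. */
		if (prev == 0)
			continue;

		ev->last_period = now - prev;
		adjusted++;
	}
	return adjusted;
}
```

All events in one invocation then see the same timestamp, and the clock
is read once regardless of how many events are on the list.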
Thanks,
Kan
> + period = tick_stamp - hwc->freq_tick_stamp;
> + hwc->freq_tick_stamp = tick_stamp;
> +
> now = local64_read(&event->count);
> delta = now - hwc->freq_count_stamp;
> hwc->freq_count_stamp = now;
> @@ -4157,9 +4162,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
> * reload only if value has changed
> * we have stopped the event so tell that
> * to perf_adjust_period() to avoid stopping it
> - * twice.
> + * twice. And skip if it is the first tick adjust period.
> */
> - if (delta > 0)
> + if (delta > 0 && likely(period != tick_stamp))
> perf_adjust_period(event, period, delta, false);
>
> event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);