Re: [PATCH] x86: tsc: Rework time_cpufreq_notifier()
From: Viresh Kumar
Date: Mon Apr 22 2019 - 04:17:37 EST
On 18-04-19, 16:11, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> There are problems with running time_cpufreq_notifier() on SMP
> systems.
>
> First off, the rdtsc() called from there runs on the CPU executing
> that code and not necessarily on the CPU whose sched_clock() rate is
> updated which is questionable at best.
>
> Second, in the cases when the frequencies of all CPUs in an SMP
> system are always in sync, it is not sufficient to update just
> one of them or the set associated with a given cpufreq policy on
> frequency changes - all CPUs in the system should be updated and
> that would require more than a simple transition notifier.
>
> Note, however, that the underlying issue (the TSC rate depending on
> the CPU frequency) has not been present in hardware shipping for the
> last few years and in quite a few relevant cases (acpi-cpufreq in
> particular) running time_cpufreq_notifier() will cause the TSC to
> be marked as unstable anyway.
>
> For this reason, make time_cpufreq_notifier() simply mark the TSC
> as unstable and give up when run on SMP and only try to carry out
> any adjustments otherwise.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> ---
> arch/x86/kernel/tsc.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> Index: linux-pm/arch/x86/kernel/tsc.c
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/tsc.c
> +++ linux-pm/arch/x86/kernel/tsc.c
> @@ -185,8 +185,7 @@ static void __init cyc2ns_init_boot_cpu(
> /*
> * Secondary CPUs do not run through tsc_init(), so set up
> * all the scale factors for all CPUs, assuming the same
> - * speed as the bootup CPU. (cpufreq notifiers will fix this
> - * up if their speed diverges)
> + * speed as the bootup CPU.
> */
> static void __init cyc2ns_init_secondary_cpus(void)
> {
> @@ -937,12 +936,12 @@ void tsc_restore_sched_clock_state(void)
> }
>
> #ifdef CONFIG_CPU_FREQ
> -/* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
> +/*
> + * Frequency scaling support. Adjust the TSC based timer when the CPU frequency
> * changes.
> *
> - * RED-PEN: On SMP we assume all CPUs run with the same frequency. It's
> - * not that important because current Opteron setups do not support
> - * scaling on SMP anyroads.
> + * NOTE: On SMP the situation is not fixable in general, so simply mark the TSC
> + * as unstable and give up in those cases.
> *
> * Should fix up last_tsc too. Currently gettimeofday in the
> * first tick after the change will be slightly wrong.
> @@ -956,22 +955,22 @@ static int time_cpufreq_notifier(struct
> void *data)
> {
> struct cpufreq_freqs *freq = data;
> - unsigned long *lpj;
>
> - lpj = &boot_cpu_data.loops_per_jiffy;
> -#ifdef CONFIG_SMP
> - if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> - lpj = &cpu_data(freq->cpu).loops_per_jiffy;
> -#endif
> + if (num_online_cpus() > 1) {
What about checking num_possible_cpus() instead ? So we reliably quit
everytime even if some CPUs are offlined.
And can we place this check before registering the notifier, so it
never gets called ?
> + mark_tsc_unstable("cpufreq changes on SMP");
> + return 0;
> + }
>
> if (!ref_freq) {
> ref_freq = freq->old;
> - loops_per_jiffy_ref = *lpj;
> + loops_per_jiffy_ref = boot_cpu_data.loops_per_jiffy;
> tsc_khz_ref = tsc_khz;
> }
> +
> if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
> - (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
> - *lpj = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
> + (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
> + boot_cpu_data.loops_per_jiffy =
> + cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
>
> tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> if (!(freq->flags & CPUFREQ_CONST_LOOPS))
>
>
--
viresh