Re: [RFC/RFT][PATCH] cpufreq: intel_pstate: Accept passive mode with HWP enabled

From: Srinivas Pandruvada
Date: Sun May 31 2020 - 12:54:35 EST



Hi Doug,

On Sun, 2020-05-31 at 09:39 -0700, Doug Smythies wrote:
> The content of this e-mail is also at [1],
> with annotated graphs.
>
> Hi Rafael,
>
> Hmmm... I think the most important takeaway from
> my previous e-mail might have been missed!
>
> HWP does not work properly on my i5-9600K test computer.
>
> For those that don't have to read all this, my upgraded
> assertion is:
>
> With HWP enabled, if idle state 2 is used, there is a probability
> that the CPU frequency might unexpectedly drop significantly.
>
> Detail (see [1] for supporting graphs and links):
>
> I can not proceed with testing this.
> Why not?
> Because I do not have a stable good system on top of which to add
> this patch.
> I do not know what is wrong such that HWP appears broken.
>
> For my part of it, I had to stop and dig into
> why HWP doesn't seem to work properly on my newer test computer.
>
> Notes:
> I have never used HWP before, and have had it disabled so far on this
> newer test computer. This patch seemed like a great opportunity to
> try it.
> Why (in addition to helping via review/test)? Because trace now
> works,
> whereas it doesn't in active mode with HWP.
>
> It is on my list to explore viability of a mode for trace where it
> monitors
> what the processor is doing via HWP, rather than reporting what the
> processor is being told to do. However, in the meantime, this is
> great.
>
> Example of what is wrong on my system (repeated, but new example
> trace data,
> from earlier e-mail):
>
> Kernel: 5.7-rc6 and + this version of patch when trace data required.
> Patch config: DELAY_HWP 10000 ; 87.5% fraction (7/8).
> Keep in mind that the trace entry is the scaled min value, not the
> target p-state.
>
> Load:
> Single thread, forced CPU affinity.
> fixed work packet per work period, i.e. the amount of work to do is
> independent of CPU frequency.
> 347 hertz work / sleep frequency.
> To reveal any hysteresis (i.e. with conservative governor) load ramps
> up from none
> to 100% and then back down to none at 3 seconds per step (step size
> is uncalibrated).
>
> Processor:
> Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
>
> What do I see?
>
> Unexpected frequency drops at around 70% load.
> Example, from trace:
>
> Event begins at 17.456 seconds elapsed time.
> Previous event was about 107 milliseconds ago.
>
> Old min ; new min ; freq GHz; load % ; duration mS
> 27 ; 28 ; 4.60 ; 68.17 ; 10.226
> 28 ; 26 ; 4.53 ; 57.47 ; 10.005

Seems you hit power/thermal limit

Is this some Lenovo system?

If you disable HWP you don't see that?

What is the value of
cat /sys/bus/pci/devices/0000\:00\:04.0/tcc_offset_degree_celsius
cat /sys/class/powercap/intel-rapl-mmio/intel-rapl-
mmio:0/constraint_0_power_limit_uw

You may want to run
Try running dptfxtract once.

Then try to get again

cat /sys/bus/pci/devices/0000\:00\:04.0/tcc_offset_degree_celsius
cat /sys/class/powercap/intel-rapl-mmio/intel-rapl-
mmio:0/constraint_0_power_limit_uw


Thanks,
Srinivas

> 26 ; 40 ; 2.87 ; 100.00 ; 10.996 <<<< What? Why freq
> down? (note: it is always above old min)
> 40 ; 29 ; 4.25 ; 69.67 ; 10.002 <<<< O.K. recovering,
> but damage done.
> 29 ; 26 ; 4.60 ; 59.14 ; 10.652 <<<< Event ends. Next
> event in 128 milliseconds.
>
> I can not think of any reason why the frequency would have been
> reduced so far by HWP.
>
> O.K., the above had to be done with the patch so that trace could be
> used.
> Ondemand was the governor, because its response curve looks the same
> as
> active/powersave in the area of concern.
>
> The below was done with kernel 5.7-rc6 and only turbostat, at a low
> sample rate of 15 seconds per,
> in addition to the test load. 100 seconds each:
>
> intel_pstate/powersave hwp:
>
> Overruns: 11327
> Ave. work percent: 77.952091
> Processor package power: ~13 watts.
> Average CPU frequency: 3.8 gigahertz
>
> intel_pstate/powersave no-hwp:
>
> Overruns: 7 (task start ramp up related. I have trace proof.)
> Ave. work percent: 74.932603
> Processor package power: ~11.5 watts.
> Average CPU frequency: 4.0 gigahertz
>
> Question: What is the relevance of the 347 hertz and
> the >=70% load for this issue? Why does it appear
> to be such a sharp works/fine doesn't work issue?
>
> Answer: The issue seems to be related to when the
> sleep portion of a work/sleep periodic workflow
> approaches and goes below 1 millisecond (1 jiffy).
>
> some tests were done, varying the work/sleep frequency,
> and idle states and overruns and such, the web version
> of this e-mail has graphs.
>
> Notes:
>
> The breakpoint between no-overruns/overruns is around 950
> microseconds.
>
> A 250 hertz kernel was tested, and it did not have this issue in this
> area.
> Perhaps elsewhere, I didn't look.
>
> 1000 hertz kernels were tested back to kernel 5.2, all failed.
>
> If the issue is jiffy related (a race condition?) then a work/sleep
> frequency
> of 333.3333 hertz should behave in binary way, either lots of
> overruns or none
> as a function of the task start time. (Preliminary) It does.
>
> If the issue is jiffy related (a race condition?) then a work/sleep
> frequency
> of 500.00 hertz should behave in binary way, either lots of overruns
> or none
> as a function of the task start time. It does. There are occurrences
> when idle
> state 2 is used somewhat without overruns.
>
> Both teo and menu idle governors were tested, and while both suffer
> from
> unexpected CPU frequency drop, teo seems much worse. However failure
> points
> for both governors are repeatable.
>
> There was new BIOS for this test computer a week ago.
> System now upgraded and tested with default BIOS settings (by
> accident)
> and my settings. (Note: slight changes in processor package power for
> system idle and one CPU loaded with new BIOS, and earlier tests NOT
> re-done).
>
> Is the processor using the latest microcode? Currently 0xcc. Can not
> figure out if there is anything newer.
>
> Leaving out the details, but all the tests and results are available,
> a mess but available, the summary is:
>
> With HWP enabled, if idle state 2 is used, there is a probability
> that the CPU frequency might unexpectedly drop significantly.
> If the processor does this by itself, or by being told to via sources
> outside of the intel_pstate CPU frequency driver, I don't
> know.
>
> The load sweep test was run at 6 seconds per step during increasing
> load and 3 seconds per step decreasing
> (by mistake, if you must know), while monitoring the idle statistics.
> The test was done in a hurry, so many above/below statistics are 0%,
> due to insufficient sample size.
> The overruns and use of idle state 0 are exactly correlated.
> There are a lot of graphs on the idle statistics web page, but the
> idle state 2 usage correlates exactly with
> undesired low CPU frequency and overruns.
>
> Side note: Even in the areas where HWP appears to behave, the no-hwp
> power use is much better.
>
> O.K., so now, do a couple of more turbostat samples:
>
> intel_pstate/powersave hwp idle state 2 disabled:
>
> Overruns: 3
> Ave. work percent: 66.647895
> Processor package power: ~16.8 watts.
> Average CPU frequency: 4.6 gigahertz
>
> intel_pstate/powersave hwp idle state 3 disabled:
>
> Overruns: 22
> Ave. work percent: 66.647895
> Processor package power: ~16.2 watts.
> Average CPU frequency: 4.6 gigahertz
>
> To prevent all the bots that burden my site, the link is coded:
> [1] double u double u double u dot smythies dot com
> /~doug/linux/s18/hwp/index.html
>
> ... Doug
>
> > -----Original Message-----
> > From: Rafael J. Wysocki [mailto:rjw@xxxxxxxxxxxxx]
> > Sent: May 26, 2020 11:21 AM
> > To: Linux PM; Doug Smythies
> > Cc: LKML; Len Brown; Srinivas Pandruvada; Peter Zijlstra; Giovanni
> > Gherdovich; Francisco Jerez
> > Subject: [RFC/RFT][PATCH] cpufreq: intel_pstate: Accept passive
> > mode with HWP enabled
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Allow intel_pstate to work in the passive mode with HWP enabled and
> > make it set the HWP minimum performance limit to 75% of the P-state
> > value corresponding to the target frequency supplied by the cpufreq
> > governor, so as to prevent the HWP algorithm and the CPU scheduler
> > from working against each other at least when the schedutil
> > governor
> > is in use.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> >
> > This is a replacement for
> > https://patchwork.kernel.org/patch/11563615/ that
> > uses the HWP floor (minimum performance limit) as the feedback to
> > the HWP
> > algorithm (instead of the EPP).
> >
> > The INTEL_CPUFREQ_TRANSITION_DELAY_HWP is still 5000 and the
> > previous comments
> > still apply to it.
> >
> > In addition to that, the 75% fraction used in
> > intel_cpufreq_adjust_hwp() can be
> > adjusted too, but I would like to use a value with a power-of-2
> > denominator for
> > that (so the next candidate would be 7/8).
> >
> > Everyone who can do that is kindly requested to test this and let
> > me know
> > the outcome.
>
>