Re: [PATCH v2 3/5] cpufreq/amd-pstate: Add support for platform profile class

From: Mario Limonciello
Date: Tue Mar 18 2025 - 15:11:20 EST


On 3/10/2025 00:09, Gautham R. Shenoy wrote:
[...snip...]

On Fri, Mar 07, 2025 at 10:30:25PM -0600, Mario Limonciello wrote:
On 3/7/2025 10:55, Mario Limonciello wrote:
On 3/7/2025 10:22, Gautham R. Shenoy wrote:
+static int amd_pstate_profile_set(struct device *dev,
+                  enum platform_profile_option profile)
+{
+    struct amd_cpudata *cpudata = dev_get_drvdata(dev);
+    struct cpufreq_policy *policy __free(put_cpufreq_policy) =
cpufreq_cpu_get(cpudata->cpu);
+    int ret;
+
+    switch (profile) {
+    case PLATFORM_PROFILE_LOW_POWER:
+        if (cpudata->policy != CPUFREQ_POLICY_POWERSAVE)
+            cpudata->policy = CPUFREQ_POLICY_POWERSAVE;

So prior to the patch, cpudata->policy is supposed to mirror
policy->policy.  With this patch, this assumption is no longer
true. So it is possible for the user to again override the choice of
EPP set via platform profile by changing the cpufreq governor ?

Is this the expected behaviour?

The bigger concern is, if the governor was previously "performance"
and then the platform profile requested "low power", "cat
/sys/devices/system/cpu/cpuX/cpufreq/scaling_governor" would still
show "performance", which is inconsistent with the behaviour.



This ties back to the previous patches for dynamic EPP.  My expectation
was that when dynamic EPP is enabled that users can't manually set the
EPP anymore (it will return -EBUSY) and likewise turning on dynamic EPP
should keep the governor as powersave.

I'll double check all those are properly enforced; but that's at least
the intent.

FWIW - I double checked and confirmed that this is working as intended.
* I couldn't change from powersave to performance when dynamic_epp was
enabled (-EBUSY)
* I couldn't change energy_performance_preference when dynamic_epp was
enabled (-EBUSY)

Thanks for double checking this.

The other option is to create a "3rd" cpufreq driver for amd-pstate when dynamic epp is in use.

The upside is this one could then exclude "energy_performance_preference" and "energy_performance_available_preferences" so no need to report -EBUSY for that case since you can't manually change EPP.

The downside is that you couldn't discover what the kernel was doing with EPP without ftrace.





IMO this "should" all work because turning on Dynamic EPP sysfs file
forces the driver to go through a state transition that it will tear
everything down and back up.  The policy will come back up in
"powersave" even if it was previously in "performance" when the dynamic
EPP sysfs file was turned on.

Longer term; I also envision the scheduler influencing EPP values when
dynamic_epp is turned on.  The "platform profile" would be an "input" to
that decision making process (maybe giving a weighting?), not the only
lever.

Yes, the scheduler influencing the EPP values is something that I have
been wanting to explore as well. My idea was to use the nature of the
task + the load on the rq to determine the EPP value.


I haven't given any serious look at how to do this with the scheduler, I
wanted to lay the foundation first being dynamic EPP and raw EPP.

So even if dynamic_epp isn't interesting "right now" for server because
the focus is around behavior for AC/DC, don't write it off just yet.


Fair enough.