RE: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors

From: Yuan, Perry
Date: Sun Dec 25 2022 - 11:50:15 EST


[AMD Official Use Only - General]



> -----Original Message-----
> From: Huang, Ray <Ray.Huang@xxxxxxx>
> Sent: Friday, December 23, 2022 3:16 PM
> To: Yuan, Perry <Perry.Yuan@xxxxxxx>
> Cc: Huang, Shimmer <Shimmer.Huang@xxxxxxx>; Limonciello, Mario
> <Mario.Limonciello@xxxxxxx>; Liang, Richard qi
> <Richardqi.Liang@xxxxxxx>; rafael.j.wysocki@xxxxxxxxx;
> viresh.kumar@xxxxxxxxxx; Sharma, Deepak <Deepak.Sharma@xxxxxxx>;
> Fontenot, Nathan <Nathan.Fontenot@xxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Du, Xiaojian <Xiaojian.Du@xxxxxxx>;
> Meng, Li (Jassmine) <Li.Meng@xxxxxxx>; Karny, Wyes
> <Wyes.Karny@xxxxxxx>; linux-pm@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate EPP
> support for the AMD processors
>
> On Mon, Dec 19, 2022 at 06:21:14PM +0800, Yuan, Perry wrote:
> > [AMD Official Use Only - General]
> >
> > HI ray.
> >
> > > -----Original Message-----
> > > From: Huang, Ray <Ray.Huang@xxxxxxx>
> > > Sent: Monday, December 12, 2022 4:47 PM
> > > To: Yuan, Perry <Perry.Yuan@xxxxxxx>
> > > Cc: rafael.j.wysocki@xxxxxxxxx; Limonciello, Mario
> > > <Mario.Limonciello@xxxxxxx>; viresh.kumar@xxxxxxxxxx; Sharma,
> Deepak
> > > <Deepak.Sharma@xxxxxxx>; Fontenot, Nathan
> <Nathan.Fontenot@xxxxxxx>;
> > > Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Huang, Shimmer
> > > <Shimmer.Huang@xxxxxxx>; Du, Xiaojian <Xiaojian.Du@xxxxxxx>;
> Meng,
> > > Li (Jassmine) <Li.Meng@xxxxxxx>; Karny, Wyes
> <Wyes.Karny@xxxxxxx>;
> > > linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH v7 05/13] cpufreq: amd-pstate: implement Pstate
> > > EPP support for the AMD processors
> > >
> > > On Thu, Dec 08, 2022 at 07:18:44PM +0800, Yuan, Perry wrote:
> > > > From: Perry Yuan <Perry.Yuan@xxxxxxx>
> > > >
> > > > Add EPP driver support for AMD SoCs which support a dedicated MSR
> > > > for CPPC. EPP is used by the DPM controller to configure the
> > > > frequency that a core operates at during short periods of activity.
> > > >
> > > > The SoC EPP targets are configured on a scale from 0 to 255 where
> > > > 0 represents maximum performance and 255 represents maximum
> efficiency.
> > > >
> > > > The amd-pstate driver exports profile string names to userspace
> > > > that are tied to specific EPP values.
> > > >
> > > > The balance_performance string (0x80) provides the best balance
> > > > for efficiency versus power on most systems, but users can choose
> > > > other strings to meet their needs as well.
> > > >
> > > > $ cat
> > > >
> > >
> /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available
> > > _p
> > > > references default performance balance_performance balance_power
> > > power
> > > >
> > > > $ cat
> > > >
> > >
> /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preferenc
> > > e
> > > > balance_performance
> > > >
> > > > To enable the driver,it needs to add `amd_pstate=active` to kernel
> > > > command line and kernel will load the active mode epp driver
> > > >
> > > > Signed-off-by: Perry Yuan <Perry.Yuan@xxxxxxx>
> > > > ---
> > > > drivers/cpufreq/amd-pstate.c | 631
> > > ++++++++++++++++++++++++++++++++++-
> > > > include/linux/amd-pstate.h | 35 ++
> > > > 2 files changed, 660 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/cpufreq/amd-pstate.c
> > > > b/drivers/cpufreq/amd-pstate.c index c17bd845f5fc..0a521be1be8a
> > > 100644
> > > > --- a/drivers/cpufreq/amd-pstate.c
> > > > +++ b/drivers/cpufreq/amd-pstate.c
> > > > @@ -37,6 +37,7 @@
> > > > #include <linux/uaccess.h>
> > > > #include <linux/static_call.h>
> > > > #include <linux/amd-pstate.h>
> > > > +#include <linux/cpufreq_common.h>
> > > >
> > > > #include <acpi/processor.h>
> > > > #include <acpi/cppc_acpi.h>
> > > > @@ -59,9 +60,125 @@
> > > > * we disable it by default to go acpi-cpufreq on these
> > > > processors and add
> > > a
> > > > * module parameter to be able to enable it manually for debugging.
> > > > */
> > > > -static struct cpufreq_driver amd_pstate_driver;
> > > > +static bool cppc_active;
> > > > static int cppc_load __initdata;
> > > >
> > > > +static struct cpufreq_driver *default_pstate_driver; static
> > > > +struct amd_cpudata **all_cpu_data; static struct
> > > > +amd_pstate_params global_params;
> > > > +
> > > > +static DEFINE_MUTEX(amd_pstate_limits_lock);
> > > > +static DEFINE_MUTEX(amd_pstate_driver_lock);
> > > > +
> > > > +static bool cppc_boost __read_mostly;
> > > > +
> > > > +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64
> > > > +cppc_req_cached) {
> > > > + s16 epp;
> > > > + struct cppc_perf_caps perf_caps;
> > > > + int ret;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + if (!cppc_req_cached) {
> > > > + epp = rdmsrl_on_cpu(cpudata->cpu,
> > > MSR_AMD_CPPC_REQ,
> > > > + &cppc_req_cached);
> > > > + if (epp)
> > > > + return epp;
> > > > + }
> > > > + epp = (cppc_req_cached >> 24) & 0xFF;
> > > > + } else {
> > > > + ret = cppc_get_epp_caps(cpudata->cpu, &perf_caps);
> > > > + if (ret < 0) {
> > > > + pr_debug("Could not retrieve energy perf value
> > > (%d)\n", ret);
> > > > + return -EIO;
> > > > + }
> > > > + epp = (s16) perf_caps.energy_perf;
> > >
> > > It should align the static_call structure to implement the function.
> > > Please refer amd_pstate_init_perf. I think it can even re-use the
> > > init_perf to get the epp cap value.
> >
> > The amd_pstate_init_perf() is only called when driver registering,
> > However the amd_pstate_get_epp() will be called frequently to update
> the EPP MSR value and EPP Min/Max limitation.
> > So I suggest to keep using the amd_pstate_get_epp() to update EPP
> related values as it is.
> >
> > Static_call method can do that for MSR and Shared memory API call,
> > but amd_pstate_get_epp() is simple enough for now. No need to make
> this Epp update function also using static call method.
>
> Using static calls are to avoid retpoline, not to make them simple.
>
> https://thenewstack.io/linux-kernel-5-10-introduces-static-calls-to-prevent-
> speculative-execution-attacks/
>
> > Considering the tight schedule and merge window, I would like to keep the
> current way to update EPP, Otherwise the Customer release schedule will be
> delayed.
> >
>
> Mailing list is not the place to talk about internal schedule.
>
> > Perry.
> >
> >
> > >
> > > > + }
> > > > +
> > > > + return epp;
> > > > +}
> > > > +
> > > > +static int amd_pstate_get_energy_pref_index(struct amd_cpudata
> > > > +*cpudata) {
> > > > + s16 epp;
> > > > + int index = -EINVAL;
> > > > +
> > > > + epp = amd_pstate_get_epp(cpudata, 0);
> > > > + if (epp < 0)
> > > > + return epp;
> > > > +
> > > > + switch (epp) {
> > > > + case HWP_EPP_PERFORMANCE:
> > > > + index = EPP_INDEX_PERFORMANCE;
> > > > + break;
> > > > + case HWP_EPP_BALANCE_PERFORMANCE:
> > > > + index = EPP_INDEX_BALANCE_PERFORMANCE;
> > > > + break;
> > > > + case HWP_EPP_BALANCE_POWERSAVE:
> > > > + index = EPP_INDEX_BALANCE_POWERSAVE;
> > > > + break;
> > > > + case HWP_EPP_POWERSAVE:
> > > > + index = EPP_INDEX_POWERSAVE;
> > > > + break;
> > > > + default:
> > > > + break;
> > > > + }
> > > > +
> > > > + return index;
> > > > +}
> > > > +
> > > > +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32
> epp) {
> > > > + int ret;
> > > > + struct cppc_perf_ctrls perf_ctrls;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + u64 value = READ_ONCE(cpudata->cppc_req_cached);
> > > > +
> > > > + value &= ~GENMASK_ULL(31, 24);
> > > > + value |= (u64)epp << 24;
> > > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > > +
> > > > + ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> > > value);
> > > > + if (!ret)
> > > > + cpudata->epp_cached = epp;
> > > > + } else {
> > > > + perf_ctrls.energy_perf = epp;
> > > > + ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
> > >
> > > Since the energy_perf is one of members of struct cppc_perf_ctrls,
> > > could we use cppc_set_perf as well?
> >
> > cppc_set_epp_perf() can handle the EPP value update correctly,
> > cppc_set_perf() is used for desired perf updating with a very high updating
> rate for governor such as schedutil governor.
> > And it has two Phase, Phase I and Phase II, implement the EPP value
> > update in this function, I have concern that we will meet some potential
> Firmware or performance drop risk.
>
> I am fine to use the separated cppc_set_epp_perf.
>
> > The release schedule and merge window is closing for v6.2 and this change
> request happened after six patch review series.
> > I afraid of that we have no enough time to mitigate the risk for this new
> code change.
> > We can consider to continue optimize this in the following patch.
> >
> > Perry.
> >
> > >
> > > > + if (ret) {
> > > > + pr_debug("failed to set energy perf value (%d)\n",
> > > ret);
> > > > + return ret;
> > > > + }
> > > > + cpudata->epp_cached = epp;
> > > > + }
> > > > +
> > > > + return ret;
> > > > +}
> > >
> > > The same with above, the helpers for different cppc types of
> > > processors such as MSR or share memory should be implemented by
> static_call.
> > >
> > > > +
> > > > +static int amd_pstate_set_energy_pref_index(struct amd_cpudata
> > > *cpudata,
> > > > + int pref_index)
> > > > +{
> > > > + int epp = -EINVAL;
> > > > + int ret;
> > > > +
> > > > + if (!pref_index) {
> > > > + pr_debug("EPP pref_index is invalid\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + if (epp == -EINVAL)
> > > > + epp = epp_values[pref_index];
> > > > +
> > > > + if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> > > {
> > > > + pr_debug("EPP cannot be set under performance policy\n");
> > > > + return -EBUSY;
> > > > + }
> > > > +
> > > > + ret = amd_pstate_set_epp(cpudata, epp);
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > static inline int pstate_enable(bool enable) {
> > > > return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable); @@ -70,11
> > > +187,21
> > > > @@ static inline int pstate_enable(bool enable) static int
> > > > cppc_enable(bool enable) {
> > > > int cpu, ret = 0;
> > > > + struct cppc_perf_ctrls perf_ctrls;
> > > >
> > > > for_each_present_cpu(cpu) {
> > > > ret = cppc_set_enable(cpu, enable);
> > > > if (ret)
> > > > return ret;
> > > > +
> > > > + /* Enable autonomous mode for EPP */
> > > > + if (!cppc_active) {
> > > > + /* Set desired perf as zero to allow EPP firmware
> > > control */
> > > > + perf_ctrls.desired_perf = 0;
> > > > + ret = cppc_set_perf(cpu, &perf_ctrls);
> > > > + if (ret)
> > > > + return ret;
> > > > + }
> > > > }
> > > >
> > > > return ret;
> > > > @@ -418,7 +545,7 @@ static void amd_pstate_boost_init(struct
> > > amd_cpudata *cpudata)
> > > > return;
> > > >
> > > > cpudata->boost_supported = true;
> > > > - amd_pstate_driver.boost_enabled = true;
> > > > + default_pstate_driver->boost_enabled = true;
> > > > }
> > > >
> > > > static void amd_perf_ctl_reset(unsigned int cpu) @@ -592,10
> > > > +719,61 @@ static ssize_t show_amd_pstate_highest_perf(struct
> > > > cpufreq_policy
> > > *policy,
> > > > return sprintf(&buf[0], "%u\n", perf); }
> > > >
> > > > +static ssize_t show_energy_performance_available_preferences(
> > > > + struct cpufreq_policy *policy, char *buf) {
> > > > + int i = 0;
> > > > + int offset = 0;
> > > > +
> > > > + while (energy_perf_strings[i] != NULL)
> > > > + offset += sysfs_emit_at(buf, offset, "%s ",
> > > > +energy_perf_strings[i++]);
> > > > +
> > > > + sysfs_emit_at(buf, offset, "\n");
> > > > +
> > > > + return offset;
> > > > +}
> > > > +
> > > > +static ssize_t store_energy_performance_preference(
> > > > + struct cpufreq_policy *policy, const char *buf, size_t count) {
> > > > + struct amd_cpudata *cpudata = policy->driver_data;
> > > > + char str_preference[21];
> > > > + ssize_t ret;
> > > > +
> > > > + ret = sscanf(buf, "%20s", str_preference);
> > > > + if (ret != 1)
> > > > + return -EINVAL;
> > > > +
> > > > + ret = match_string(energy_perf_strings, -1, str_preference);
> > > > + if (ret < 0)
> > > > + return -EINVAL;
> > > > +
> > > > + mutex_lock(&amd_pstate_limits_lock);
> > > > + ret = amd_pstate_set_energy_pref_index(cpudata, ret);
> > > > + mutex_unlock(&amd_pstate_limits_lock);
> > > > +
> > > > + return ret ?: count;
> > > > +}
> > > > +
> > > > +static ssize_t show_energy_performance_preference(
> > > > + struct cpufreq_policy *policy, char *buf) {
> > > > + struct amd_cpudata *cpudata = policy->driver_data;
> > > > + int preference;
> > > > +
> > > > + preference = amd_pstate_get_energy_pref_index(cpudata);
> > > > + if (preference < 0)
> > > > + return preference;
> > > > +
> > > > + return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
> > > > +}
> > > > +
> > > > cpufreq_freq_attr_ro(amd_pstate_max_freq);
> > > > cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> > > >
> > > > cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> > > > +cpufreq_freq_attr_rw(energy_performance_preference);
> > > > +cpufreq_freq_attr_ro(energy_performance_available_preferences);
> > > >
> > > > static struct freq_attr *amd_pstate_attr[] = {
> > > > &amd_pstate_max_freq,
> > > > @@ -604,6 +782,424 @@ static struct freq_attr *amd_pstate_attr[] = {
> > > > NULL,
> > > > };
> > > >
> > > > +static struct freq_attr *amd_pstate_epp_attr[] = {
> > > > + &amd_pstate_max_freq,
> > > > + &amd_pstate_lowest_nonlinear_freq,
> > > > + &amd_pstate_highest_perf,
> > > > + &energy_performance_preference,
> > > > + &energy_performance_available_preferences,
> > > > + NULL,
> > > > +};
> > > > +
> > > > +static inline void update_boost_state(void) {
> > > > + u64 misc_en;
> > > > + struct amd_cpudata *cpudata;
> > > > +
> > > > + cpudata = all_cpu_data[0];
> > > > + rdmsrl(MSR_K7_HWCR, misc_en);
> > > > + global_params.cppc_boost_disabled = misc_en & BIT_ULL(25);
> > >
> > > I won't need introduce the additional cppc_boost_disabled here. The
> > > cpufreq_driver->boost_enabled and cpudata->boost_supported can
> > > manage this function.
> >
> > The cppc_boost_disabled is used to mark if the PMFW Core boost
> > disabled, If some other driver for example thermal, performance limiting
> driver disable the core boost.
>
> I didn't see any other driver to control MSR_K7_HWCR_CPB_DIS except acpi-
> cpufreq.
>
> > We need to update the flag to let driver know the boost is disabled.
> >
> > * boost_supported is used to change CORE freq boost state.
> > * EPP driver did not use the cpufreq core boost sysfs node. So the
> boost_enabled is not used here.
>
> I would like to clarify again the core boost state is for legacy ACPI P-State, and
> it's configured by MSR_K7_HWCR. The CPPC is using the highest perf to map
> the boost frequency. However, here, it's because of some
> hardware/firmware issues or quirks that the legacy boost setting still impacts
> the target frequency. So cppc_boost will confuse the user for the
> functionalities between CPPC and ACPI P-State.

Make sense , As I know, EPP driver dose not decide the max frequency, the frequency changes are totally controlled by
the pmfw and we do not need to set the boost state here.
Power firmware will adjust frequency at runtime between the idle frequency and max frequency.
So I remove the boost state checking code in v9 to avoid confusion.

Perry.

>
> Can we enable core_boost configuration by default if using amd-pstate?
>
> >
> > >
> > > I believe it should be the firmware issue that the legacy ACPI Boost
> > > state will impact the frequency of CPPC. Could you move this
> > > handling into the cpufreq_driver->set_boost callback function to enable
> boost state by default.
> > >
> > > > +}
> > > > +
> > > > +static bool amd_pstate_acpi_pm_profile_server(void)
> > > > +{
> > > > + if (acpi_gbl_FADT.preferred_profile == PM_ENTERPRISE_SERVER ||
> > > > + acpi_gbl_FADT.preferred_profile == PM_PERFORMANCE_SERVER)
> > > > + return true;
> > > > +
> > > > + return false;
> > > > +}
> > > > +
> > > > +static int amd_pstate_init_cpu(unsigned int cpunum) {
> > > > + struct amd_cpudata *cpudata;
> > > > +
> > > > + cpudata = all_cpu_data[cpunum];
> > > > + if (!cpudata) {
> > > > + cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> > > > + if (!cpudata)
> > > > + return -ENOMEM;
> > > > + WRITE_ONCE(all_cpu_data[cpunum], cpudata);
> > > > +
> > > > + cpudata->cpu = cpunum;
> > > > +
> > > > + if (cppc_active) {
> > >
> > > The cppc_active is a bit confused here, if we run into amd-pstate
> > > driver, the cppc should be active. I know you want to indicate the
> > > different driver mode you are running. Please use an enumeration
> > > type to mark it different mode such as PASSIVE_MODE, ACTIVE_MODE,
> > > and GUIDED_MODE (Wyes proposed).
> >
> > Aligned with Wyse, I add one new patch to support enumerated working
> > mode in V8
> >
> >
> > >
> > > > + if (amd_pstate_acpi_pm_profile_server())
> > > > + cppc_boost = true;
> > > > + }
> > > > +
> > > > + }
> > > > + cpudata->epp_powersave = -EINVAL;
> > > > + cpudata->epp_policy = 0;
> > > > + pr_debug("controlling: cpu %d\n", cpunum);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int __amd_pstate_cpu_init(struct cpufreq_policy *policy) {
> > > > + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> > > > + struct amd_cpudata *cpudata;
> > > > + struct device *dev;
> > > > + int rc;
> > > > + u64 value;
> > > > +
> > > > + rc = amd_pstate_init_cpu(policy->cpu);
> > > > + if (rc)
> > > > + return rc;
> > > > +
> > > > + cpudata = all_cpu_data[policy->cpu];
> > > > +
> > > > + dev = get_cpu_device(policy->cpu);
> > > > + if (!dev)
> > > > + goto free_cpudata1;
> > > > +
> > > > + rc = amd_pstate_init_perf(cpudata);
> > > > + if (rc)
> > > > + goto free_cpudata1;
> > > > +
> > > > + min_freq = amd_get_min_freq(cpudata);
> > > > + max_freq = amd_get_max_freq(cpudata);
> > > > + nominal_freq = amd_get_nominal_freq(cpudata);
> > > > + lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> > > > + if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> > > > + dev_err(dev, "min_freq(%d) or max_freq(%d) value is
> > > incorrect\n",
> > > > + min_freq, max_freq);
> > > > + ret = -EINVAL;
> > > > + goto free_cpudata1;
> > > > + }
> > > > +
> > > > + policy->min = min_freq;
> > > > + policy->max = max_freq;
> > > > +
> > > > + policy->cpuinfo.min_freq = min_freq;
> > > > + policy->cpuinfo.max_freq = max_freq;
> > > > + /* It will be updated by governor */
> > > > + policy->cur = policy->cpuinfo.min_freq;
> > > > +
> > > > + /* Initial processor data capability frequencies */
> > > > + cpudata->max_freq = max_freq;
> > > > + cpudata->min_freq = min_freq;
> > > > + cpudata->nominal_freq = nominal_freq;
> > > > + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> > > > +
> > > > + policy->driver_data = cpudata;
> > > > +
> > > > + update_boost_state();
> > > > + cpudata->epp_cached = amd_pstate_get_epp(cpudata, value);
> > > > +
> > > > + policy->min = policy->cpuinfo.min_freq;
> > > > + policy->max = policy->cpuinfo.max_freq;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC))
> > > > + policy->fast_switch_possible = true;
> > >
> > > Please move this line into below if-case.
> >
> >
> > Fixed In V8
> >
> > >
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> > > &value);
> > > > + if (ret)
> > > > + return ret;
> > > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > > +
> > > > + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
> > > &value);
> > > > + if (ret)
> > > > + return ret;
> > > > + WRITE_ONCE(cpudata->cppc_cap1_cached, value);
> > > > + }
> > > > + amd_pstate_boost_init(cpudata);
> > > > +
> > > > + return 0;
> > > > +
> > > > +free_cpudata1:
> > > > + kfree(cpudata);
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) {
> > > > + int ret;
> > > > +
> > > > + ret = __amd_pstate_cpu_init(policy);
> > >
> > > I don't see any reason that we need to define a
> > > __amd_pstate_cpu_init() here. Intel P-State driver's
> > > __intel_pstate_cpu_init() is used both on intel_pstate_cpu_init and
> intel_cpufreq_cpu_init.
> >
> > Fixed in V8.
> >
> > >
> > > > + if (ret)
> > > > + return ret;
> > > > + /*
> > > > + * Set the policy to powersave to provide a valid fallback value
> > > > +in
> > > case
> > > > + * the default cpufreq governor is neither powersave nor
> > > performance.
> > > > + */
> > > > + policy->policy = CPUFREQ_POLICY_POWERSAVE;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy) {
> > > > + pr_debug("CPU %d exiting\n", policy->cpu);
> > > > + policy->fast_switch_possible = false;
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void amd_pstate_update_max_freq(unsigned int cpu)
> > >
> > > Why do you name this function "update max frequency"?
> > >
> > > We won't have the differnt cpudata->pstate.max_freq and
> > > cpudata->pstate.turbo_freq on Intel P-State driver.
> > >
> > > I think in fact we don't update anything here.
> >
> > When core frequency was disabled, the function will update the frequency
> limits.
> > Currently the boost sysfs is not added, so the max freq is not changed.
> > Could we keep the code for the coming patch to add the sysfs node for
> boost control ?
> > It has no harm to the EPP driver.
>
> Again, the boost frequency and state is for ACPI P-State, we cannot make it
> confused in the CPPC.
>
> I didn't see where amd-pstate updates frequency max/min (limit), why we
> keep the redundant codes here?
>
> >
> > >
> > > > +{
> > > > + struct cpufreq_policy *policy = policy = cpufreq_cpu_get(cpu);
> > >
> > > struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > >
> > > > +
> > > > + if (!policy)
> > > > + return;
> > > > +
> > > > + refresh_frequency_limits(policy);
> > > > + cpufreq_cpu_put(policy);
> > > > +}
> > > > +
> > > > +static void amd_pstate_epp_update_limits(unsigned int cpu) {
> > > > + mutex_lock(&amd_pstate_driver_lock);
> > > > + update_boost_state();
> > > > + if (global_params.cppc_boost_disabled) {
> > > > + for_each_possible_cpu(cpu)
> > > > + amd_pstate_update_max_freq(cpu);
> > >
> > > This should do nothing in the amd-pstate.
> >
> > Currently the boost sysfs is not added, so the max freq is not changed.
> > Could we keep the code for the coming patch to add the sysfs node for
> boost control ?
> > It has no harm to the EPP driver.
> >
> > >
> > > > + } else {
> > > > + cpufreq_update_policy(cpu);
> > > > + }
> > > > + mutex_unlock(&amd_pstate_driver_lock);
> > > > +}
> > > > +
> > > > +static int cppc_boost_hold_time_ns = 3 * NSEC_PER_MSEC;
> > > > +
> > > > +static inline void amd_pstate_boost_up(struct amd_cpudata *cpudata)
> {
> > > > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > > + u32 max_limit = (hwp_req & 0xff);
> > > > + u32 min_limit = (hwp_req & 0xff00) >> 8;
> > >
> > > We can use cpudata->max_perf and cpudata->min_perf directly.
> >
> > Iowait boost code removed from V8.
> >
> > >
> > > > + u32 boost_level1;
> > > > +
> > > > + /* If max and min are equal or already at max, nothing to boost
> > > > +*/
> > >
> > > I believe this is the only case that max_perf == min_perf, not at max.
> >
> > Iowait boost code removed from V8.
> >
> > >
> > > > + if (max_limit == min_limit)
> > > > + return;
> > > > +
> > > > + /* Set boost max and min to initial value */
> > > > + if (!cpudata->cppc_boost_min)
> > > > + cpudata->cppc_boost_min = min_limit;
> > > > +
> > > > + boost_level1 = ((AMD_CPPC_NOMINAL_PERF(hwp_cap) +
> > > min_limit) >> 1);
> > > > +
> > > > + if (cpudata->cppc_boost_min < boost_level1)
> > > > + cpudata->cppc_boost_min = boost_level1;
> > > > + else if (cpudata->cppc_boost_min <
> > > AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > > + cpudata->cppc_boost_min =
> > > AMD_CPPC_NOMINAL_PERF(hwp_cap);
> > > > + else if (cpudata->cppc_boost_min ==
> > > AMD_CPPC_NOMINAL_PERF(hwp_cap))
> > > > + cpudata->cppc_boost_min = max_limit;
> > > > + else
> > > > + return;
> > >
> > > Could you please elaborate the reason that you separate the min_perf
> > > (cppc_boost_min) you would like to update into cppc_req register as
> > > these different cases? Why we pick up these cases such as (min +
> > > nominal)/2, and around nominal? What's the help to optimize the
> > > final result? - I am thinking the autonomous mode is handled by SMU
> > > firmware, we need to provide some data that let us know it influences
> the final result.
> > >
> >
> > Iowait boost code removed from V8.
> >
> > > > +
> > > > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > > + hwp_req |= AMD_CPPC_MIN_PERF(cpudata->cppc_boost_min);
> > > > + wrmsrl(MSR_AMD_CPPC_REQ, hwp_req);
> > >
> > > Do we need an update for share memory processors? In other words,
> > > epp is also supported on share memory processors. - again, we should
> > > use static call to handle the msr and cppc_acpi stuff.
> > >
> > > > + cpudata->last_update = cpudata->sample.time; }
> > > > +
> > > > +static inline void amd_pstate_boost_down(struct amd_cpudata
> > > > +*cpudata) {
> > > > + bool expired;
> > > > +
> > > > + if (cpudata->cppc_boost_min) {
> > > > + expired = time_after64(cpudata->sample.time, cpudata-
> > > >last_update +
> > > > + cppc_boost_hold_time_ns);
> > > > +
> > > > + if (expired) {
> > > > + wrmsrl(MSR_AMD_CPPC_REQ, cpudata-
> > > >cppc_req_cached);
> > > > + cpudata->cppc_boost_min = 0;
> > > > + }
> > > > + }
> > > > +
> > > > + cpudata->last_update = cpudata->sample.time; }
> > > > +
> > > > +static inline void amd_pstate_boost_update_util(struct
> > > > +amd_cpudata
> > > *cpudata,
> > > > + u64 time)
> > > > +{
> > > > + cpudata->sample.time = time;
> > > > + if (smp_processor_id() != cpudata->cpu)
> > > > + return;
> > > > +
> > > > + if (cpudata->sched_flags & SCHED_CPUFREQ_IOWAIT) {
> > > > + bool do_io = false;
> > > > +
> > > > + cpudata->sched_flags = 0;
> > > > + /*
> > > > + * Set iowait_boost flag and update time. Since IO WAIT flag
> > > > + * is set all the time, we can't just conclude that there is
> > > > + * some IO bound activity is scheduled on this CPU with just
> > > > + * one occurrence. If we receive at least two in two
> > > > + * consecutive ticks, then we treat as boost candidate.
> > > > + * This is leveraged from Intel Pstate driver.
> > >
> > > I would like to know whether we can hit this case as well? If we can
> > > find or create a use case to hit it in our platforms, I am fine to
> > > add it our driver as well. If not, I don't suggest it we add them at
> > > this moment. I hope we have verified each code path once we add them
> into the driver.
> >
> > Sure, no problem.
> > Iowait boost code removed from V8.
> >
> >
> > >
> > > > + */
> > > > + if (time_before64(time, cpudata->last_io_update + 2 *
> > > TICK_NSEC))
> > > > + do_io = true;
> > > > +
> > > > + cpudata->last_io_update = time;
> > > > +
> > > > + if (do_io)
> > > > + amd_pstate_boost_up(cpudata);
> > > > +
> > > > + } else {
> > > > + amd_pstate_boost_down(cpudata);
> > > > + }
> > > > +}
> > > > +
> > > > +static inline void amd_pstate_cppc_update_hook(struct
> > > > +update_util_data
> > > *data,
> > > > + u64 time, unsigned int flags) {
> > > > + struct amd_cpudata *cpudata = container_of(data,
> > > > + struct amd_cpudata, update_util);
> > > > +
> > > > + cpudata->sched_flags |= flags;
> > > > +
> > > > + if (smp_processor_id() == cpudata->cpu)
> > > > + amd_pstate_boost_update_util(cpudata, time); }
> > > > +
> > > > +static void amd_pstate_clear_update_util_hook(unsigned int cpu) {
> > > > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > > +
> > > > + if (!cpudata->update_util_set)
> > > > + return;
> > > > +
> > > > + cpufreq_remove_update_util_hook(cpu);
> > > > + cpudata->update_util_set = false;
> > > > + synchronize_rcu();
> > > > +}
> > > > +
> > > > +static void amd_pstate_set_update_util_hook(unsigned int cpu_num)
> {
> > > > + struct amd_cpudata *cpudata = all_cpu_data[cpu_num];
> > > > +
> > > > + if (!cppc_boost) {
> > > > + if (cpudata->update_util_set)
> > > > + amd_pstate_clear_update_util_hook(cpudata->cpu);
> > > > + return;
> > > > + }
> > > > +
> > > > + if (cpudata->update_util_set)
> > > > + return;
> > > > +
> > > > + cpudata->sample.time = 0;
> > > > + cpufreq_add_update_util_hook(cpu_num, &cpudata->update_util,
> > > > +
> > > amd_pstate_cppc_update_hook);
> > > > + cpudata->update_util_set = true; }
> > > > +
> > > > +static void amd_pstate_epp_init(unsigned int cpu) {
> > > > + struct amd_cpudata *cpudata = all_cpu_data[cpu];
> > > > + u32 max_perf, min_perf;
> > > > + u64 value;
> > > > + s16 epp;
> > > > +
> > > > + max_perf = READ_ONCE(cpudata->highest_perf);
> > > > + min_perf = READ_ONCE(cpudata->lowest_perf);
> > > > +
> > > > + value = READ_ONCE(cpudata->cppc_req_cached);
> > > > +
> > > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
> > > > + min_perf = max_perf;
> > > > +
> > > > + /* Initial min/max values for CPPC Performance Controls Register */
> > > > + value &= ~AMD_CPPC_MIN_PERF(~0L);
> > > > + value |= AMD_CPPC_MIN_PERF(min_perf);
> > > > +
> > > > + value &= ~AMD_CPPC_MAX_PERF(~0L);
> > > > + value |= AMD_CPPC_MAX_PERF(max_perf);
> > > > +
> > > > + /* CPPC EPP feature require to set zero to the desire perf bit */
> > > > + value &= ~AMD_CPPC_DES_PERF(~0L);
> > > > + value |= AMD_CPPC_DES_PERF(0);
> > > > +
> > > > + if (cpudata->epp_policy == cpudata->policy)
> > > > + goto skip_epp;
> > > > +
> > > > + cpudata->epp_policy = cpudata->policy;
> > > > +
> > > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > > + epp = amd_pstate_get_epp(cpudata, value);
> > > > + cpudata->epp_powersave = epp;
> > >
> > > I didn't see where we should have epp_powersave here. Only initial
> > > this, but it won't be used anywhere.
> >
> > epp_powersave var was removed from V8.
> >
> >
> > >
> > > > + if (epp < 0)
> > > > + goto skip_epp;
> > > > + /* force the epp value to be zero for performance policy */
> > > > + epp = 0;
> > > > + } else {
> > > > + if (cpudata->epp_powersave < 0)
> > > > + goto skip_epp;
> > > > + /* Get BIOS pre-defined epp value */
> > > > + epp = amd_pstate_get_epp(cpudata, value);
> > > > + if (epp)
> > > > + goto skip_epp;
> > > > + epp = cpudata->epp_powersave;
> > > > + }
> > > > + /* Set initial EPP value */
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + value &= ~GENMASK_ULL(31, 24);
> > > > + value |= (u64)epp << 24;
> > > > + }
> > > > +
> > > > +skip_epp:
> > > > + WRITE_ONCE(cpudata->cppc_req_cached, value);
> > > > + amd_pstate_set_epp(cpudata, epp); }
> > > > +
> > > > +static void amd_pstate_set_max_limits(struct amd_cpudata *cpudata)
> {
> > > > + u64 hwp_cap = READ_ONCE(cpudata->cppc_cap1_cached);
> > > > + u64 hwp_req = READ_ONCE(cpudata->cppc_req_cached);
> > > > + u32 max_limit = (hwp_cap >> 24) & 0xff;
> > > > +
> > > > + hwp_req &= ~AMD_CPPC_MIN_PERF(~0L);
> > > > + hwp_req |= AMD_CPPC_MIN_PERF(max_limit);
> > > > + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, hwp_req); }
> > > > +
> > > > +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) {
> > > > + struct amd_cpudata *cpudata;
> > > > +
> > > > + if (!policy->cpuinfo.max_freq)
> > > > + return -ENODEV;
> > > > +
> > > > + pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
> > > > + policy->cpuinfo.max_freq, policy->max);
> > > > +
> > > > + cpudata = all_cpu_data[policy->cpu];
> > > > + cpudata->policy = policy->policy;
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > + mutex_lock(&amd_pstate_limits_lock);
> > > > +
> > > > + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
> > > > + amd_pstate_clear_update_util_hook(policy->cpu);
> > > > + amd_pstate_set_max_limits(cpudata);
> > > > + } else {
> > > > + amd_pstate_set_update_util_hook(policy->cpu);
> > > > + }
> > > > +
> > > > + mutex_unlock(&amd_pstate_limits_lock);
> > > > + }
> > > > + amd_pstate_epp_init(policy->cpu);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static void amd_pstate_verify_cpu_policy(struct amd_cpudata
> *cpudata,
> > > > + struct cpufreq_policy_data *policy)
> > > {
> > > > + update_boost_state();
> > > > + cpufreq_verify_within_cpu_limits(policy);
> > > > +}
> > > > +
> > > > +static int amd_pstate_epp_verify_policy(struct
> > > > +cpufreq_policy_data
> > > > +*policy) {
> > > > + amd_pstate_verify_cpu_policy(all_cpu_data[policy->cpu], policy);
> > > > + pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy-
> > > >min);
> > > > + return 0;
> > > > +}
> > >
> > > amd_pstate_verify_cpu_policy and amd_pstate_epp_verify_policy can
> be
> > > squeezed in one function.
> >
> > Fixed in V8.
> >
> > >
> > > > +
> > > > static struct cpufreq_driver amd_pstate_driver = {
> > > > .flags = CPUFREQ_CONST_LOOPS |
> > > CPUFREQ_NEED_UPDATE_LIMITS,
> > > > .verify = amd_pstate_verify,
> > > > @@ -617,8 +1213,20 @@ static struct cpufreq_driver
> > > > amd_pstate_driver =
> > > {
> > > > .attr = amd_pstate_attr,
> > > > };
> > > >
> > > > +static struct cpufreq_driver amd_pstate_epp_driver = {
> > > > + .flags = CPUFREQ_CONST_LOOPS,
> > > > + .verify = amd_pstate_epp_verify_policy,
> > > > + .setpolicy = amd_pstate_epp_set_policy,
> > > > + .init = amd_pstate_epp_cpu_init,
> > > > + .exit = amd_pstate_epp_cpu_exit,
> > > > + .update_limits = amd_pstate_epp_update_limits,
> > > > + .name = "amd_pstate_epp",
> > > > + .attr = amd_pstate_epp_attr,
> > > > +};
> > > > +
> > > > static int __init amd_pstate_init(void) {
> > > > + static struct amd_cpudata **cpudata;
> > > > int ret;
> > > >
> > > > if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) @@ -645,7
> > > +1253,8 @@
> > > > static int __init amd_pstate_init(void)
> > > > /* capability check */
> > > > if (boot_cpu_has(X86_FEATURE_CPPC)) {
> > > > pr_debug("AMD CPPC MSR based functionality is
> > > supported\n");
> > > > - amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> > > > + if (!cppc_active)
> > > > + default_pstate_driver->adjust_perf =
> > > amd_pstate_adjust_perf;
> > > > } else {
> > > > pr_debug("AMD CPPC shared memory based functionality is
> > > supported\n");
> > > > static_call_update(amd_pstate_enable, cppc_enable); @@ -
> > > 653,6
> > > > +1262,10 @@ static int __init amd_pstate_init(void)
> > > > static_call_update(amd_pstate_update_perf,
> > > cppc_update_perf);
> > > > }
> > > >
> > > > + cpudata = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
> > > > + if (!cpudata)
> > > > + return -ENOMEM;
> > > > + WRITE_ONCE(all_cpu_data, cpudata);
> > >
> > > Why we cannot use cpufreq_policy->driver_data to store the cpudata?
> > > I believe the cpudata is per-CPU can be easily get from private data.
> >
> > cpufreq_policy->driver_data can do that, but global data can have
> > better cached hit rate especially for the server cluster.
> > So I would prefer to use the global cpudata to store each CPU data struct.
> > Could we keep it for EPP driver ?
>
> I didn't get your meaning what's "better cached hit rate especially for the
> server cluster"? In my view, there is just different design no performance
> enhancement. Since we already implemented in cpufreq_policy-
> >driver_data, I didn't see any visiblely improvement to change into this way.
>
> Thanks,
> Ray