Re: [PATCH v2 3/4] powercap: Add AMD Fam17h RAPL support

From: Victor Ding
Date: Mon Oct 12 2020 - 08:04:48 EST


On Fri, Oct 9, 2020 at 2:47 PM Zhang Rui <rui.zhang@xxxxxxxxx> wrote:
>
> On Wed, 2020-10-07 at 11:14 -0500, Kim Phillips wrote:
> > From: Victor Ding <victording@xxxxxxxxxx>
> >
> > This patch enables AMD Fam17h RAPL support for the power capping
> > framework. The support is as per AMD Fam17h Model31h (Zen2) and
> > model 00-ffh (Zen1) PPR.
> >
> > Tested by comparing the results of the following two sysfs entries and
> > the values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
> > /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
> >
> > Signed-off-by: Victor Ding <victording@xxxxxxxxxx>
> > Acked-by: Kim Phillips <kim.phillips@xxxxxxx>
> > Cc: Victor Ding <victording@xxxxxxxxxx>
> > Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> > Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > Cc: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> > Cc: "Peter Zijlstra (Intel)" <peterz@xxxxxxxxxxxxx>
> > Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
> > Cc: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Tony Luck <tony.luck@xxxxxxxxx>
> > Cc: Vineela Tummalapalli <vineela.tummalapalli@xxxxxxxxx>
> > Cc: LKML <linux-kernel@xxxxxxxxxxxxxxx>
> > Cc: linux-pm@xxxxxxxxxxxxxxx
> > Cc: x86@xxxxxxxxxx
> > ---
> > Kim's changes from Victor's original submission:
> >
> > https://lore.kernel.org/lkml/20200729205144.3.I01b89fb23d7498521c84cfdf417450cbbfca46bb@changeid/
> >
> > - Added my Acked-by.
> > - Added Daniel Lezcano to Cc.
> >
> > arch/x86/include/asm/msr-index.h | 1 +
> > drivers/powercap/intel_rapl_common.c | 2 ++
> > drivers/powercap/intel_rapl_msr.c | 27 ++++++++++++++++++++++++++-
> > 3 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> > index f1b24f1b774d..c0646f69d2a5 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -324,6 +324,7 @@
> > #define MSR_PP1_POLICY 0x00000642
> >
> > #define MSR_AMD_RAPL_POWER_UNIT 0xc0010299
> > +#define MSR_AMD_CORE_ENERGY_STATUS 0xc001029a
> > #define MSR_AMD_PKG_ENERGY_STATUS 0xc001029b
> >
> > /* Config TDP MSRs */
> > diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
> > index 983d75bd5bd1..6905ccffcec3 100644
> > --- a/drivers/powercap/intel_rapl_common.c
> > +++ b/drivers/powercap/intel_rapl_common.c
> > @@ -1054,6 +1054,8 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
> > X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &rapl_defaults_hsw_server),
> > X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &rapl_defaults_hsw_server),
> > +
> > + X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_core),
>
> I doubt if we can use rapl_defaults_core here.
>
> static const struct rapl_defaults rapl_defaults_core = {
> .floor_freq_reg_addr = 0,
> .check_unit = rapl_check_unit_core,
> .set_floor_freq = set_floor_freq_default,
> .compute_time_window = rapl_compute_time_window_core,
> };
>
> .floor_freq_reg_addr = 0,
> is redundant here, even for rapl_defaults_core, we can remove it.
>
> .check_unit = rapl_check_unit_core,
> the Intel UNIT MSR supports three units including Energy/Power/Time.
> From the change below, only the energy counter is supported, so you may
> need to confirm if all the three units are supported or not.
>
> .set_floor_freq = set_floor_freq_default,
> this function sets PL1_CLAMP bit on RAPL_DOMAIN_REG_LIMIT, but
> RAPL_DOMAIN_REG_LIMIT is not supported on the AMD cpus.
>
> .compute_time_window = rapl_compute_time_window_core,
> this is used for setting the power limits, which is not supported on
> the AMD cpus.
>
> IMO, it's better to use a new rapl_defaults that contains valid
> callbacks for AMD cpus.
Good point. The only reason I proposed reusing rapl_defaults_core was that
"check_unit" is the only function needed here, and it is the same as Intel's.
The rest of the callbacks are not used at all, since setting the power limits
is not supported on AMD CPUs. Let's create a new rapl_defaults for AMD. It
should look like:

static const struct rapl_defaults rapl_defaults_amd = {
	.check_unit = rapl_check_unit_core,
};

The AMD UNIT MSR behaves the same as Intel's: [3:0] is Power Units, [12:8] is
Energy Status Units, and [19:16] is Time Units.
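
To make the layout concrete, here is a minimal user-space sketch of decoding
those three fields from a raw unit MSR value. It is not the kernel code. It
assumes each field is an exponent n meaning 1/2^n of the base unit, as in the
Intel definition; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the three unit exponents. */
struct rapl_unit_exps {
	unsigned int power;	/* bits [3:0]   */
	unsigned int energy;	/* bits [12:8]  */
	unsigned int time;	/* bits [19:16] */
};

/* Extract the exponent fields from a raw unit MSR value. */
struct rapl_unit_exps decode_rapl_units(uint64_t msr)
{
	struct rapl_unit_exps u = {
		.power  = (unsigned int)(msr & 0xf),
		.energy = (unsigned int)((msr >> 8) & 0x1f),
		.time   = (unsigned int)((msr >> 16) & 0xf),
	};
	return u;
}

/*
 * Convert a raw 32-bit energy counter to microjoules, assuming one
 * count equals 1/2^energy_exp joules (an assumption, see above).
 */
uint64_t raw_energy_to_uj(uint32_t raw, unsigned int energy_exp)
{
	assert(energy_exp < 32);
	return ((uint64_t)raw * 1000000ULL) >> energy_exp;
}
```

With an energy exponent of 16, for instance, a raw count of 65536 would
correspond to one joule under this assumption.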
>
> > {}
> > };
> > MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
> > diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c
> > index c68ef5e4e1c4..dcaef917f79d 100644
> > --- a/drivers/powercap/intel_rapl_msr.c
> > +++ b/drivers/powercap/intel_rapl_msr.c
> > @@ -48,6 +48,21 @@ static struct rapl_if_priv rapl_msr_priv_intel = {
> > .limits[RAPL_DOMAIN_PACKAGE] = 2,
> > };
> >
> > +static struct rapl_if_priv rapl_msr_priv_amd = {
> > + .reg_unit = MSR_AMD_RAPL_POWER_UNIT,
> > + .regs[RAPL_DOMAIN_PACKAGE] = {
> > + 0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
> > + .regs[RAPL_DOMAIN_PP0] = {
> > + 0, MSR_AMD_CORE_ENERGY_STATUS, 0, 0, 0 },
> > + .regs[RAPL_DOMAIN_PP1] = {
> > + 0, 0, 0, 0, 0 },
> > + .regs[RAPL_DOMAIN_DRAM] = {
> > + 0, 0, 0, 0, 0 },
> > + .regs[RAPL_DOMAIN_PLATFORM] = {
> > + 0, 0, 0, 0, 0},
>
> I don't think you need to set the PP1/DRAM/PLATFORM registers to 0 explicitly if they are not supported.
Good suggestion. Let's remove the zeros.
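
For reference, a small stand-alone sketch of why the explicit zeros are
redundant: in C, members not named in a designated initializer are
zero-initialized. The demo_* types below are simplified stand-ins invented for
this example, not the real struct rapl_if_priv; only the MSR numbers come from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the RAPL domain/register enums. */
enum demo_domain { DEMO_PACKAGE, DEMO_PP0, DEMO_PP1, DEMO_DRAM,
		   DEMO_PLATFORM, DEMO_DOMAIN_MAX };
enum demo_reg { DEMO_REG_LIMIT, DEMO_REG_STATUS, DEMO_REG_MAX };

struct demo_priv {
	uint32_t reg_unit;
	uint32_t regs[DEMO_DOMAIN_MAX][DEMO_REG_MAX];
};

/* Only the supported registers are named; everything else is implicitly 0. */
const struct demo_priv demo_priv_amd = {
	.reg_unit = 0xc0010299,
	.regs[DEMO_PACKAGE][DEMO_REG_STATUS] = 0xc001029b,
	.regs[DEMO_PP0][DEMO_REG_STATUS] = 0xc001029a,
};

/* A domain counts as present here if it has a non-zero status register. */
int demo_domain_supported(enum demo_domain d)
{
	assert(d < DEMO_DOMAIN_MAX);
	return demo_priv_amd.regs[d][DEMO_REG_STATUS] != 0;
}
```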
>
> > + .limits[RAPL_DOMAIN_PACKAGE] = 1,
>
>
> Is Pkg Domain PL1 really supported?
> At least according to this patch, I don't think so. So if power limit
> is supported, it is better to add the support in this patch altogether.
>
> If no, we're actually exposing energy counters only. If this is the
> case, I'm not sure if it is proper to do this in powercap class because
> we can not do powercap actually. Or at least, if we want to support
> power zones with no power limits, we should enhance the code to
> support this rather than fake a power limit.
>
Correct, this is solely to expose energy counters. Many existing tools are
built on top of powercap's sysfs to query energy counters, even though they
don't set the power limit. Exposing the energy counters through the same
powercap interface allows these tools built for Intel to run seamlessly on
AMD. Hence, I believe powercap is the best place to expose AMD's energy
counters.
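
As an illustration of what such a tool typically does, here is a rough
user-space sketch: sample energy_uj twice and derive average power, allowing
for one counter wrap using the zone's max_energy_range_uj. The helper names are
invented for this example and the wrap handling is approximate.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style file; 0 on success. */
int read_u64_file(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Energy consumed between two samples, allowing for a single wrap. */
uint64_t energy_delta_uj(uint64_t prev, uint64_t now, uint64_t max_range)
{
	assert(prev <= max_range && now <= max_range);
	if (now >= prev)
		return now - prev;
	return (max_range - prev) + now;
}

/* Average power in microwatts over elapsed_us microseconds. */
uint64_t avg_power_uw(uint64_t delta_uj, uint64_t elapsed_us)
{
	assert(elapsed_us > 0);
	return delta_uj * 1000000ULL / elapsed_us;
}
```

A tool would call read_u64_file() on
/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj at two points in time and
feed the samples to the two helpers. Nothing in this flow touches a power
limit.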

I like your idea of enhancing the code to avoid using a fake power limit.
I'll make the update.
> thanks,
> rui
> > +};
> > +
> > /* Handles CPU hotplug on multi-socket systems.
> > * If a CPU goes online as the first CPU of the physical package
> > * we add the RAPL package to the system. Similarly, when the last
> > @@ -137,7 +152,17 @@ static int rapl_msr_probe(struct platform_device *pdev)
> > const struct x86_cpu_id *id = x86_match_cpu(pl4_support_ids);
> > int ret;
> >
> > - rapl_msr_priv = &rapl_msr_priv_intel;
> > + switch (boot_cpu_data.x86_vendor) {
> > + case X86_VENDOR_INTEL:
> > + rapl_msr_priv = &rapl_msr_priv_intel;
> > + break;
> > + case X86_VENDOR_AMD:
> > + rapl_msr_priv = &rapl_msr_priv_amd;
> > + break;
> > + default:
> > + pr_err("intel-rapl does not support CPU vendor %d\n", boot_cpu_data.x86_vendor);
> > + return -ENODEV;
> > + }
> > rapl_msr_priv->read_raw = rapl_msr_read_raw;
> > rapl_msr_priv->write_raw = rapl_msr_write_raw;
> >
>
Best regards,
Victor Ding