RE: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
From: Babu Moger
Date: Wed Jun 03 2020 - 11:04:23 EST
Hi Reinette,
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Sent: Tuesday, June 2, 2020 6:28 PM
> To: Moger, Babu <Babu.Moger@xxxxxxx>; fenghua.yu@xxxxxxxxx;
> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx; x86@xxxxxxxxxx;
> hpa@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
>
> Hi Babu,
>
> On 6/2/2020 3:12 PM, Babu Moger wrote:
> >
> >
> >> -----Original Message-----
> >> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> >> Sent: Tuesday, June 2, 2020 4:51 PM
> >> To: Moger, Babu <Babu.Moger@xxxxxxx>; fenghua.yu@xxxxxxxxx;
> >> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx; x86@xxxxxxxxxx;
> >> hpa@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
> >>
> >> Hi Babu,
> >>
> >> On 6/1/2020 4:00 PM, Babu Moger wrote:
> >>> Memory bandwidth is calculated by reading the monitoring counter
> >>> at two intervals and calculating the delta. It is the software's
> >>> responsibility to read the count often enough to avoid having
> >>> the count roll over _twice_ between reads.
> >>>
> >>> The current code hardcodes the bandwidth monitoring counter's width
> >>> to 24 bits for AMD, because the default base counter width is 24.
> >>> Currently, AMD does not implement CPUID 0xF.[ECX=1]:EAX to adjust
> >>> the counter width. However, AMD hardware supports a much wider
> >>> bandwidth counter, with a default width of 44 bits.
> >>>
> >>> The kernel reads these monitoring counters every second and adjusts
> >>> the counter value for overflow. With 24 bits and a scale value of 64
> >>> for AMD, it can only measure up to 1GB/s without overflowing. At rates
> >>> above 1GB/s it fails to measure the bandwidth.
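To make the arithmetic above concrete, here is a minimal sketch. It assumes a counter of `width` bits that counts chunks of `scale` bytes each. The helper names are hypothetical and are not taken from the patch. The shift-based delta is one common way to tolerate a single wrap between two reads; it cannot detect a second wrap.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes a counter can represent before it wraps:
 * 2^width chunks, each worth `scale` bytes.
 * (Hypothetical helper, for illustration only.)
 */
static inline uint64_t mbm_bytes_per_wrap(unsigned int width, uint64_t scale)
{
	assert(width > 0 && width < 64);
	return ((uint64_t)1 << width) * scale;
}

/*
 * Chunks counted between two raw reads of a `width`-bit counter.
 * Shifting both values to the top of a 64-bit word makes the unsigned
 * subtraction wrap at 2^width, so a single rollover is handled.
 */
static inline uint64_t mbm_delta_chunks(uint64_t prev, uint64_t cur,
					unsigned int width)
{
	unsigned int shift;
	uint64_t chunks;

	assert(width > 0 && width <= 64);
	shift = 64 - width;
	chunks = (cur << shift) - (prev << shift);
	return chunks >> shift;
}
```

With width 24 and scale 64 the first helper gives 2^30 bytes (1 GiB) per wrap, which matches the 1GB/s limit at a one-second read interval. With width 44 it gives 2^50 bytes.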
> >>>
> >>> Fix the issue by setting the default width to 44 bits by adjusting
> >>> the offset.
> >>>
> >>> Future AMD products will implement CPUID 0xF.[ECX=1]:EAX.
> >>>
> >>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> >>> ---
> >>> - Sending it second time. Email client had some issues first time.
> >>> - Generated the patch on top of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
> >>>
> >>> arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
> >>> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> >>> 2 files changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> >>> index 12f967c6b603..6040e9ae541b 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/core.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> >>> @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> >>> c->x86_cache_occ_scale = ebx;
> >>> if (c->x86_vendor == X86_VENDOR_INTEL)
> >>> c->x86_cache_mbm_width_offset = eax & 0xff;
> >>> - else
> >>> + else if (c->x86_vendor == X86_VENDOR_AMD) {
> >>> + if (eax)
> >>
> >> This test checks if _any_ bit is set in eax ...
> >>
> >>> + c->x86_cache_mbm_width_offset = eax & 0xff;
> >>
> >> ... with the assumption that the first eight bits contain a value.
> >>
> >> Even so, now that Intel and AMD will be using eax in the same way,
> >> perhaps it can be done simpler by always using eax to obtain the offset
> >> (and thus avoid the code duplication) and on AMD initialize the default
> >> if it cannot be obtained from eax?
> >>
> >> What I mean is something like:
> >>
> >> @@ -1024,10 +1024,12 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> >>
> >> c->x86_cache_max_rmid = ecx;
> >> c->x86_cache_occ_scale = ebx;
> >> - if (c->x86_vendor == X86_VENDOR_INTEL)
> >> - c->x86_cache_mbm_width_offset = eax & 0xff;
> >> - else
> >> - c->x86_cache_mbm_width_offset = -1;
> >> + c->x86_cache_mbm_width_offset = eax & 0xff;
> >> + if (c->x86_vendor == X86_VENDOR_AMD &&
> >> + c->x86_cache_mbm_width_offset == 0) {
> >> + c->x86_cache_mbm_width_offset =
> >> + MBM_CNTR_WIDTH_OFFSET_AMD;
> >> + }
> >> }
> >> }
> >>
> >> What do you think?
> >
> > That looks good. But we still need to keep the
> > default(c->x86_cache_mbm_width_offset = -1;) for non-AMD and non-Intel.
> > How about this?
>
> This original default of -1 was added to deal with AMD when it was not
> known to support eax. Now that AMD's support of eax is captured among
> the default code I did not find it necessary to keep that considering
> resctrl_cpu_detect() is only called on AMD and Intel.
Ok. Sure. Will re-post with changes.
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 12f967c6b603..7269bd896ba9 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -983,6 +983,9 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
> > c->x86_cache_occ_scale = ebx;
> > if (c->x86_vendor == X86_VENDOR_INTEL)
> > c->x86_cache_mbm_width_offset = eax & 0xff;
> > + else if (c->x86_vendor == X86_VENDOR_AMD)
> > + c->x86_cache_mbm_width_offset = eax ? eax & 0xff :
>
> This has the same concern that I mentioned earlier, where the contents of
> the entire register are used to determine if the first eight bits
> contain a value. Did I miss something obvious?
You are right. I will make the change as you suggested. Thanks
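For the record, a small standalone sketch of the difference between the two checks. This is not the kernel code: the `vendor` enum and both helper names are hypothetical stand-ins, and the value 20 for MBM_CNTR_WIDTH_OFFSET_AMD is an assumption derived from 44 - 24 as described in the commit message.

```c
#include <stdint.h>

/* Assumed value: default AMD width (44) minus the base width (24). */
#define MBM_CNTR_WIDTH_OFFSET_AMD	20

/* Hypothetical stand-in for the x86_vendor field. */
enum vendor { VENDOR_INTEL, VENDOR_AMD };

/*
 * Suggested approach: always take the low eight bits of eax, and fall
 * back to the AMD default only when those eight bits are zero.
 */
static inline unsigned int mbm_width_offset(enum vendor v, uint32_t eax)
{
	unsigned int offset = eax & 0xff;

	if (v == VENDOR_AMD && offset == 0)
		offset = MBM_CNTR_WIDTH_OFFSET_AMD;
	return offset;
}

/*
 * Earlier approach: tests the whole register. If only bits above
 * bit 7 are set, it returns 0 instead of the AMD default.
 */
static inline unsigned int mbm_width_offset_whole_reg(uint32_t eax)
{
	return eax ? (eax & 0xff) : MBM_CNTR_WIDTH_OFFSET_AMD;
}
```

For eax = 0x100 the first helper returns the AMD default on AMD, while the second returns 0, which is the case the review comment points at.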
>
> > +
> > MBM_CNTR_WIDTH_OFFSET_AMD;
> > else
> > c->x86_cache_mbm_width_offset = -1;
> > }
> >
>
> Reinette