Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD

From: Reinette Chatre
Date: Tue Jun 02 2020 - 19:28:22 EST


Hi Babu,

On 6/2/2020 3:12 PM, Babu Moger wrote:
>
>
>> -----Original Message-----
>> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>> Sent: Tuesday, June 2, 2020 4:51 PM
>> To: Moger, Babu <Babu.Moger@xxxxxxx>; fenghua.yu@xxxxxxxxx;
>> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx; x86@xxxxxxxxxx;
>> hpa@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH] x86/resctrl: Fix memory bandwidth counter width for AMD
>>
>> Hi Babu,
>>
>> On 6/1/2020 4:00 PM, Babu Moger wrote:
>>> Memory bandwidth is calculated by reading the monitoring counter
>>> at two intervals and computing the delta. It is the software's
>>> responsibility to read the count often enough to avoid having
>>> the count roll over _twice_ between reads.
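For reference, that wrap-corrected delta amounts to roughly the following
standalone sketch; the helper name and exact form are illustrative, not a
quote of the resctrl sources:

	#include <stdint.h>

	/*
	 * Wrap-corrected delta between two reads of a free-running counter
	 * that is 'width' bits wide.  The result is only correct if the
	 * counter rolled over at most once between the reads, which is why
	 * the count must be read often enough.
	 */
	static uint64_t counter_delta(uint64_t prev, uint64_t cur,
				      unsigned int width)
	{
		uint64_t mask = (width < 64) ? ((uint64_t)1 << width) - 1
					     : UINT64_MAX;

		return (cur - prev) & mask;
	}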
>>>
>>> The current code hardcodes the bandwidth monitoring counter's width
>>> to 24 bits for AMD. This is because the default base counter width
>>> is 24 bits. Currently, AMD does not implement CPUID 0xF.[ECX=1]:EAX
>>> to adjust the counter width. However, the AMD hardware supports a
>>> much wider bandwidth counter with a default width of 44 bits.
>>>
>>> The kernel reads these monitoring counters every 1 second and adjusts
>>> the counter value for overflow. With 24 bits and a scale value of 64
>>> for AMD, it can only measure up to 1GB/s without overflowing. For
>>> rates above 1GB/s it fails to measure the bandwidth correctly.
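Spelled out (a standalone illustration of the arithmetic, not code from the
patch): one counter unit corresponds to 64 bytes on AMD, so a w-bit counter
sampled once per second can represent at most 2^w * 64 bytes between reads:

	#include <stdio.h>
	#include <stdint.h>

	/* Largest traffic volume representable between two reads, in bytes. */
	static uint64_t mbm_ceiling(unsigned int width_bits, uint64_t scale_bytes)
	{
		return ((uint64_t)1 << width_bits) * scale_bytes;
	}

	int main(void)
	{
		/* 24-bit counter, 64-byte scale: 1 GiB per 1-second interval. */
		printf("24-bit: %llu bytes/s\n",
		       (unsigned long long)mbm_ceiling(24, 64));
		/* 44-bit counter, 64-byte scale: 1 PiB per 1-second interval. */
		printf("44-bit: %llu bytes/s\n",
		       (unsigned long long)mbm_ceiling(44, 64));
		return 0;
	}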
>>>
>>> Fix the issue by setting the default width to 44 bits, adjusting the
>>> offset accordingly.
>>>
>>> Future AMD products will implement CPUID 0xF.[ECX=1]:EAX.
>>>
>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>>> ---
>>> - Sending it a second time. The email client had some issues the first time.
>>> - Generated the patch on top of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (x86/cache).
>>>
>>> arch/x86/kernel/cpu/resctrl/core.c | 8 +++++++-
>>> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
>>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 12f967c6b603..6040e9ae541b 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -983,7 +983,13 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>>>  		c->x86_cache_occ_scale = ebx;
>>>  		if (c->x86_vendor == X86_VENDOR_INTEL)
>>>  			c->x86_cache_mbm_width_offset = eax & 0xff;
>>> -		else
>>> +		else if (c->x86_vendor == X86_VENDOR_AMD) {
>>> +			if (eax)
>>
>> This test checks if _any_ bit is set in eax ...
>>
>>> +				c->x86_cache_mbm_width_offset = eax & 0xff;
>>
>> ... with the assumption that the first eight bits contain a value.
>>
>> Even so, now that Intel and AMD will be using eax in the same way,
>> perhaps it can be done more simply by always using eax to obtain the
>> offset (and thus avoid the code duplication) and, on AMD, initializing
>> the default if it cannot be obtained from eax?
>>
>> What I mean is something like:
>>
>> @@ -1024,10 +1024,12 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>>
>>  		c->x86_cache_max_rmid = ecx;
>>  		c->x86_cache_occ_scale = ebx;
>> -		if (c->x86_vendor == X86_VENDOR_INTEL)
>> -			c->x86_cache_mbm_width_offset = eax & 0xff;
>> -		else
>> -			c->x86_cache_mbm_width_offset = -1;
>> +		c->x86_cache_mbm_width_offset = eax & 0xff;
>> +		if (c->x86_vendor == X86_VENDOR_AMD &&
>> +		    c->x86_cache_mbm_width_offset == 0) {
>> +			c->x86_cache_mbm_width_offset =
>> +						MBM_CNTR_WIDTH_OFFSET_AMD;
>> +		}
>>  	}
>>  }
>>
>> What do you think?
>
> That looks good. But we still need to keep the default
> (c->x86_cache_mbm_width_offset = -1;) for non-AMD and non-Intel vendors.
> How about this?

The original default of -1 was added to deal with AMD when it was not
known to support eax. Now that AMD's use of eax is handled by the default
code, I did not find it necessary to keep that default, considering that
resctrl_cpu_detect() is only called on AMD and Intel.

>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 12f967c6b603..7269bd896ba9 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -983,6 +983,9 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>  		c->x86_cache_occ_scale = ebx;
>  		if (c->x86_vendor == X86_VENDOR_INTEL)
>  			c->x86_cache_mbm_width_offset = eax & 0xff;
> +		else if (c->x86_vendor == X86_VENDOR_AMD)
> +			c->x86_cache_mbm_width_offset = eax ? eax & 0xff :

This has the same concern that I mentioned earlier: the contents of the
entire register are used to determine whether the first eight bits contain
a value. Did I miss something obvious? (A small sketch of what I mean
follows below the quoted hunk.)

> +						 MBM_CNTR_WIDTH_OFFSET_AMD;
>  		else
>  			c->x86_cache_mbm_width_offset = -1;
>  	}
>
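To make the concern concrete: testing only the offset field, rather than
the whole register, would look roughly like the lines below. This is only a
sketch against the same context ('offset' is a hypothetical local variable),
not a tested change:

	offset = eax & 0xff;
	c->x86_cache_mbm_width_offset = offset ? offset :
					MBM_CNTR_WIDTH_OFFSET_AMD;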

Reinette