Re: [PATCH v4 04/39] x86/resctrl: Use schema type to determine how to parse schema values

From: James Morse
Date: Fri Sep 06 2024 - 14:06:38 EST


Hi Reinette,

On 14/08/2024 04:56, Reinette Chatre wrote:
> On 8/2/24 10:28 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 9ca542a8e2d4..57c88e1c2adf 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c

>> @@ -192,6 +191,18 @@ enum resctrl_scope {
>>       RESCTRL_L3_NODE,
>>   };
>>   +/**
>> + * enum resctrl_schema_fmt - The format user-space provides for a schema.
>> + * @RESCTRL_SCHEMA_BITMAP:    The schema is a bitmap in hex.
>> + * @RESCTRL_SCHEMA_PERCENTAGE:    The schema is a decimal percentage value.
>> + * @RESCTRL_SCHEMA_MBPS:    The schema is a decimal MBps value.
>> + */
>> +enum resctrl_schema_fmt {
>> +    RESCTRL_SCHEMA_BITMAP,
>> +    RESCTRL_SCHEMA_PERCENTAGE,
>> +    RESCTRL_SCHEMA_MBPS,
>> +};
>> +
>
> I believe that the choice of RESCTRL_SCHEMA_PERCENTAGE and RESCTRL_SCHEMA_MBPS has
> potential for significant confusion. The closest place to where user space can enter
> a MBps value (which is actually MiBps) is on Intel when resctrl is mounted with mba_MBps,
> and as per above it will have the "RESCTRL_SCHEMA_PERCENTAGE" format.

It was AMD's QOS that I was thinking of. To pick the example from the documentation:
| For example, to allocate 2GB/s limit on the first cache id:
| MB:0=2048;1=2048;2=2048;3=2048

How does user-space know that its not a percentage? Suck it and see...
If those values aren't in MB/s ... I'm not sure what they are.


The schema format isn't exposed to user-space. (I think we should do that).
I agree if it were, we'd need to report MBps/MiBps for the mba_sc case.


> What is considered
> here as RESCTRL_SCHEMA_MBPS also cannot really be considered as "MBPS" since it is used to
> cover AMD's values that are "multiples of one eighth GB/s". Any new resource that
> _actually_ uses MBPS will thus not be able to use RESCTRL_SCHEMA_MBPS.

Isn't this already covered by the granularity?
AMD platforms are unfortunately reporting '1' as the granularity.
>From the 1/8th GB/s it sounds like the granularity should really be 128 MB/s.

Any platform that did have a real MB/s control could report whatever granularity it really
has here.


> Considering that RESCTRL_SCHEMA_PERCENTAGE and RESCTRL_SCHEMA_MBPS use the same parser,
> could "RESCTRL_SCHEMA_RANGE" be more fitting? I acknowledge that it is very generic and
> better
> ideas are welcome. A "range" does seem to be appropriate considering the later patch
> (x86/resctrl:

> Add max_bw to struct resctrl_membw) that codes an explicit max.

Yes because the AMD platforms have a firmware-advertised maximum value, I presume that is
an achievable bandwidth. (otherwise why advertise it to user-space!).
(I'm not sure why data_width is 4 - it looks like that was the hard coded based on one
platforms limit ... that probably needs fixing too)

But if you had a MBps control, I'm not sure that implies you know what the maximum
achievable bandwidth is. The control may have a range based on some power of two, which is
unrelated to the bandwidth.
In the arm world the hardware never knows what the achievable bandwidth is - that is
something only the SoC designer can work out.


I agree we can parse both values as a range, as resctrl doesn't need to interpret the
values, its just whatever the hardware does.

I did it like this as I'd like to expose "percentage/MBps/bitmap" to user-space so that
control types user-space doesn't recognised can be programmed. If arm64 and riscv each
want to add new schema types, it would be good to do it in a way that is low-impact for
the resctrl filesystem code, and unaware user-space can do something with.
percentage/MBps/bitmap are the control schemes we already have.

The MBps/range difference matters if the 'maximum bandwidth' isn't actually an achievable
number e.g. ~0, then user-space can't treat it as some kind of percentage.


Thanks,

James