RE: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

From: Amit Singh Tomar
Date: Wed Aug 23 2023 - 17:34:44 EST


Hi Reinette,

(Kindly follow the responses in a top-to-bottom sequence).

-----Original Message-----
From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
Sent: Thursday, August 24, 2023 12:37 AM
To: Amit Singh Tomar <amitsinght@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: fenghua.yu@xxxxxxxxx; james.morse@xxxxxxx; George Cherian <gcherian@xxxxxxxxxxx>; robh@xxxxxxxxxx; peternewman@xxxxxxxxxx; Luck, Tony <tony.luck@xxxxxxxxx>
Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

Hi Amit,

On 8/22/2023 5:44 AM, Amit Singh Tomar wrote:
> Hi Reinette,
>
> Thanks for having a look!
>
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Sent: Friday, August 18, 2023 12:41 AM
> To: Amit Singh Tomar <amitsinght@xxxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: fenghua.yu@xxxxxxxxx; james.morse@xxxxxxx; George Cherian
> <gcherian@xxxxxxxxxxx>; robh@xxxxxxxxxx; peternewman@xxxxxxxxxx; Luck,
> Tony <tony.luck@xxxxxxxxx>
> Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority
> partitioning control
>
> External Email
>
> ----------------------------------------------------------------------
> (+Tony)
>
> Hi Amit,
>
> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
>> Arm Memory System Resource Partitioning and Monitoring (MPAM)
>> supports different controls that can be applied to different
>> resources in the system. For instance, an optional priority
>> partitioning control where priority value is generated from one MSC,
>> propagates over interconnect to other MSC (known as downstream
>> priority), or can be applied within an MSC for internal operations.
>>
>> Marvell implementation of ARM MPAM supports priority partitioning
>> control that allows LLC MSC to generate priority values that get
>> propagated (along with read/write request from upstream) to DDR Block.
>> Within the DDR block, each priority value is mapped to a traffic class under the DDR QoS strategy.
>> The link[1] gives some idea about DDR QoS strategy, and terms like
>> LPR, VPR and HPR.
>>
>> Setup priority partitioning control under Resource control
>> ----------------------------------------------------------
>> At present, resource control (resctrl) provides a basic interface to
>> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
>> ARM MPAM uses it to support controls like Cache portion partition
>> (CPOR), and MPAM bandwidth partitioning.
>>
>> As an example, "schemata" file under resource control group contains
>> information about cache portion bitmaps, and memory bandwidth
>> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>>
>> MB:0=0100
>> L3:0=ffff
>>
>> But resctrl doesn't provide a way to set up the other controls that ARM
>> MPAM provides (for instance, priority partitioning control as
>> mentioned above). To support this, James has suggested using the already
>> existing schemata to stay compatible with portable software. The main
>> idea behind this RFC is to start a discussion on how resctrl can be
>> extended to support priority partitioning control.
>>
>> To support Priority partitioning control, "schemata" file is updated
>> to accommodate priority field (upon priority partitioning capability
>> detection), separated from CPBM using delimiter ",".
>>
>> L3:0=ffff,f where f indicates downstream priority max value.
>>
>> This dspri value gets programmed per partition and can be used to
>> override the QoS value coming from upstream (CPU).
>>
>> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2,
>> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>>
>
> There are some aspects of this that I think we should be cautious
> about. First, there may inevitably be more properties in the future
> that need to be associated with a resource allocation, these may
> indeed be different between architectures and individual platforms.
> Second, user space needs a way to know which properties are supported
> and what the valid parameters may be.
>
> On a high level I thus understand the goal to be to add support for
> assigning a property to a resource allocation with "Priority
> partitioning control" being the first property.

> To that end, I have a few questions:
> * How can this interface be expanded to support more properties with the
> expectation that a system/architecture may not support all resctrl supported
> properties?
> [>>] All these new controls ("Priority partitioning" is one of them) are detected as resource capabilities (via the Features Identification Register), and a control will not be probed if the system/architecture
> doesn't support it. From the resource control side, this means that users will never see an unsupported control in the schemata file. For instance, on a platform that doesn't support Priority partitioning control, the
> schemata file looks like:
>
> # cat schemata
> L3:1=ffff
>
> As opposed to when the system has Priority partitioning control:
> # cat schemata
> L3:1=ffff,f
>

Right, but my question is "How can this interface be expanded ...".
Consider a future L3 resource that has a new and different property
("new_property") that is independent from "Priority partitioning".
If "L3:1=ffff,f" means "Priority partitioning" == 0xf, how can a value be assigned to "new_property" if the system's L3 supports it but not "Priority partitioning"?
If I understand correctly the proposed interface is a positional interface and "Priority partitioning" is always in second field ...

[>>] Yes, "Priority partitioning" will always be the second field.

but a system may or may not support this property so does it require an empty second field to be able to use other properties?

[>>] Yes, in the absence of this control ("Priority partitioning"), the second field will be taken by another control (if supported).

So, for example, if the L3 resource is equipped with two controls, i.e. CPOR and PPART, the schemata will look like:

L3:0=XXXX,PPART=X

and, if the same resource is equipped with another set of controls, i.e. CPOR and CCAP (cache capacity partitioning), the schemata will look like:

L3:0=XXXX,CCAP=X

and, in case the resource is equipped with all three controls, the schemata will look like:

L3:0=XXXX,PPART=X,CCAP=X

Each of these combinations has its own format specifier.
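
To illustrate how labelled fields remove the positional ambiguity, here is a minimal user-space sketch in Python that parses a single entry of the form shown above. It is not part of the patch set. The function name parse_entry, the "CPOR" key used for the unlabelled first field, and the assumption that every labelled value (including CCAP) is written in hex are all made up for this example.

```python
def parse_entry(line):
    """Parse 'RES:ID=BITMAP[,LABEL=VALUE...]' into (resource, id, controls).

    Hypothetical helper: the first field is the unlabelled CAT/CPOR bitmap,
    every following field is assumed to be LABEL=HEXVALUE.
    """
    resource, rest = line.strip().split(":", 1)
    domain_id, fields = rest.split("=", 1)
    first, *labelled = fields.split(",")

    # "CPOR" is only a dictionary key chosen for this sketch.
    controls = {"CPOR": int(first, 16)}
    for field in labelled:
        label, value = field.split("=", 1)
        controls[label] = int(value, 16)  # assumption: values are hex
    return resource, int(domain_id), controls


# CCAP is found by its label even though PPART is absent.
resource, domain_id, controls = parse_entry("L3:0=ffff,CCAP=8")
```

Because each extra control carries its label, a reader of the schemata does not need to know which controls the platform supports in order to interpret the second field.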

>
> * Is it possible for support for properties to vary between, for example, different
> MSCs in the system? From resctrl side it may mean that there would be a resource,
> for example "L3", with multiple instances, for example, cache with id #0, cache
> with id#1, etc. but the supported properties or valid values of properties
> may vary between the instances?
> [>>] This is really implementation dependent, but we would expect that if multiple L3 instances
> across multiple dies implement this control, it is uniform across them. Still, let's take a case
> where the L3 MSC instance on one socket has this control and the L3 MSC instance on another
> socket doesn't. From the resctrl perspective, one would see this control
> only for the L3 instance that has it, and it would be programmed only for that L3 instance.
>
> L3:0=XXXX,X;L3:1=XXXX
>
> And as per proposed format:
>
> L3:0=XXXX,PPART=X;L3:1=XXXX

I'm a bit lost ... what proposed format?
[>>] Sorry about that, I should have indicated the proposed format is in the point below.

>
> * How can user space know that a system supports "Priority partitioning control"?
> User space needs to know when/if it can attempt to write a priority to the
> schemata.
> [>>] At the moment, we label only the resource class, and we would like to propose that we also
> label newly added controls (under a resource class) so that users can easily identify
> which control to program. For instance, the schemata file with these proposed changes
> will look like this:
>
> L3:0=XXXX,PPART=X
>
> where PPART=Priority partitioning control. Similarly, if the L3 resource class has one more capability, say cache capacity partitioning:
>
> L3:0=XXXX,PPART=X,CCAP=X
>
> The very first control will always be CAT/CPOR (with no label).
>

Is your response intended to be read from bottom to top?

> * How can user space know what priority values are valid for a particular system?
> [>>] Supported priority values are read from one of the MPAM Priority Partitioning registers, and in the
> schemata file the field is set to the maximum value, just like cache portion bitmaps or memory bandwidth allocation.
> For instance:
>
> L3:0=ffff,f, where the max priority value is f, and the user can
> program/set from 0-15

Doing so would require user space to (a) be running from the time resctrl is mounted, and (b) maintain state about all resctrl resources, properties, and supported values.

I think that this is risky and places a burden on user space that in some scenarios would be impossible to achieve. Consider the scenario when user space starts running after resctrl has been in use for a while or if user space loses its state. The info directory is where information about enabled resources is located.

[>>] Thanks for pointing it out, will export this information to the info directory.
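
As a rough illustration of what user space could do once the maximum priority is visible without relying on the initial schemata contents, here is a minimal Python sketch. How the maximum is exposed under the info directory is not decided, so the sketch simply takes it as a hex string; the function name format_priority is hypothetical.

```python
def format_priority(value, max_hex):
    """Return `value` as a hex string suitable for a priority field.

    `max_hex` is the maximum priority as a hex string (for example "f").
    Where that string comes from is left open; it is an input here.
    """
    max_value = int(max_hex, 16)
    if not 0 <= value <= max_value:
        raise ValueError(
            f"priority {value} outside supported range 0-{max_value}"
        )
    return format(value, "x")


# With a maximum of "f", priorities 0 through 15 are accepted.
field = format_priority(10, "f")
```

A tool that starts long after resctrl was mounted can then validate a priority without having recorded the default schemata.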

>
>
>> Test set-up and results:
>> ------------------------
>>
>> The downstream priority value feeds into the DRAM controller, and one of
>> the important things that it does with this value is to service the
>> requests sooner (based on the traffic class), hence reducing latency without affecting performance.
>
> Could you please elaborate here? I expected reduced latency to have a big impact on performance.
> [>>] To be clear, by performance we meant memory bandwidth, and with this specific configuration/test
> we see priority partitioning as a utility to guarantee lower latency. We are yet to explore its effect
> on the memory bandwidth side.

Please be careful about claims because the above sounds to me as though this work claims to not affect memory bandwidth but it also states that the impact on memory bandwidth has not yet been explored.
[>>] Sure, will be more careful with my wording, but the previous statement "hence reducing latency without affecting performance" is based on
the test results we presented. For instance, if we look at the bandwidth numbers across the priority values, they are almost the same, ~345 GB/s.

Thanks
-Amit