RE: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

From: Amit Singh Tomar
Date: Tue Aug 22 2023 - 08:44:41 EST


Hi Reinette,

Thanks for having a look!

-----Original Message-----
From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
Sent: Friday, August 18, 2023 12:41 AM
To: Amit Singh Tomar <amitsinght@xxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: fenghua.yu@xxxxxxxxx; james.morse@xxxxxxx; George Cherian <gcherian@xxxxxxxxxxx>; robh@xxxxxxxxxx; peternewman@xxxxxxxxxx; Luck, Tony <tony.luck@xxxxxxxxx>
Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

(+Tony)

Hi Amit,

On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
> Arm Memory System Resource Partitioning and Monitoring (MPAM) supports
> different controls that can be applied to different resources in the
> system. For instance, there is an optional priority partitioning control
> where a priority value generated by one MSC propagates over the
> interconnect to another MSC (known as downstream priority), or is applied
> within an MSC for internal operations.
>
> Marvell's implementation of ARM MPAM supports a priority partitioning
> control that allows the LLC MSC to generate priority values that get
> propagated (along with read/write requests from upstream) to the DDR block.
> Within the DDR block, the priority values are mapped to different traffic
> classes under the DDR QoS strategy.
> The link[1] gives some idea about the DDR QoS strategy, and terms like
> LPR, VPR and HPR.
>
> Setup priority partitioning control under Resource control
> ----------------------------------------------------------
> At present, resource control (resctrl) provides a basic interface to
> configure CAT (Cache Allocation Technology) and MBA (Memory Bandwidth
> Allocation) capabilities. ARM MPAM uses it to support controls like
> Cache portion partitioning (CPOR) and MPAM bandwidth partitioning.
>
> As an example, the "schemata" file under a resource control group contains
> information about cache portion bitmaps and memory bandwidth allocation,
> and these are used to configure the Cache portion partitioning (CPOR) and
> MPAM bandwidth partitioning controls.
>
> MB:0=0100
> L3:0=ffff
>
> But resctrl doesn't provide a way to set up the other controls that ARM
> MPAM provides (for instance, the priority partitioning control
> mentioned above). To support this, James has suggested using the already
> existing schemata to stay compatible with portable software. The main
> idea behind this RFC is to start a discussion on how resctrl can be
> extended to support priority partitioning control.
>
> To support priority partitioning control, the "schemata" file is updated
> to accommodate a priority field (upon detection of the priority
> partitioning capability), separated from the CPBM by the delimiter ",".
>
> L3:0=ffff,f where f indicates downstream priority max value.
>
> The dspri value is programmed per partition and can be used to
> override the QoS value coming from upstream (CPU).
>
> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2,
> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>

There are some aspects of this that I think we should be cautious about. First, there may inevitably be more properties in the future that need to be associated with a resource allocation; these may indeed be different between architectures and individual platforms. Second, user space needs a way to know which properties are supported and what the valid parameters may be.

On a high level I thus understand the goal to be to add support for assigning a property to a resource allocation, with "Priority partitioning control" being the first property.

To that end, I have a few questions:
* How can this interface be expanded to support more properties with the
expectation that a system/architecture may not support all resctrl supported
properties?
[>>] All these new controls (priority partitioning is one of them) are detected as resource capabilities (via the Features Identification Register), and a control will not be probed if the system/architecture
doesn't support it. From the resource control side, this means that users will never see unsupported controls in the schemata file. For instance, on a platform that does not support priority partitioning control,
the schemata file looks like:

# cat schemata
L3:1=ffff

As opposed to when the system has priority partitioning control:
# cat schemata
L3:1=ffff,f
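
As a rough illustration of how a script could tell the two cases apart, here is a minimal sketch. It assumes the unlabelled "RES:ID=CPBM[,PRIO]" form proposed in this RFC with a single domain per line; has_priority is a hypothetical helper name, not part of resctrl.

```shell
# Hypothetical helper: report whether a schemata line in the RFC's
# unlabelled "RES:ID=CPBM[,PRIO]" form carries a priority field.
# Assumes one domain per line, as in the examples above.
has_priority() {
    line=$1
    value=${line#*=}          # strip "L3:1=" -> "ffff" or "ffff,f"
    case $value in
        *,*) echo yes ;;      # a comma means a second field follows the CPBM
        *)   echo no ;;
    esac
}

has_priority "L3:1=ffff"      # prints "no"
has_priority "L3:1=ffff,f"    # prints "yes"
```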


* Is it possible for support for properties to vary between, for example, different
MSCs in the system? From resctrl side it may mean that there would be a resource,
for example "L3", with multiple instances, for example, cache with id #0, cache
with id#1, etc. but the supported properties or valid values of properties
may vary between the instances?
[>>] This is really implementation dependent. We would expect that, if multiple L3 instances
across multiple dies implement this control, it is uniform across them. But let's take a case
where the L3 MSC instance on one socket has this control and the L3 MSC instance on another
socket doesn't. From the resctrl perspective, one would see this control only for the
L3 instance that has it, and it would be programmed only for that L3 instance.

L3:0=XXXX,X;L3:1=XXXX

And as per the proposed format:

L3:0=XXXX,PPART=X, L3:1=XXXX

* How can user space know that a system supports "Priority partitioning control"?
User space needs to know when/if it can attempt to write a priority to the
schemata.
[>>] At the moment, we label only the resource class. We would like to propose labelling
newly added controls (under a resource class) as well, so that the user can easily identify
which control to program. For instance, the schemata file with the proposed changes
would look like this:

L3:0=XXXX,PPART=X

where PPART = priority partitioning control. Similarly, if the L3 resource class has one more capability, say cache capacity partitioning (CCAP):

L3:0=XXXX,PPART=X,CCAP=X

The very first control is always CAT/CPOR (with no label).
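
To make the labelled format concrete, the following is a minimal sketch of how user space might split such a line into its controls. It assumes exactly the proposed layout (one domain, unlabelled CPOR first, then LABEL=VALUE pairs separated by ","); parse_controls is a hypothetical name, and the sample values ffff, f and 8 are made up for illustration.

```shell
# Hypothetical parser for the proposed "RES:ID=CPBM,LABEL=VAL,..." form.
# Prints one control per line; the unlabelled first field is reported as CPOR.
parse_controls() {
    value=${1#*=}             # "L3:0=ffff,PPART=f" -> "ffff,PPART=f"
    old_ifs=$IFS
    set -f                    # no pathname expansion while splitting
    IFS=,
    set -- $value             # split the value on commas
    IFS=$old_ifs
    set +f
    echo "CPOR=$1"            # first control carries no label
    shift
    for field in "$@"; do
        echo "$field"         # remaining fields are already LABEL=VALUE
    done
}

parse_controls "L3:0=ffff,PPART=f,CCAP=8"
```

With the sample line above this prints CPOR=ffff, PPART=f and CCAP=8 on separate lines.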



* How can user space know what priority values are valid for a particular system?
[>>] Supported priority values are read from one of the MPAM priority partitioning registers, and in the
schemata file the field is set to the maximum value, just like the cache portion bitmaps or memory bandwidth allocation.
For instance:

L3:0=ffff,f (the max priority value is f, and the user can program any value from 0 to 15)
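
A tool writing to the schemata could check a value against that maximum before attempting the write. The sketch below assumes both values are given as hex digits without a "0x" prefix, as they appear in the schemata examples; valid_priority is a hypothetical helper.

```shell
# Hypothetical range check: succeed if priority $1 (hex) is within
# 0..$2 (hex), where $2 is the maximum shown in the schemata file.
valid_priority() {
    # Reject empty or non-hex input before doing arithmetic on it.
    case $1 in ''|*[!0-9a-fA-F]*) return 1 ;; esac
    case $2 in ''|*[!0-9a-fA-F]*) return 1 ;; esac
    prio=$((0x$1))
    max=$((0x$2))
    [ "$prio" -le "$max" ]
}

valid_priority a f && echo "a is within range"
valid_priority 10 f || echo "0x10 is out of range"
```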


> Test set-up and results:
> ------------------------
>
> The downstream priority value feeds into the DRAM controller, and one of
> the important things it does with this value is to service the
> requests sooner (based on the traffic class), hence reducing latency without affecting performance.

Could you please elaborate here? I expected reduced latency to have a big impact on performance.
[>>] To be clear, by performance we meant memory bandwidth. With this specific configuration/test,
we see priority partitioning as a utility to guarantee lower latency. We are yet to explore its effect
on the memory bandwidth side.

>
> Within the DDR QoS traffic class.
>
> 0--5 ----> Low priority value
> 6-10 ----> Medium priority value
> 11-15 ----> High priority value
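
[>>] The mapping above can be written down as a small lookup. This is only a sketch of the ranges quoted here, using the lpr/vpr/hpr names from the test script comments; traffic_class is a hypothetical helper that takes the priority as a non-negative decimal number.

```shell
# Hypothetical lookup of the DDR QoS traffic class for a downstream
# priority value (decimal 0-15), following the ranges listed above.
traffic_class() {
    prio=$1
    if [ "$prio" -le 5 ]; then
        echo LPR              # 0-5: low priority
    elif [ "$prio" -le 10 ]; then
        echo VPR              # 6-10: medium priority
    elif [ "$prio" -le 15 ]; then
        echo HPR              # 11-15: high priority
    else
        echo invalid
    fi
}

traffic_class 5     # prints "LPR"
traffic_class 10    # prints "VPR"
traffic_class 15    # prints "HPR"
```
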
>
> Benchmark[4] used is multichase.
>
> Two partitions, P1 and P2:
>
> Partition P1:
> -------------
> Assigned core 0
> 100% BW assignment
>
> Partition P2:
> -------------
> Assigned cores 1-79
> 100% BW assignment
>
> Test Script:
> -----------
> mkdir p1
> cd p1
> echo 1 > cpus
> echo L3:1=8000,5 > schemata ##### DSPRI set as 5 (lpr)
> echo "MB:0=100" > schemata
>
> mkdir p2
> cd p2
> echo ffff,ffffffff,fffffffe > cpus
> echo L3:1=8000,0 > schemata
> echo "MB:0=100" > schemata
>
> ### Loaded latency run, core 0 does chaseload (pointer chase) with low
> priority value 5, and cores 1-79 does memory bandwidth run ###

Could you please elaborate what is meant with a "memory bandwidth run"?
[>>] By memory bandwidth run, we mean a memory bandwidth test that measures the data transfer rate between the CPU cores and main memory. (The 1G size we chose makes sure that the test hits DDR and is not confined to the caches.)

> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
>
> cd /sys/fs/resctrl/p1
>
> echo L3:1=8000,a > schemata ##### DSPRI set as 0xa (vpr)
>
> ### Loaded latency run, core 0 does chaseload (pointer chase) with
> medium priority value a, and cores 1-79 does memory bandwidth run ###
> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
>
> cd /sys/fs/resctrl/p1
>
> echo L3:1=8000,f > schemata ##### DSPRI set as 0xf (hpr)
>
> ### Loaded latency run where core 0 does chaseload (pointer chase)
> with high priority value f, and cores 1-79 does memory bandwidth run ###
> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
>
> Results[5]:
>
> LPR average latency is 204.862(ns) vs VPR average latency is 161.018(ns) vs HPR average latency is 134.210(ns).
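
[>>] To put the three averages in relative terms, the reduction against the LPR run can be computed as below. This is plain arithmetic on the numbers quoted above, not an additional measurement; reduction is a hypothetical helper and it assumes awk is available.

```shell
# Hypothetical helper: percentage latency reduction of $2 relative to $1.
reduction() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

reduction 204.862 161.018   # VPR vs LPR, prints 21.4
reduction 204.862 134.210   # HPR vs LPR, prints 34.5
```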

Reinette