Re: [RFC PATCH mpam mpam/snapshot/v6.12-rc1 v2 0/6] arm_mpam: Introduce the Narrow-PARTID feature for MPAM driver

From: Zeng Heng
Date: Sun Dec 01 2024 - 21:13:07 EST




On 2024/11/25 23:39, Dave Martin wrote:

The advantages of doing this are:

  1. There is no need to modify or disrupt the existing resctrl layer
     interface, ensuring that each control group has the same resource
     control functionality;

I don't think this is guaranteed.

If there is some MSC that does not have PARTID Narrowing support, and
this MSC has a memory bandwidth control that the MPAM driver exposes
through resctrl, then there is no way to configure that MSC that
exhibits the behaviour that the resctrl user expects.

For a concrete example:

Suppose that n=8, and the user asks for P1 to be given 30% of system
memory bandwidth.

On the affected MSC, P1 maps to eight PARTIDs, each with its own memory
bandwidth regulation.

If the work that happens to be in M1_1 dominates P1's bandwidth
requirement, then PARTID_1_1 needs to be given 30% of total memory bandwidth.

If the work in P1 is evenly spread across M1_1, M1_2 ... M1_m, then
they would each need to be given 30% / 8 = 3.75% of total memory
bandwidth so that the total allocated bandwidth is 30%.

But we don't know how memory bandwidth consumption is distributed among
M1_1, M1_2 etc., so there is no way to program the memory bandwidth
regulation on that MSC that guarantees the expected result of P1
receiving 30% of the total available bandwidth.
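
The arithmetic in this example can be checked with a small model. This
is only a sketch under simplifying assumptions: each reqPARTID is capped
independently at the same percentage, contention from other partitions
is ignored, and the names p1_bandwidth() and N_REQPARTID are hypothetical
and are not part of the MPAM driver.

```c
#define N_REQPARTID 8	/* n = 8, as in the example above */

/*
 * Bandwidth that P1 receives in total when each of its reqPARTIDs is
 * regulated independently with the same cap. Both the demands and the
 * cap are percentages of total memory bandwidth. Each reqPARTID gets
 * min(demand, cap); P1 gets the sum.
 */
static double p1_bandwidth(const double demand_pct[N_REQPARTID], double cap_pct)
{
	double total = 0.0;
	int i;

	for (i = 0; i < N_REQPARTID; i++)
		total += demand_pct[i] < cap_pct ? demand_pct[i] : cap_pct;

	return total;
}
```

With a cap of 3.75% per reqPARTID, a workload concentrated in M1_1 is
limited to 3.75% instead of 30%. With a cap of 30% per reqPARTID, eight
busy monitoring groups can together consume up to 240%. Neither setting
yields 30% for every distribution of work.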


This means that on some hardware, a choice needs to be made: should the
MPAM driver hide from resctrl any controls that have this problem, or
should it disable the use of PARTID Narrowing for providing additional
monitoring groups.

My concern is that the correct choice is likely to be use-case-
dependent.

Do you have a view on this?

I understand your meaning and concerns, and this is indeed a problem.

From a software perspective, I think the use cases should be limited. For
scenarios where MATA does not support narrow-partid, I tend to favor
disabling the narrow-partid feature in the driver.

From a hardware perspective, MSCs, such as L2/L3, are designed with area
considerations in mind and choose to implement the narrow-partid feature.

MATA, on the other hand, is located on a different die and does not have
similar concerns, so it often does not implement the narrow-partid feature,
which makes this a rather thorny issue.

  2. MSCs that support narrow-partid (including intPARTID and reqPARTID)
     and MSCs that do not support it (PARTID only) can share the
     same PARTID space;

This seems like it may be problematic on some hardware, as I tried to
explain above for point 1.

Note though, if the non-Narrowing MSCs only have bitmap-type controls,
then sharing the PARTID space is harmless. This comes about because
these controls explicitly allow contention: cache way 0 for
example is contended between all the work that is allowed by MPAM to
use this cache way. Breaking up the work arbitrarily under different
PARTIDs makes no difference in this case: the amount of work allocated
to that cache way, and the amount of contention is still the same.
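
A minimal sketch of this argument, assuming the same cache portion bitmap
is programmed for every PARTID that carries a control group's work. The
function name effective_portions() is hypothetical and does not come from
the MPAM driver.

```c
#include <stdint.h>

/*
 * Cache portions that a control group's work may use when that work is
 * split across n PARTIDs, each with its own portion bitmap. The work as
 * a whole can allocate into the union of the per-PARTID bitmaps.
 */
static uint32_t effective_portions(const uint32_t *cpbm, int n)
{
	uint32_t portions = 0;
	int i;

	for (i = 0; i < n; i++)
		portions |= cpbm[i];

	return portions;
}
```

When all n bitmaps are identical, the union equals the single bitmap
that one PARTID would have used, so splitting the work across PARTIDs
does not change which cache ways are contended.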


Completely agree. MSCs without the narrow-partid feature, if they only have
bitmap-type controls, can be compatible with the shared PARTID space scheme.


  3. Provided that point (1) is ensured, the number of control
     groups can be maximized, because users can always choose to make a
     control group act as a sub-monitoring group under another control
     group;

What do you mean by "control group" here?

resctrl's group hierarchy is strict: work is distributed across one or
more control groups at the top level, and the work in each control
group is further distributed across one or more monitoring groups
within that control group.

There is no way to repurpose a resctrl control group as a monitoring
group under some other control group.

Or were you referring to something else here?


Apologies for my miscommunication.

What I meant is to use the extra PARTIDs on MSCs that do not support the
narrow-partid feature to expand the number of sub-monitoring groups.

2) The resctrl core code uses CLOSIDs and RMIDs to identify control
groups and monitoring groups. If a particular driver wants to
translate these into other values (reqPARTID, intPARTID, PMG) then it
can do so, but this mapping logic should be encapsulated in the driver.
This should be better for maintainability, since the details of the
remapping will be arch-specific -- and in general not all arches are
going to require it. With this in mind, I think that changes in the
resctrl core code would be minimal (perhaps no changes at all).
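
To illustrate what a mapping encapsulated in the driver could look like,
here is a sketch of one possible static layout. Everything in it is an
assumption made for illustration: the struct, the function name, and the
layout itself (n consecutive reqPARTIDs per CLOSID, with the RMID split
into a reqPARTID offset and a PMG) are hypothetical and are not taken
from the patch series or from the MPAM driver.

```c
#include <stdint.h>

/* Hypothetical container for the MPAM identifiers derived from resctrl IDs. */
struct mpam_ids {
	uint16_t intpartid;	/* selects the control settings */
	uint16_t reqpartid;	/* carried by requests */
	uint8_t pmg;		/* monitoring group within the reqPARTID */
};

/*
 * Hypothetical static mapping. Each CLOSID owns n consecutive
 * reqPARTIDs, and rmid is expected to lie in [0, n * num_pmg).
 */
static struct mpam_ids closid_rmid_to_mpam(uint32_t closid, uint32_t rmid,
					   uint32_t n, uint32_t num_pmg)
{
	struct mpam_ids ids;

	ids.intpartid = (uint16_t)closid;
	ids.reqpartid = (uint16_t)(closid * n + rmid / num_pmg);
	ids.pmg = (uint8_t)(rmid % num_pmg);

	return ids;
}
```

With a layout of this kind, the resctrl core only ever sees one CLOSID
per control group, and the translation stays inside the arch code.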

  Yes, maintaining the interface of the resctrl core code unchanged is,
in essence, the (first) important constraint of the current MPAM code.
We try our best to keep all resctrl interfaces unchanged and preserve the
existing functionality of x86 RDT.

  The only thing that falls short of being ideal (forgive me) is that
it introduces one new function, resctrl_arch_alloc_rmid(), into the
resctrl code (resctrl_arch_free_rmid() will be optimized away in the next
version, and there are no other new functions).

  The resctrl_arch_alloc_rmid() is the result of several restructuring
iterations and it is one of the most critical points in the patch series.

I was concerned about the changes in patch 6 for example, where the new
function task_belongs_to_ctrl_group() now has to look at more
information than just rdtgroup->closid, in order to determine which
control group a task belongs to. This is precisely what
resctrl_arch_match_closid() is supposed to do, using just the closid.

This suggests that the meaning of "closid" in the core code has been
changed: if closid is the control group identifier, then each control
group should have exactly one closid value.


For comparison, you may want to take a look at the top 3 commits of
this experimental branch:

https://git.gitlab.arm.com/linux-arm/linux-dm/-/commits/mpam/partid-pmg-remap/v0.2/head/?ref_type=heads

which attempts to do all the mapping within the MPAM driver instead.
Note, the approach is a bit over-complicated and I decided that a
simpler approach is needed. But it may help to illustrate what I mean
about keeping all the remapping out of the resctrl core code.



I understand your suggestion. I will consider refactoring the mapping
relationships between closid/rmid and partid/reqpartid/intpartid/pmg.

In fact, I had prepared a simplified version of v2 as v3. But in light of
your suggestions, I have decided to rework the solution. At present, I'm not
sure whether I can completely isolate the mapping within the MPAM driver
layer. If the reworked version goes smoothly, I will reply ASAP.

4) If the mapping between reqPARTIDs and (CLOSID,RMID) pairs is static,
is it necessary to track which reqPARTIDs are in use? Would it be
simpler to treat all n reqPARTIDs as permanently assigned to the
corresponding CLOSID?

If reqPARTID usage is not tracked, then every control change on MSCs
that do not support PARTID Narrowing would need to be replicated across
all reqPARTIDs corresponding to the affected resctrl control partition.
But control changes are a relatively rare event, so this approach feels
acceptable as a way of keeping the driver complexity down. It partly
depends on how large the "n" parameter can become.
  Yes, totally agree. I will try to remove the reqPARTID bitmap and
the resctrl_arch_free_rmid(). As mentioned, this will simplify the code
logic and reduce changes to the resctrl layer code.
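
The untracked approach discussed above can be sketched as follows. This
is a simplified model, not driver code: the array msc_cfg[] stands in
for the per-PARTID configuration of an MSC without PARTID Narrowing, and
apply_control(), N_REQPARTID and NUM_CLOSID are hypothetical names with
made-up values.

```c
#include <stdint.h>

#define N_REQPARTID	8	/* reqPARTIDs statically owned by each CLOSID */
#define NUM_CLOSID	4	/* illustrative number of control groups */

/* Stand-in for the per-PARTID control registers of a non-Narrowing MSC. */
static uint32_t msc_cfg[NUM_CLOSID * N_REQPARTID];

/*
 * Replicate one control value across every reqPARTID that belongs to
 * closid, without tracking which reqPARTIDs are currently in use.
 * Returns the number of configuration writes, or -1 for a bad closid.
 */
static int apply_control(uint32_t closid, uint32_t value)
{
	uint32_t i;

	if (closid >= NUM_CLOSID)
		return -1;

	for (i = 0; i < N_REQPARTID; i++)
		msc_cfg[closid * N_REQPARTID + i] = value;

	return N_REQPARTID;
}
```

The cost of each control change grows linearly with n, which is why the
acceptable size of the "n" parameter matters for this simplification.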

  This resource tracking was originally kept to reduce the number of
IPIs. I will prioritize removing it in the next version.
 (In fact, the initial version of the patch set was dynamically allocated,
and during the code restructuring process, it was inevitable to retain
some of the original ideas.)

Best regards,
Zeng Heng


OK; fair enough.

This kind of feature could always be re-added later on if it proves to
be important for performance in real use-cases, but it is probably best
to keep things as simple as possible initially.


Many thanks as always for your prompt reply and insightful suggestions.

Best Regards,
Zeng Heng