Re: [RFD] resctrl: reassigning a running container's CTRL_MON group
From: James Morse
Date: Tue Oct 25 2022 - 11:56:32 EST
Hi Peter,
On 20/10/2022 11:39, Peter Newman wrote:
> On Wed, Oct 19, 2022 at 3:58 PM James Morse <james.morse@xxxxxxx> wrote:
>> This isn't how MPAM is designed to be used. You'll hit nasty corners.
>> The big one is the Cache Storage Utilisation counters.
>>
>> See 11.5.2 of the MPAM spec, "MSMON_CFG_CSU_CTL, MPAM Memory System Monitor Configure
>> Cache Storage Usage Monitor Control Register". Not setting the MATCH_PARTID bit has this
>> warning:
>> | If MATCH_PMG is 1 and MATCH_PARTID is 0, it is CONSTRAINED UNPREDICTABLE whether the
>> | monitor instance:
>> | • Measures the storage used with matching PMG and with any PARTID.
>> | • Measures no storage usage, that is, MSMON_CSU.VALUE is zero.
>> | • Measures the storage used with matching PMG and PARTID, that is, treats
>> | MATCH_PARTID as = 1
>>
>> 'constrained unpredictable' is arm's term for "portable software can't rely on this".
>> The folk that designed MPAM don't believe "monitors would only match on PMGs" makes any
>> sense. A PMG is not an RMID. A case in point is the system with only 1 PMG bit.
>>
>> I'm afraid this approach would preclude support for the llc_occupancy counter, and would
>> artificially reduce the number of control groups that can be created as each control group
>> needs an 'RMID'. On the machine with 1 PMG bit - you get 2 control groups, even though it
>> has many more PARTID.
>
> The first sentence of the Resource Monitoring chapter is also quite an
> obstacle to my challenging the PARTID-PMG hierarchy:
>
> | Software environments may be labeled as belonging to a Performance
> | Monitoring Group (PMG) within a partition.
>
> It seems like the only real issue is that the user is responsible for
> figuring out how best to make use of the available resources. But I seem
> to recall that was the expectation with resctrl, so I should probably
> stop trying to argue for expecting MPAM configurations which resemble
> RDT.
>
>
>> On 17/10/2022 11:15, Peter Newman wrote:
>>> Provided that there are sufficient monitor
>>> instances, there would never be any need to reprogram a monitor's
>>> PMG.
>>
>> It sounds like this moves the problem to "make everything a monitor group because only
>> monitor groups can be batch moved".
>>
>> If the tasks file could be moved between control and monitor groups, causing resctrl to
>> relabel the tasks - would that solve more of the problem? (it eliminates the need to make
>> everything a monitor group)
>
> This was about preserving the RMID and memory bandwidth counts across a
> CLOSID change. If the user is forced to conserve CTRL_MON groups due to
> a limited number of CLOSIDs, keeping the various containers' tasks
> separate is also a concern.
Ah, of course.
> But if there's no need to conserve CTRL_MON groups, then there's no real
> issue.
Yup. I think part of this is exposing the information user-space needs to make the right
decision.
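A minimal sketch of what user-space can already read today, assuming resctrl is mounted at /sys/fs/resctrl and the documented info/<resource>/num_closids and info/L3_MON/num_rmids files are present. The helper name read_resctrl_limits() is made up for illustration, and nothing here is MPAM-specific:

```python
from pathlib import Path


def read_resctrl_limits(root="/sys/fs/resctrl"):
    """Collect every num_closids / num_rmids value found under <root>/info.

    Returns a dict keyed by the path relative to info/, for example
    "L3/num_closids" or "L3_MON/num_rmids". The default mount point is an
    assumption; pass a different root if resctrl is mounted elsewhere.
    """
    info = Path(root) / "info"
    limits = {}
    for name in ("num_closids", "num_rmids"):
        # One subdirectory per resource (L3, MB, L3_MON, ...).
        for path in sorted(info.glob(f"*/{name}")):
            limits[str(path.relative_to(info))] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    for key, value in read_resctrl_limits().items():
        print(f"{key}: {value}")
```

Comparing these numbers tells a container manager whether it has to conserve CTRL_MON groups at all, which is the decision discussed above.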
I don't think we should merge 'task group moving' and 'old monitors keep counting'; they
each make sense independently.
>> The devil is in the detail, I'm not sure how it serialises with a fork()ing process, I'd
>> hope to do better than relying on the kernel walking the list of processes a lot quicker
>> than user-space can.
>
> I wasn't planning to do it any more optimally than the rmdir
> implementation today when looking for all tasks impacted by a
> CLOSID/RMID deletion.
Aha - that is the use of for_each_process_thread() under the tasklist_lock read-lock, instead
of relying on RCU, so it should be safe for processes fork()ing and exit()ing.
Thanks,
James