Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

From: Reinette Chatre
Date: Wed Oct 12 2022 - 13:23:30 EST


Hi Peter,

On 10/12/2022 4:21 AM, Peter Newman wrote:
> [Adding Gaurang to CC]
>
> On Tue, Oct 11, 2022 at 1:35 AM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>>
>> On 10/7/2022 10:28 AM, Tony Luck wrote:
>>> I don't know how complex it would be for the kernel to implement this,
>>> or whether it would meet Google's needs.
>>>
>>
>> How about moving monitor groups from one control group to another?
>>
>> Based on the initial description I got the impression that there is
>> already a monitor group for every container. (Please correct me if I am
>> wrong.) If this is the case then it may be possible to create an interface
>> that could move an entire monitor group to another control group. This would
>> keep the usage counts intact: tasks would get a new closid but keep their
>> rmid. There would be no need for the user to specify process-ids.
>
> Yes, Stephane also pointed out the importance of maintaining RMID
> assignments, and I don't believe I put enough emphasis on it in my
> original email.
>
> We need to maintain accurate memory bandwidth usage counts on all
> containers, so it's important to be able to maintain an RMID assignment
> and its event counts across a CoS downgrade. The solutions Tony
> suggested do solve the races in moving the tasks, but the container
> would need to temporarily join the default MON group in the new CTRL_MON
> group before it can be moved to its replacement MON group.
>
> Being able to re-parent a MON group would allow us to change the CLOSID
> independently of the RMID in a container and would address the issue.
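
To make the CLOSID/RMID split concrete, here is a minimal userspace model, assuming each task carries one CLOSID and one RMID and that usage is counted per RMID. All names (struct ctrl_group, struct mon_group, move_mon_group(), account(), mbm_bytes[]) are hypothetical and do not correspond to the kernel's actual resctrl structures. Re-parenting the MON group rewrites only the CLOSID, so the counter indexed by the unchanged RMID keeps accumulating across the move.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of re-parenting a MON group;
 * these are not the kernel's struct rdtgroup / task_struct definitions.
 */
#define MAX_TASKS 8
#define NUM_RMIDS 64

struct task {
	int pid;
	unsigned int closid;	/* selects the allocation (CTRL_MON) settings */
	unsigned int rmid;	/* selects the monitoring counters */
};

struct ctrl_group {
	const char *name;
	unsigned int closid;
};

struct mon_group {
	const char *name;
	struct ctrl_group *parent;
	unsigned int rmid;
	struct task *tasks[MAX_TASKS];
	size_t ntasks;
};

/* Hypothetical per-RMID usage counters, indexed by RMID alone. */
static unsigned long long mbm_bytes[NUM_RMIDS];

/* Charge usage to whatever RMID the task currently carries. */
static void account(const struct task *t, unsigned long long bytes)
{
	assert(t->rmid < NUM_RMIDS);
	mbm_bytes[t->rmid] += bytes;
}

/*
 * Re-parent a MON group: every task takes the new parent's CLOSID, while
 * the group's RMID (and so its accumulated count) is left unchanged.
 */
static void move_mon_group(struct mon_group *mon, struct ctrl_group *to)
{
	for (size_t i = 0; i < mon->ntasks; i++)
		mon->tasks[i]->closid = to->closid;
	mon->parent = to;
}
```

By contrast, routing the tasks through the new parent's default MON group would temporarily give them that group's RMID, so part of the container's usage would land in a different counter.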

What if resctrl added support for the .rename callback to
rdtgroup_kf_syscall_ops?

It seems like doing so could enable users to do something like:
mv /sys/fs/resctrl/groupA/mon_groups/containerA /sys/fs/resctrl/groupB/mon_groups/

Such a request would move the "containerA" monitor group from the groupA
control group to groupB. All tasks within it would move to the new control
group (their CLOSIDs change) while their RMIDs remain intact.
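
From userspace, the proposed mv is a plain rename(2) on the resctrl directories. A minimal sketch, assuming the kernel gains the .rename support proposed here; move_mon_group(), its parameters and RESCTRL_PATH_LEN are hypothetical names. On a kernel without that support the rename simply fails and the helper returns the negated errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define RESCTRL_PATH_LEN 4096	/* hypothetical fixed buffer size */

/*
 * Hypothetical helper: re-parent MON group `mon` from CTRL_MON group
 * `from` to CTRL_MON group `to` under the resctrl mount point `root`.
 * Assumes the proposed kernfs .rename callback exists for resctrl.
 * Returns 0 on success or a negative errno value.
 */
static int move_mon_group(const char *root, const char *from,
			  const char *to, const char *mon)
{
	char src[RESCTRL_PATH_LEN], dst[RESCTRL_PATH_LEN];
	int n;

	assert(root && from && to && mon);

	n = snprintf(src, sizeof(src), "%s/%s/mon_groups/%s", root, from, mon);
	if (n < 0 || (size_t)n >= sizeof(src))
		return -ENAMETOOLONG;
	n = snprintf(dst, sizeof(dst), "%s/%s/mon_groups/%s", root, to, mon);
	if (n < 0 || (size_t)n >= sizeof(dst))
		return -ENAMETOOLONG;

	/* "mv A/mon_groups/x B/mon_groups/" is rename("A/.../x", "B/.../x"). */
	if (rename(src, dst) != 0)
		return -errno;
	return 0;
}
```

For example, move_mon_group("/sys/fs/resctrl", "groupA", "groupB", "containerA") corresponds to the mv command above. The whole group moves in one request, so userspace never has to walk the task list PID by PID.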

I just read James's response, and I do not know how this could be made to
work with Arm monitoring when it arrives. Potentially there could be an
architecture-specific "move monitor group" call.
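
One possible shape for such an architecture-specific call is an optional hook that the filesystem code invokes. A minimal sketch, assuming a hypothetical struct arch_mon_ops with a move_mon_group member; none of these names exist in resctrl, and how Arm would implement the hook is left open, as in the message above. An architecture that cannot preserve the monitor across a CLOSID change leaves the hook NULL and the move is refused.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical identifiers for one monitor group. */
struct mon_ids {
	unsigned int closid;
	unsigned int rmid;
};

/* Hypothetical per-architecture hook table. */
struct arch_mon_ops {
	int (*move_mon_group)(struct mon_ids *ids, unsigned int new_closid);
};

/* Model of an architecture whose RMIDs are independent of the CLOSID. */
static int independent_rmid_move(struct mon_ids *ids, unsigned int new_closid)
{
	ids->closid = new_closid;	/* RMID deliberately left as-is */
	return 0;
}

/* Filesystem-side entry point: defer to the architecture if it can. */
static int resctrl_move_mon_group(const struct arch_mon_ops *ops,
				  struct mon_ids *ids, unsigned int new_closid)
{
	assert(ids != NULL);
	if (!ops || !ops->move_mon_group)
		return -EOPNOTSUPP;
	return ops->move_mon_group(ids, new_closid);
}
```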

> The only other point I can think of to differentiate it from the
> automatic CLOSID management solution is whether the 1:1 CTRL_MON:CLOSID
> approach will become too limiting going forward. For example, there may
> be configurations where one resource has far fewer CLOSIDs than others,
> and we may want to start assigning CLOSIDs on demand, per resource, to
> avoid wasting the other resources' available CLOSID space. If we can
> foresee this becoming a concern, then automatic CLOSID management would
> be inevitable.
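
The limitation can be made concrete with a little arithmetic. Under a 1:1 CTRL_MON:CLOSID mapping every control group consumes a CLOSID in every resource, so the number of groups is bounded by the resource with the fewest CLOSIDs. A sketch, assuming hypothetical counts (16 L3 CLOSIDs, 8 MB CLOSIDs) and an idealized on-demand scheme in which a group consumes a CLOSID only in the resources it configures; it ignores how hardware would apply a single task CLOSID across resources with different ranges, and struct resource and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource description; counts in the example are made up. */
struct resource {
	const char *name;
	unsigned int num_closid;
};

/* 1:1 mapping: each group needs a CLOSID valid in every resource. */
static unsigned int max_groups_one_to_one(const struct resource *res, size_t n)
{
	unsigned int min;

	assert(n > 0);
	min = res[0].num_closid;
	for (size_t i = 1; i < n; i++)
		if (res[i].num_closid < min)
			min = res[i].num_closid;
	return min;
}

static bool fits_one_to_one(const struct resource *res, size_t n,
			    unsigned int groups)
{
	return groups <= max_groups_one_to_one(res, n);
}

/*
 * Idealized on-demand scheme: users[i] is the number of groups that
 * configure resource i; each resource's CLOSID space is checked alone.
 */
static bool fits_on_demand(const struct resource *res,
			   const unsigned int *users, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (users[i] > res[i].num_closid)
			return false;
	return true;
}
```

With 12 groups that all configure L3 but only 6 of which configure MB, the 1:1 mapping runs out at 8 groups, while the idealized per-resource scheme fits all 12.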

I think Fenghua answered this well.

Reinette