Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

From: Reinette Chatre
Date: Fri Oct 07 2022 - 11:36:51 EST


+Tony

On 10/7/2022 3:39 AM, Peter Newman wrote:
> Hi Reinette, Fenghua,
>
> I'd like to talk about the tasks file interface in CTRL_MON and MON
> groups.
>
> For some background, we are using the memory-bandwidth monitoring and
> allocation features of resctrl to maintain QoS on external memory
> bandwidth for latency-sensitive containers to help enable batch
> containers to use up leftover CPU/memory resources on a machine. We
> also monitor the external memory bandwidth usage of all hosted
> containers to identify ones which are misusing their latency-sensitive
> CoS assignment and downgrade them to the batch CoS.
>
> The trouble is, container manager developers working with the tasks
> interface have complained that it's not usable for them because it takes
> many (or an unbounded number of) passes to move all tasks from a
> container over, as the list is always changing.
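
To make the multi-pass problem concrete, here is a minimal sketch of the loop a container manager ends up writing. The three callbacks are hypothetical placeholders, not an existing API: on a real system one might assume list_container_tids() reads the container's cgroup.threads file, list_group_tids() reads the target group's resctrl tasks file, and move_tid() writes a single TID into that tasks file.

```python
def move_all(list_container_tids, list_group_tids, move_tid, max_passes=10):
    """Move every task of a container into a resctrl group.

    All three callbacks are assumptions made for this sketch.
    Returns the number of scans needed, or raises TimeoutError if the
    task list kept changing for max_passes scans.
    """
    for scan in range(1, max_passes + 1):
        # Tasks the container owns that are not yet in the target group.
        pending = set(list_container_tids()) - set(list_group_tids())
        if not pending:
            return scan
        for tid in pending:
            try:
                move_tid(tid)
            except ProcessLookupError:
                # The task exited between the scan and the write (ESRCH).
                pass
    raise TimeoutError("task list did not settle within max_passes scans")
```

Nothing bounds the number of scans if the container keeps creating tasks faster than they are moved, which is why the sketch needs a max_passes cutoff at all.
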
>
> Our solution for them is to remove the need for moving tasks between
> CTRL_MON groups. Because we are mainly using MB throttling to implement
> QoS, we only need two classes of service. Therefore we've modified
> resctrl to reuse existing CLOSIDs for CTRL_MON groups with identical
> configurations, allowing us to create a CTRL_MON group for every
> container. Instead of moving the tasks over, we only need to update
> their CTRL_MON group's schemata. Another benefit for us is that we do
> not need to also move all of the tasks over to a new monitoring group in
> the batch CTRL_MON group, and the usage counts remain intact.
>
> The CLOSID management rules would roughly be:
>
> 1. If an update would cause a CTRL_MON group's config to match that of
>    an existing group, the CTRL_MON group's CLOSID should change to that
>    of the existing group, where the definition of "match" is: all
>    control values match in all domains for all resources, as well as
>    the cpu masks matching.
>
> 2. If an update to a CTRL_MON group sharing a CLOSID with another group
>    causes that group to no longer match any others, a new CLOSID must
>    be allocated.
>
> 3. An update to a CTRL_MON group using a non-shared CLOSID which
>    continues to not match any others follows the current resctrl
>    behavior.
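
As a reading aid, the three rules above could be modelled roughly as follows. This is a user-space sketch of the bookkeeping only, not the proposed kernel code: Config, ClosidTable, and update() are names invented for illustration, and "current resctrl behavior" in rule 3 is assumed to mean reprogramming the group's existing CLOSID in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for everything that must "match"."""
    schemata: tuple  # control values, e.g. (("MB", domain_id, value), ...)
    cpu_mask: int


class ClosidTable:
    def __init__(self, num_closids):
        self.free = list(range(num_closids))
        self.by_config = {}  # Config -> CLOSID currently programmed with it
        self.refs = {}       # CLOSID -> number of groups sharing it
        self.groups = {}     # group name -> Config

    def _release(self, cfg):
        closid = self.by_config[cfg]
        self.refs[closid] -= 1
        if self.refs[closid] == 0:
            del self.refs[closid]
            del self.by_config[cfg]
            self.free.append(closid)

    def update(self, group, cfg):
        """Create or reconfigure a group; return the CLOSID it ends up with."""
        old = self.groups.get(group)
        if old == cfg:
            return self.by_config[cfg]
        if cfg in self.by_config:
            # Rule 1: the new config matches an existing group; share its CLOSID.
            closid = self.by_config[cfg]
            self.refs[closid] += 1
            if old is not None:
                self._release(old)
        elif old is not None and self.refs[self.by_config[old]] == 1:
            # Rule 3: sole user, still unique; keep the CLOSID and reprogram it.
            closid = self.by_config.pop(old)
            self.by_config[cfg] = closid
        else:
            # Rule 2 (or a brand-new group): no match, so allocate a CLOSID.
            if not self.free:
                raise RuntimeError("out of CLOSIDs")
            closid = self.free.pop(0)
            self.by_config[cfg] = closid
            self.refs[closid] = 1
            if old is not None:
                self._release(old)
        self.groups[group] = cfg
        return closid
```

One consequence the model makes visible: under rule 2 an update can fail for lack of a free CLOSID even though the group already exists, a failure mode that a sole-user update (rule 3) does not have.
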
>
> Before I prepare any patches for review, I'm interested in any comments
> or suggestions on the use case and solution.
>
> Are there simpler strategies for reassigning a running container's tasks
> to a different CTRL_MON group that we should be considering first?
>
> Any concerns about the CLOSID-reusing behavior? The hope is existing
> users who aren't creating identically-configured CTRL_MON groups would
> be minimally impacted. Would it help if the proposed behavior were
> opt-in at mount-time?
>
> Thanks!
> -Peter