Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display number of free counters

From: Reinette Chatre
Date: Mon Dec 02 2024 - 13:34:06 EST


Hi Babu and Peter,

On 11/29/24 9:06 AM, Moger, Babu wrote:
> Hi Peter, Reinette,
>
> On 11/29/2024 3:59 AM, Peter Newman wrote:
>> Hi Babu,
>>
>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@xxxxxxx> wrote:
>>>
>>> Hi Peter,
>>>
>>> On 11/28/2024 5:10 AM, Peter Newman wrote:
>>>> Hi Babu, Reinette,
>>>>
>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Babu,
>>>>>
>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote:
>>
>>>>>> 1. Each group needs to remember counter ids in each domain for each event.
>>>>>>      For example:
>>>>>>      Resctrl group mon1
>>>>>>       Total event
>>>>>>       dom 0 cntr_id 1,
>>>>>>       dom 1 cntr_id 10
>>>>>>       dom 2 cntr_id 11
>>>>>>
>>>>>>      Local event
>>>>>>       dom 0 cntr_id 2,
>>>>>>       dom 1 cntr_id 15
>>>>>>       dom 2 cntr_id 10
>>>>>
>>>>> Indeed. The challenge here is that domains may come and go so it cannot be a simple
>>>>> static array. As an alternative it can be an xarray indexed by the domain ID with
>>>>> pointers to a struct like below to contain the counters associated with the monitor
>>>>> group:
>>>>>           struct cntr_id {
>>>>>                   u32     mbm_total;
>>>>>                   u32     mbm_local;
>>>>>           }
>>>>>
>>>>>
>>>>> Thinking more about how this array needs to be managed made me wonder how the
>>>>> current implementation deals with domains that come and go. I do not think
>>>>> this is currently handled. For example, if a new domain comes online and
>>>>> monitoring groups had counters dynamically assigned, then these counters are
>>>>> not configured to the newly online domain.
>>>
>>> I am trying to understand the details of your approach here.
>>>>
>>>> In my prototype, I allocated a counter id-indexed array to each
>>>> monitoring domain structure for tracking the counter allocations,
>>>> because the hardware counters are all domain-scoped. That way the
>>>> tracking data goes away when the hardware does.
>>>>
>>>> I was focused on allowing all pending counter updates to a domain
>>>> resulting from a single mbm_assign_control write to be batched and
>>>> processed in a single IPI, so I structured the counter tracker
>>>> something like this:
>>>
>>> Not sure what you meant here. How are you batching two IPIs for two domains?
>>>
>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>
>>> This is still a single write. Two IPIs are sent separately, one for each
>>> domain.
>>>
>>> Are you doing something different?
>>
>> I said "all pending counter updates to a domain", whereby I meant
>> targeting a single domain.
>>
>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.
>>
>> What is important is that the following write also requires 1 or 2 IPIs:
>>
>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
>> for readability)
>>
>> echo $'//0=t;1=t\n
>> /g1/0=t;1=t\n
>> /g2/0=t;1=t\n
>> /g3/0=t;1=t\n
>> /g4/0=t;1=t\n
>> /g5/0=t;1=t\n
>> /g6/0=t;1=t\n
>> /g7/0=t;1=t\n
>> /g8/0=t;1=t\n
>> /g9/0=t;1=t\n
>> /g10/0=t;1=t\n
>> /g11/0=t;1=t\n
>> /g12/0=t;1=t\n
>> /g13/0=t;1=t\n
>> /g14/0=t;1=t\n
>> /g15/0=t;1=t\n
>> /g16/0=t;1=t\n
>> /g17/0=t;1=t\n
>> /g18/0=t;1=t\n
>> /g19/0=t;1=t\n
>> /g20/0=t;1=t\n
>> /g21/0=t;1=t\n
>> /g22/0=t;1=t\n
>> /g23/0=t;1=t\n
>> /g24/0=t;1=t\n
>> /g25/0=t;1=t\n
>> /g26/0=t;1=t\n
>> /g27/0=t;1=t\n
>> /g28/0=t;1=t\n
>> /g29/0=t;1=t\n
>> /g30/0=t;1=t\n
>> /g31/0=t;1=t\n'
>>
>> My ultimate goal is for a thread bound to a particular domain to be
>> able to unassign and reassign the local domain's 32 counters in a
>> single write() with no IPIs at all. And when IPIs are required, then
>> no more than one per domain, regardless of the number of groups
>> updated.
>>
>
> Yes. I think I got the idea. Thanks.
>
>>
>>>
>>>>
>>>> struct resctrl_monitor_cfg {
>>>>       int closid;
>>>>       int rmid;
>>>>       int evtid;
>>>>       bool dirty;
>>>> };
>>>>
>>>> This mirrors the info needed in whatever register configures the
>>>> counter, plus a dirty flag to skip over the ones that don't need to be
>>>> updated.
>>>
>>> This is what my understanding of your implementation.
>>>
>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index d94abba1c716..9cebf065cc97 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>>>           u32                             *mbps_val;
>>>    };
>>>
>>> +struct resctrl_monitor_cfg {
>>> +    int closid;
>>> +    int rmid;
>>> +    int evtid;
>>> +    bool dirty;
>>> +};
>>> +
>>>    /**
>>>     * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor
>>> resource
>>>     * @hdr:               common header for different domain types
>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain {
>>>           struct delayed_work             cqm_limbo;
>>>           int                             mbm_work_cpu;
>>>           int                             cqm_work_cpu;
>>> +     /* Allocate num_mbm_cntrs entries in each domain */
>>> +       struct resctrl_monitor_cfg      *mon_cfg;
>>>    };
>>>
>>>
>>> When a user requests an assignment for total event to the default group
>>> for domain 0, you go search in rdt_mon_domain(dom 0) for empty mon_cfg
>>> entry.
>>>
>>> If there is an empty entry, then use that entry for assignment and
>>> update closid, rmid, evtid and dirty = 1. We can get all these
>>> information from default group here.
>>>
>>> Does this make sense?
>>
>> Yes, sounds correct.
>
> I will probably add cntr_id in resctrl_monitor_cfg structure and
> initialize during the allocation. And rename the field 'dirty' to
> 'active'(or something similar) to hold the assign state for that
> entry. That way we have all the information required for assignment
> at one place. We don't need to update the rdtgroup structure.
>
> Reinette, What do you think about this approach?

I think this approach is in the right direction. Thanks to Peter for
the guidance here.
I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg
though, I think the cntr_id would be the index to the array instead?

It may also be worthwhile to consider using a pointer to the resource
group instead of storing closid and rmid directly. If used to indicate
initialization then an initialized pointer is easier to distinguish than
the closid/rmid that may have zero as valid values.

I expect evtid will be enum resctrl_event_id and that raises the question
of whether "0" can indeed be used as an "uninitialized" value since doing
so would change the meaning of the enum. It may indeed keep things
separated by maintaining evtid as an enum resctrl_event_id and note the
initialization differently ... either via a pointer to a resource group
or entirely separately as Babu indicates later.

>>>> For the benefit of displaying mbm_assign_control, I put a pointer back
>>>> to any counter array entry allocated in the mbm_state struct only
>>>> because it's an existing structure that exists for every rmid-domain
>>>> combination.
>>>
>>> Pointer in mbm_state may not be required here.
>>>
>>> We are going to loop over resctrl groups. We can search the
>>> rdt_mon_domain to see if specific closid, rmid, evtid is already
>>> assigned or not in that domain.
>>
>> No, not required I guess. High-performance CPUs can probably search a
>> 32-entry array very quickly.
>
> Ok.
>

This is not so clear to me. I am wondering about the scenario when a resource
group is removed and its counters need to be freed. Searching which counters
need to be freed would then require a search of the array within every domain,
of which I understand there can be many? Having a pointer from the mbm_state
may help here.

Reinette