Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display number of free counters
From: Moger, Babu
Date: Mon Dec 02 2024 - 16:10:03 EST
Hi Reinette,
On 12/2/24 12:33, Reinette Chatre wrote:
> Hi Babu and Peter,
>
> On 11/29/24 9:06 AM, Moger, Babu wrote:
>> Hi Peter, Reinette,
>>
>> On 11/29/2024 3:59 AM, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@xxxxxxx> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 11/28/2024 5:10 AM, Peter Newman wrote:
>>>>> Hi Babu, Reinette,
>>>>>
>>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
>>>>> <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Babu,
>>>>>>
>>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote:
>>>
>>>>>>> 1. Each group needs to remember counter ids in each domain for each event.
>>>>>>> For example:
>>>>>>> Resctrl group mon1
>>>>>>> Total event
>>>>>>> dom 0 cntr_id 1,
>>>>>>> dom 1 cntr_id 10
>>>>>>> dom 2 cntr_id 11
>>>>>>>
>>>>>>> Local event
>>>>>>> dom 0 cntr_id 2,
>>>>>>> dom 1 cntr_id 15
>>>>>>> dom 2 cntr_id 10
>>>>>>
>>>>>> Indeed. The challenge here is that domains may come and go so it cannot be a simple
>>>>>> static array. As an alternative it can be an xarray indexed by the domain ID with
>>>>>> pointers to a struct like below to contain the counters associated with the monitor
>>>>>> group:
>>>>>> struct cntr_id {
>>>>>> u32 mbm_total;
>>>>>> u32 mbm_local;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Thinking more about how this array needs to be managed made me wonder how the
>>>>>> current implementation deals with domains that come and go. I do not think
>>>>>> this is currently handled. For example, if a new domain comes online and
>>>>>> monitoring groups had counters dynamically assigned, then these counters are
>>>>>> not configured to the newly online domain.
>>>>
>>>> I am trying to understand the details of your approach here.
>>>>>
>>>>> In my prototype, I allocated a counter id-indexed array to each
>>>>> monitoring domain structure for tracking the counter allocations,
>>>>> because the hardware counters are all domain-scoped. That way the
>>>>> tracking data goes away when the hardware does.
>>>>>
>>>>> I was focused on allowing all pending counter updates to a domain
>>>>> resulting from a single mbm_assign_control write to be batched and
>>>>> processed in a single IPI, so I structured the counter tracker
>>>>> something like this:
>>>>
>>>> Not sure what you meant here. How are you batching two IPIs for two domains?
>>>>
>>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>
>>>> This is still a single write. Two IPIs are sent separately, one for each
>>>> domain.
>>>>
>>>> Are you doing something different?
>>>
>>> I said "all pending counter updates to a domain", whereby I meant
>>> targeting a single domain.
>>>
>>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.
>>>
>>> What is important is that the following write also requires 1 or 2 IPIs:
>>>
>>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
>>> for readability)
>>>
>>> echo $'//0=t;1=t\n
>>> /g1/0=t;1=t\n
>>> /g2/0=t;1=t\n
>>> /g3/0=t;1=t\n
>>> /g4/0=t;1=t\n
>>> /g5/0=t;1=t\n
>>> /g6/0=t;1=t\n
>>> /g7/0=t;1=t\n
>>> /g8/0=t;1=t\n
>>> /g9/0=t;1=t\n
>>> /g10/0=t;1=t\n
>>> /g11/0=t;1=t\n
>>> /g12/0=t;1=t\n
>>> /g13/0=t;1=t\n
>>> /g14/0=t;1=t\n
>>> /g15/0=t;1=t\n
>>> /g16/0=t;1=t\n
>>> /g17/0=t;1=t\n
>>> /g18/0=t;1=t\n
>>> /g19/0=t;1=t\n
>>> /g20/0=t;1=t\n
>>> /g21/0=t;1=t\n
>>> /g22/0=t;1=t\n
>>> /g23/0=t;1=t\n
>>> /g24/0=t;1=t\n
>>> /g25/0=t;1=t\n
>>> /g26/0=t;1=t\n
>>> /g27/0=t;1=t\n
>>> /g28/0=t;1=t\n
>>> /g29/0=t;1=t\n
>>> /g30/0=t;1=t\n
>>> /g31/0=t;1=t\n'
>>>
>>> My ultimate goal is for a thread bound to a particular domain to be
>>> able to unassign and reassign the local domain's 32 counters in a
>>> single write() with no IPIs at all. And when IPIs are required, then
>>> no more than one per domain, regardless of the number of groups
>>> updated.
>>>
>>
>> Yes. I think I got the idea. Thanks.
>>
>>>
>>>>
>>>>>
>>>>> struct resctrl_monitor_cfg {
>>>>> int closid;
>>>>> int rmid;
>>>>> int evtid;
>>>>> bool dirty;
>>>>> };
>>>>>
>>>>> This mirrors the info needed in whatever register configures the
>>>>> counter, plus a dirty flag to skip over the ones that don't need to be
>>>>> updated.
>>>>
>>>> This is what my understanding of your implementation.
>>>>
>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>> index d94abba1c716..9cebf065cc97 100644
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>>>> u32 *mbps_val;
>>>> };
>>>>
>>>> +struct resctrl_monitor_cfg {
>>>> + int closid;
>>>> + int rmid;
>>>> + int evtid;
>>>> + bool dirty;
>>>> +};
>>>> +
>>>> /**
>>>> * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor
>>>> resource
>>>> * @hdr: common header for different domain types
>>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain {
>>>> struct delayed_work cqm_limbo;
>>>> int mbm_work_cpu;
>>>> int cqm_work_cpu;
>>>> + /* Allocate num_mbm_cntrs entries in each domain */
>>>> + struct resctrl_monitor_cfg *mon_cfg;
>>>> };
>>>>
>>>>
>>>> When a user requests an assignment for total event to the default group
>>>> for domain 0, you go search in rdt_mon_domain(dom 0) for empty mon_cfg
>>>> entry.
>>>>
>>>> If there is an empty entry, then use that entry for assignment and
>>>> update closid, rmid, evtid and dirty = 1. We can get all these
>>>> information from default group here.
>>>>
>>>> Does this make sense?
>>>
>>> Yes, sounds correct.
>>
>> I will probably add cntr_id in resctrl_monitor_cfg structure and
>> initialize during the allocation. And rename the field 'dirty' to
>> 'active'(or something similar) to hold the assign state for that
>> entry. That way we have all the information required for assignment
>> at one place. We don't need to update the rdtgroup structure.
>>
>> Reinette, What do you think about this approach?
>
> I think this approach is in the right direction. Thanks to Peter for
> the guidance here.
> I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg
> though, I think the cntr_id would be the index to the array instead?
Yes. I think We can use the index as cntn_id. Will let you know otherwise.
>
> It may also be worthwhile to consider using a pointer to the resource
> group instead of storing closid and rmid directly. If used to indicate
> initialization then an initialized pointer is easier to distinguish than
> the closid/rmid that may have zero as valid values.
Sure. Sounds good.
>
> I expect evtid will be enum resctrl_event_id and that raises the question
> of whether "0" can indeed be used as an "uninitialized" value since doing
> so would change the meaning of the enum. It may indeed keep things
> separated by maintaining evtid as an enum resctrl_event_id and note the
> initialization differently ... either via a pointer to a resource group
> or entirely separately as Babu indicates later.
Sure. Will add evtid as enum resctrl_event_id and use the "state" to
indicate assign/unassign/dirty status.
>
>>>>> For the benefit of displaying mbm_assign_control, I put a pointer back
>>>>> to any counter array entry allocated in the mbm_state struct only
>>>>> because it's an existing structure that exists for every rmid-domain
>>>>> combination.
>>>>
>>>> Pointer in mbm_state may not be required here.
>>>>
>>>> We are going to loop over resctrl groups. We can search the
>>>> rdt_mon_domain to see if specific closid, rmid, evtid is already
>>>> assigned or not in that domain.
>>>
>>> No, not required I guess. High-performance CPUs can probably search a
>>> 32-entry array very quickly.
>>
>> Ok.
>>
>
> This is not so clear to me. I am wondering about the scenario when a resource
> group is removed and its counters need to be freed. Searching which counters
> need to be freed would then require a search of the array within every domain,
> of which I understand there can be many? Having a pointer from the mbm_state
> may help here.
Sure. Will add the allocated entry pointer in mbm_state.
>
> Reinette
>
>
--
Thanks
Babu Moger