RE: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration

From: Moger, Babu
Date: Tue Sep 20 2022 - 13:09:52 EST


[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Sent: Monday, September 19, 2022 4:07 PM
> To: Moger, Babu <Babu.Moger@xxxxxxx>; corbet@xxxxxxx;
> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx
> Cc: fenghua.yu@xxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
> hpa@xxxxxxxxx; paulmck@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> quic_neeraju@xxxxxxxxxxx; rdunlap@xxxxxxxxxxxxx;
> damien.lemoal@xxxxxxxxxxxxxxxxxx; songmuchun@xxxxxxxxxxxxx;
> peterz@xxxxxxxxxxxxx; jpoimboe@xxxxxxxxxx; pbonzini@xxxxxxxxxx;
> chang.seok.bae@xxxxxxxxx; pawan.kumar.gupta@xxxxxxxxxxxxxxx;
> jmattson@xxxxxxxxxx; daniel.sneddon@xxxxxxxxxxxxxxx; Das1, Sandipan
> <Sandipan.Das@xxxxxxx>; tony.luck@xxxxxxxxx; james.morse@xxxxxxx;
> linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> bagasdotme@xxxxxxxxx; eranian@xxxxxxxxxx
> Subject: Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to
> read/write event configuration
>
> Hi Babu,
>
> On 9/19/2022 1:26 PM, Moger, Babu wrote:
> > On 9/19/22 11:42, Reinette Chatre wrote:
> >> On 9/19/2022 8:46 AM, Moger, Babu wrote:
> >>> On 9/16/22 10:58, Reinette Chatre wrote:
> >>>> On 9/7/2022 11:01 AM, Babu Moger wrote:
> >>>>> Add two new sysfs files to read/write the event configuration if
> >>>>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
> >>>>> supported. The file mbm_local_config is for the configuration of
> >>>>> the event mbm_local_bytes and the file mbm_total_config is for the
> >>>>> configuration of mbm_total_bytes.
> >>>>>
> >>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> >>>>>
> >>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> >>>>>
> >>>> This patch makes the mbm*config files per monitor group. Looking
> >>>> ahead at later patches how the configuration is set it is not clear
> >>>> to me that this is the right place for these configuration files.
> >>>>
> >>>> Looking ahead to patch 10 there is neither rmid nor closid within
> >>>> the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes the
> >>>> bits indicating what access types needs to be counted. Also in
> >>>> patch 10 I understand that the scope of this register is per L3
> >>>> cache domain.
> >>> Yes. Scope of  MSR_IA32_EVT_CFG_BASE per L3 domain.
> >>>> Considering this, why is the sysfs file associated with each
> >>>> monitor group?
> >>> Please see the response below.
> >>>> For example, consider the following scenario:
> >>>> # cd /sys/fs/resctrl
> >>>> # mkdir g2
> >>>> # mkdir mon_groups/m1
> >>>> # mkdir mon_groups/m2
> >>>> # find . | grep mbm_local_config
> >>>> ./mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_data/mon_L3_01/mbm_local_config
> >>>> ./g2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./g2/mon_data/mon_L3_01/mbm_local_config
> >>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
> >>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config
> >>>>
> >>>>
> >>>> From what I understand, the following sysfs files are associated
> >>>> with cache domain #0 and thus writing to any of these files would
> >>>> change the same configuration:
> >>>> ./mon_data/mon_L3_00/mbm_local_config
> >>>> ./g2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
> >>>>
> >>>> Could you please correct me where I am wrong?
> >>> For example, we have CPUs 0-7 in domain 0. We have two counters
> >>> which are configurable.
> >>>
> >>> Lets consider same example as your mentioned about.
> >>>
> >>> g2 is a control group.
> >>>
> >>> m1 and m2 are monitor group.
> >>>
> >>> We can have control group g2 with CPUs 0-7 to limit the L3 bandwidth
> >>> (or memory bandwidth with required schemata setting).
> >>>
> >>> We can have mon group m1 with cpus 0-3 to monitor mbm_local_bytes.
> >>>
> >>> We can have mon group m2 with cpus  4-7 to monitor mbm_total_bytes.
> >>>
> >>> Each group is independently, monitoring two separate thing. Without
> >>> having
> >> Right, because monitoring, the actual counting of the events, is per
> >> monitor group. When a monitor group is created a new RMID is created
> >> and when the counter is read it is per-RMID.
> >>
> >> The event configuration is independent from the RMID using the counter.
> >>
> >>> sysfs file (mbm_local_config and mbm_total_config) in each monitor
> >>> group, we wont be able to configure the above configuration.
> >> I do not understand this reasoning. From what I understand the event
> >> configuration is independent from the monitoring group. Thus,
> >> changing an event configuration for one monitoring group would impact
> >> all monitoring groups using that event counter. This implementation
> >> associates an event configuration with each monitoring group and by
> >> doing so it implies that it is unique to the monitoring group, but
> >> that is not how it works.
> >
> > The event configuration is designed per L3 domain. The mon_data is
> > also per domain (like mon_L3_00.. mon_L3_01 etc). So, added the event
> > configuration file inside each domain. We have all the information
> > inside the domain. Thought, that is right place. I am open for suggestions.
>
> It is not clear to me if you are also seeing all the duplication that accompanies
> this implementation. As you can see in the example I provided in
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern
> el.org%2Flkml%2F13294a8f-e76f-a6a9-284c-
> 67adbc80ec7c%40intel.com%2F&amp;data=05%7C01%7Cbabu.moger%40amd.
> com%7Cc22190a25ac044ec5f5408da9a82f5b7%7C3dd8961fe4884e608e11a82
> d994e183d%7C0%7C0%7C637992184504699692%7CUnknown%7CTWFpbGZsb3
> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
> D%7C3000%7C%7C%7C&amp;sdata=uPuGOlwaIgwJ9VnwNOS%2B4mUrqJnS65
> OdrEsEXtztUbU%3D&amp;reserved=0,
> if I understand the implementation correctly, there will be several
> configuration files scattered through resctrl that all configure the same value. I
> asked you to correct me where I am wrong but you did not correct me.
> Instead you keep repeating that placing the files in the duplicate locations is
> convenient. I can see how this is convenient for you but please do consider that
> having these duplicate configuration files scattered through resctrl makes for a
> very confusing user interface and unexpected behavior. Users would expect
> that a configuration associated with a monitor group impacts that monitor
> group only - not all monitor groups associated with that domain.
>
> User API is hard so this does need careful thought. Perhaps the architects can
> chime in here.
>
> One option could be:
> # cd /sys/fs/resctrl/info/L3_MON
> # cat mbm_total_config
> 0=7f;1=7f
> # cat mbm_local_config
> 0=15;1=15

I think this should work.
# cat mbm_total_config
0=7f;1=7f
I would think 0 and 1 are domain ids here.

We have to provide interface to write also.
#echo "0=0x70" > mbm_total_config (update mbm_total_config for domain 0)
#echo 1=0x10 > mbm_local_config (update mbm_local_config for domain 1)

We will have to parse the string and update the specific domains.

>
> It would be clear when changing mem_total_config or mbm_local_config that
> it would impact all monitoring groups within all resource groups. What do you
> think?

Yes. Thank you. It should work. As long and we have the ways to modify(and read) the specific L3 domains then it should be fine. Let me start on that. Will reply if I see any major issues.
Thanks
Babu