RE: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

From: Yu, Fenghua
Date: Wed Jan 18 2017 - 16:17:09 EST


> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> On Tue, 17 Jan 2017, Shivappa Vikas wrote:
> > On Tue, 17 Jan 2017, Thomas Gleixner wrote:
> > > On Fri, 6 Jan 2017, Vikas Shivappa wrote:
> > > > - Issue(1): Inaccurate per-package and system-wide data. It just
> > > > prints zeros or arbitrary numbers.
> > > >
> > > > Fix: The patches fix this by simply returning an error if the mode is
> > > > not supported.
> > > > The supported modes are task monitoring and cgroup monitoring.
> > > > Per-package data for, say, socket x is returned with the
> > > > -C <cpu on socket x> -G cgrpy options.
> > > > System-wide data can be obtained by monitoring the root cgroup.
> > >
> > > Fine. The implementation just lacks a comment saying so. Otherwise I
> > > would not have asked the question about CPU monitoring. Though I
> > > fundamentally hate the idea of requiring cgroups for this to work.
> > >
> > > If I just want to look at CPU X why on earth do I have to set up all
> > > that cgroup muck? Just because your main focus is cgroups?
> >
> > The upstream per-CPU data is broken because it does not override the
> > other tasks' event RMIDs on that CPU with the CPU event RMID.
> >
> > This can be fixed by adding a per-CPU struct to hold the RMID that's
> > affinitized to the CPU; however, we then miss all the task
> > llc_occupancy in that. Still evaluating it.
>
> The point here is that CQM is closely connected to the cache allocation
> technology. After a lengthy discussion we ended up having
>
> - per cpu CLOSID
> - per task CLOSID
>
> where all tasks which do not have a CLOSID assigned use the CLOSID which is
> assigned to the CPU they are running on.
>
> So if I configure a system by simply partitioning the cache per CPU, which is
> the proper way to do it for HPC and RT use cases where workloads are
> partitioned on CPUs as well, then I really want to have an equally simple way
> to monitor the occupancy for that reservation.
>
> And looking at that from the CAT point of view, which is the proper way to do
> it, makes it obvious that CQM should be modeled to match CAT.
>
> So let's assume the following:
>
> CPU 0-3 default CLOSID 0
> CPU 4 CLOSID 1
> CPU 5 CLOSID 2
> CPU 6 CLOSID 3
> CPU 7 CLOSID 3
>
> T1 CLOSID 4
> T2 CLOSID 5
> T3 CLOSID 6
> T4 CLOSID 6
>
> All other tasks use the per cpu defaults, i.e. the CLOSID of the CPU
> they run on.
>
> then the obvious basic monitoring requirement is to have an RMID for each
> CLOSID.

So the mapping between RMID and CLOSID is a 1:1 mapping, right?

Then we could change the current in-kernel resctrl interface as follows:

1. In rdtgroup_mkdir() (i.e. creating a partition in resctrl), allocate one RMID for the partition. This sets up the RMID-to-CLOSID mapping at mkdir time.
2. In rdtgroup_rmdir() (i.e. removing a partition in resctrl), free the RMID. This tears down the RMID-to-CLOSID mapping.

In user space:
1. Create a partition in resctrl, allocate an L3 CBM in "schemata", and assign a PID in "tasks".
2. Start a user-space monitoring tool (e.g. perf) to monitor the PID. The monitoring tool needs to be made aware of the resctrl interface; we could update perf to work with it.

Since the PID is assigned to the partition, which has a CLOSID-to-RMID mapping, the PID is monitored while it runs in its allocated portion of L3.

Is the above proposal the right way to go?

Thanks.

-Fenghua