Re: [PATCH 1/1] x86/cqm: Cqm requirements
From: Thomas Gleixner
Date: Mon Mar 13 2017 - 15:10:52 EST
On Fri, 10 Mar 2017, David Carrillo-Cisneros wrote:
> > Fine. So we need this for ONE particular use case. And if that is not well
> > documented including the underlying mechanics to analyze the data then this
> > will be a nice source of confusion for Joe User.
> >
> > I still think that this can be done differently while keeping the overhead
> > small.
> >
> > You look at this from the existing perf mechanics which require high
> > overhead context switching machinery. But that's just wrong because that's
> > not how the cache and bandwidth monitoring works.
> >
> > Contrary to the other perf counters, CQM and MBM are based on a context
> > selectable set of counters which do not require readout and reconfiguration
> > when the switch happens.
> >
> > Especially with CAT in play, the context switch overhead is there already
> > when CAT partitions need to be switched. So switching the RMID at the same
> > time is basically free, if we are smart enough to do an equivalent to the
> > CLOSID context switch mechanics and ideally combine both into a single MSR
> > write.
> >
> > With that the low overhead periodic sampling can read N counters which are
> > related to the monitored set and provide N separate results. For bandwidth
> > the aggregation is a simple ADD and for cache residency it's pointless.
> >
> > Just because perf was designed with the regular performance counters in
> > mind (way before that CQM/MBM stuff came around) does not mean that we
> > cannot change/extend that if it makes sense.
> >
> > And looking at the way Cache/Bandwidth allocation and monitoring works, it
> > makes a lot of sense. Definitely more than shoving it into the current mode
> > of operandi with duct tape just because we can.
> >
>
> You made a point. The use case I described can be better served with
> the low overhead monitoring groups that Fenghua is working on. Then
> that info can be merged with the per-CPU profile collected for non-RDT
> events.
>
> I am ok removing the perf-like CPU filtering from the requirements.
So if I'm not missing something then ALL remaining requirements can be
solved with the RDT integrated monitoring mechanics, right?
Thanks,
tglx