Re: [PATCH 0/4] x86: Add Cache QoS Monitoring (CQM) support

From: Waskiewicz Jr, Peter P
Date: Tue Feb 18 2014 - 14:54:48 EST


On Tue, 2014-02-18 at 20:35 +0100, Peter Zijlstra wrote:
> On Tue, Feb 18, 2014 at 05:29:42PM +0000, Waskiewicz Jr, Peter P wrote:
> > > Its not a problem that changing the task:RMID map is expensive, what is
> > > a problem is that there's no deterministic fashion of doing it.
> >
> > We are going to add to the SDM that changing RMID's often/frequently is
> > not the intended use case for this feature, and can cause bogus data.
> > The real intent is to land threads into an RMID, and run that until the
> > threads are effectively done.
> >
> > That being said, reassigning a thread to a new RMID is certainly
> > supported, just "frequent" updates is not encouraged at all.
>
> You don't even need really high frequency, just unsynchronized wrt
> reading the counter. Suppose A flips the RMIDs about and just when its
> done programming B reads them.
>
> At that point you've got 0 guarantee the data makes any kind of sense.

Agreed, there is no guarantee with how the hardware is designed. We
don't have an instruction that can nuke RMID-tagged cachelines from the
cache, and the CPU guys (along with hpa) have been very explicit that
wbinvd is not an option.

> > I do see that, however the userspace interface for this isn't ideal for
> > how the feature is intended to be used. I'm still planning to have this
> > be managed per process in /proc/<pid>, I just had other priorities push
> > this back a bit on my stovetop.
>
> So I really don't like anything /proc/$pid/ nor do I really see a point in
> doing that. What are you going to do in the /proc/$pid/ thing anyway?
> Exposing raw RMIDs is an absolute no-no, and anything else is going to
> end up being yet-another-grouping thing and thus not much different from
> cgroups.

Exactly. The cgroup grouping mechanisms fit really well with this
feature. I was exploring another way to do it given the pushback on
using cgroups initially. The RMIDs won't be exposed; userspace sees only a
group identifier (in cgroups it's the new subdirectory in the subsystem),
and RMIDs are assigned by the kernel, completely hidden from userspace.

>
> > Also, now that the new SDM is available
>
> Can you guys please set up a mailing list already so we know when
> there's new versions out? Ideally mailing out the actual PDF too so I
> get the automagic download and archive for all versions.

I assume this has been requested before. As I'm typing this, I just
received the notification internally that the new SDM is now published.
I'll forward your request along and see what I hear back.

> > , there is a new feature added to
> > the same family as CQM, called Memory Bandwidth Monitoring (MBM). The
> > original cgroup approach would have allowed another subsystem be added
> > next to cacheqos; the perf-cgroup here is not easily expandable.
> > The /proc/<pid> approach can add MBM pretty easily alongside CQM.
>
> I'll have to go read up what you've done now, but if its also RMID based
> I don't see why the proposed scheme won't work.

Yes, please do look at the cgroup patches. For the RMID allocation, we
could use your proposal to manage allocation/reclamation, and the
management interface to userspace will match the use cases I'm trying to
enable.

> > > The below is a rough draft, most if not all XXXs should be
> > > fixed/finished. But given I don't actually have hardware that supports
> > > this stuff (afaik) I couldn't be arsed.
> >
> > The hardware is not publicly available yet, but I know that Red Hat and
> > others have some of these platforms for testing.
>
> Yeah, not in my house therefore it doesn't exist :-)
>
> > I really appreciate the patch. There was a good amount of thought put
> > into this, and gave a good set of different viewpoints. I'll keep the
> > comments all here in one place, it'll be easier to discuss than
> > disjointed in the code.
> >
> > The rotation idea to reclaim RMID's no longer in use is interesting.
> > This differs from the original patch where the original patch would
> > reclaim the RMID when monitoring was disabled for that group of
> > processes.
> >
> > I can see a merged sort of approach, where if monitoring for a group of
> > processes is disabled, we can place that RMID onto a reclaim list. The
> > next time an RMID is requested (monitoring is enabled for a
> > process/group of processes), the reclaim list is searched for an RMID
> > that has 0 occupancy (i.e. not in use), or worst-case, find and assign
> > one with the lowest occupancy. I did discuss this with hpa offline and
> > this seemed reasonable.
> >
> > Thoughts?
>
> So you have to wait for one 'freed' RMID to become empty before
> 'allowing' reads of the other RMIDs, otherwise the visible value can be
> complete rubbish. Even for low frequency rotation, see the above
> scenario about asynchronous operations.
>
> This means you have to always have at least one free RMID.

Understood now, I was missing the asynchronous point you were trying to
make. I thought you wanted the free RMID so that there is always a
known-"empty" one to hand out on assignment, not to get around the
twiddling that can occur.
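
To make the merged reclaim idea concrete for discussion, here is a rough
user-space C model of it. This is only a sketch, not kernel code:
rmid_pool, rmid_alloc(), rmid_release() and NR_RMIDS are names made up for
illustration, and the occupancy[] array is a stand-in for whatever the
hardware occupancy read returns. It picks the unused RMID with the lowest
occupancy and tells the caller when that RMID still has cachelines
resident, which is exactly the case where a read can be rubbish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_RMIDS	4	/* hypothetical; the real count is enumerated by the CPU */
#define RMID_INVALID	(-1)

/* Toy model: one slot per RMID, occupancy[] fakes the hardware counter. */
struct rmid_pool {
	bool in_use[NR_RMIDS];
	uint64_t occupancy[NR_RMIDS];
};

static void rmid_pool_init(struct rmid_pool *p)
{
	for (int i = 0; i < NR_RMIDS; i++) {
		p->in_use[i] = false;
		p->occupancy[i] = 0;
	}
}

/*
 * Monitoring was disabled for a group: the RMID becomes a reclaim
 * candidate. Its old cachelines may still be resident, so occupancy
 * is left alone and only drains over time.
 */
static void rmid_release(struct rmid_pool *p, int rmid)
{
	assert(rmid >= 0 && rmid < NR_RMIDS);
	p->in_use[rmid] = false;
}

/*
 * Pick the unused RMID with the lowest occupancy. *stale is set to
 * true when the chosen RMID still has lines in the cache, meaning
 * readings for the new owner cannot be trusted until it drains.
 * Returns RMID_INVALID (and leaves *stale untouched) if none is unused.
 */
static int rmid_alloc(struct rmid_pool *p, bool *stale)
{
	int best = RMID_INVALID;

	for (int i = 0; i < NR_RMIDS; i++) {
		if (p->in_use[i])
			continue;
		if (best == RMID_INVALID || p->occupancy[i] < p->occupancy[best])
			best = i;
	}
	if (best == RMID_INVALID)
		return RMID_INVALID;

	*stale = p->occupancy[best] != 0;
	p->in_use[best] = true;
	return best;
}
```

A caller that gets stale == true would have to hold off exposing counts
for that group until the RMID has drained, which is where keeping one
always-free RMID around comes in.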

Let me know what you think about the cacheqos cgroup implementation I
sent, and if things don't look horrible, I can respin with your RMID
management scheme.

Thanks,
-PJ

--
PJ Waskiewicz Open Source Technology Center
peter.p.waskiewicz.jr@xxxxxxxxx Intel Corp.
