RE: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
From: Hindman, Gavin
Date: Mon Jan 15 2018 - 11:23:36 EST
Thanks for the feedback, Thomas.
> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Thomas Gleixner
> Sent: Sunday, January 14, 2018 2:54 PM
> To: Chatre, Reinette <reinette.chatre@xxxxxxxxx>
> Cc: Yu, Fenghua <fenghua.yu@xxxxxxxxx>; Luck, Tony
> <tony.luck@xxxxxxxxx>; vikas.shivappa@xxxxxxxxxxxxxxx; Hansen, Dave
> <dave.hansen@xxxxxxxxx>; mingo@xxxxxxxxxx; hpa@xxxxxxxxx;
> x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache
> Pseudo-Locking enabling
>
> On Fri, 17 Nov 2017, Reinette Chatre wrote:
>
> Sorry for the delay. You know why :)
>
> > On 11/17/2017 4:48 PM, Thomas Gleixner wrote:
> > > On Mon, 13 Nov 2017, Reinette Chatre wrote:
> > > Did you compare that against the good old cache coloring mechanism,
> > > e.g. palloc ?
> >
>
> > I understand where your question originates. I have not compared
> > against PALLOC for two reasons:
> >
> > 1) PALLOC is not upstream and while inquiring about the status of this
> > work (please see https://github.com/heechul/palloc/issues/4 for
> > details) we learned that one reason for this is that recent Intel
> > processors are not well supported.
>
> So if I understand Heechul correctly then recent CPUs cannot be supported
> easily due to changes in the memory controllers and the cache. I assume the
> latter is related to CAT.
>
> > 2) The most recent kernel supported by PALLOC is v4.4 and also
> > mentioned in the above link there is currently no plan to upstream
> > this work for a less divergent comparison of PALLOC and the more
> > recent RDT/CAT enabling on which Cache Pseudo-Locking is built.
>
> Well, that's not a really good excuse for not trying. You at Intel should be able
> to get to the parameters easy enough :)
>
We can run the comparison, but I'm not sure I understand the intent. My understanding of PALLOC is that it's intended to allow allocation of memory to specific physical memory banks. While that might reduce cache misses because processes are more separated, it's not explicitly intended to reduce cache misses, and PALLOC's benefits would only hold as long as you have few enough processes to be able to dedicate/isolate memory accordingly. Am I misunderstanding the intent/usage of PALLOC?

> > >> The cache pseudo-locking approach relies on generation-specific
> > >> behavior of processors. It may provide benefits on certain
> > >> processor generations, but is not guaranteed to be supported in the
> > >> future.
> > >
> > > Hmm, are you saying that the CAT mechanism might change radically in
> > > the future so that access to cached data in an allocated area which
> > > > does not belong to the current executing context won't work anymore?
> >
No, I don't see any scenario in which devices that currently support pseudo-locking would stop working, but until support is architectural, support in the current generation of a product line doesn't imply support in a future generation. We'll certainly make every effort to carry support forward, and would adjust to any changes in CAT support, but we can't account for unforeseen future architectural changes that might block pseudo-locking use-cases on top of CAT.

> > Most devices that publicly support CAT in the Linux mainline can take
> > advantage of Cache Pseudo-Locking. However, Cache Pseudo-Locking is a
> > model-specific feature so there may be some variation in if, or to
> > what extent, current and future devices can support Cache
> > Pseudo-Locking. CAT remains architectural.
>
> Sure, but that does NOT answer my question at all.
>
> > >> It is not a guarantee that data will remain in the cache. It is not
> > >> a guarantee that data will remain in certain levels or certain
> > >> regions of the cache. Rather, cache pseudo-locking increases the
> > >> probability that data will remain in a certain level of the cache
> > >> via carefully configuring the CAT feature and carefully controlling
> > >> application behavior.
> > >
> > > Which kind of applications are you targeting with that?
> > >
> > > Are there real world use cases which actually can benefit from this
> > > and
> >
> > To ensure I answer your question I will consider two views. First, the
> > "carefully controlling application behavior" referred to above refers
> > to applications/OS/VMs running after the pseudo-locked regions have
> > been set up. These applications should take care to not do anything,
> > for example call wbinvd, that would affect the Cache Pseudo-Locked
> > regions. Second, what you are also asking about is the applications
> > using these Cache Pseudo-Locked regions. We do see a clear performance
> > benefit to applications using these pseudo-locked regions. Latency
> > sensitive applications could relocate their code as well as data to
> > pseudo-locked regions for improved performance.
>
> This is again a marketing pitch and not answering my question about real
> world use cases.
>
There are a number of real-world use-cases that are already making use of hacked-up, ad-hoc versions of pseudo-locking - this corner case has been available in hardware for some time - and this patch-set is intended to bring it more into the mainstream and make it more supportable. Primary usages right now are industrial PLCs/automation and high-frequency trading/financial enterprise systems, but anything with relatively small repeating data structures should see benefit.

> > > what are those applications supposed to do once the feature breaks
> > > with future generations of processors?
> >
> > This feature is model specific with a few platforms supporting it at
> > this time. Only platforms known to support Cache Pseudo-Locking will
> > expose its resctrl interface.
>
> And you deliberately avoided to answer my question again.
>
Reinette's not trying to avoid the questions; we just don't necessarily have definitive answers at this time. Currently pseudo-locking requires manual setup on the part of the integrator, so there will not be any invisible breakage when trying to port software expecting pseudo-locking to new devices. We'll certainly do everything we can to minimize user-space/configuration impact on migration if things change going forward, but these are unknowns. We are in a bit of a chicken/egg situation where people aren't broadly using it because it's not architectural, and it's not architectural because people aren't broadly using it. We could publicly carry the patches out of mainline, but our intent in pushing the patches to mainline is to a) increase exposure/usage, b) reduce divergence across people already using hacked versions, and c) ease the overhead of keeping patches in sync with the larger CAT infrastructure as it evolves. We are clear on the potential support burden being incurred by submitting a non-architectural feature, and there's certainly no intent to dump a science-experiment into mainline.

> Thanks,
>
> tglx
Thanks,
Gavin