Re: [RFD] CAT user space interface revisited

From: Marcelo Tosatti
Date: Wed Dec 23 2015 - 05:28:55 EST


On Tue, Dec 22, 2015 at 06:12:05PM +0000, Yu, Fenghua wrote:
> > From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> > Sent: Wednesday, November 18, 2015 10:25 AM
> > Folks!
> >
> > After rereading the mail flood on CAT and staring into the SDM for a while, I
> > think we all should sit back and look at it from scratch again w/o our
> > preconceptions - I certainly had to put my own away.
> >
> > Let's look at the properties of CAT again:
> >
> > - It's a per socket facility
> >
> > - CAT slots can be associated to external hardware. This
> > association is per socket as well, so different sockets can have
> > different behaviour. I missed that detail when staring the first
> > time, thanks for the pointer!
> >
> > - The association itself is per cpu. The COS selection happens on a
> > CPU while the set of masks which are selected via COS are shared
> > by all CPUs on a socket.
> >
> > There are restrictions which CAT imposes in terms of configurability:
> >
> > - The bits which select a cache partition need to be consecutive
> >
> > - The number of possible cache association masks is limited
> >
> > Let's look at the configurations (CDP omitted and size restricted)
> >
> > Default:  1 1 1 1 1 1 1 1
> >           1 1 1 1 1 1 1 1
> >           1 1 1 1 1 1 1 1
> >           1 1 1 1 1 1 1 1
> >
> > Shared:   1 1 1 1 1 1 1 1
> >           0 0 1 1 1 1 1 1
> >           0 0 0 0 1 1 1 1
> >           0 0 0 0 0 0 1 1
> >
> > Isolated: 1 1 1 1 0 0 0 0
> >           0 0 0 0 1 1 0 0
> >           0 0 0 0 0 0 1 0
> >           0 0 0 0 0 0 0 1
> >
> > Or any combination thereof. Surely some combinations will not make any
> > sense, but we really should not make any restrictions on the stupidity of a
> > sysadmin. The worst outcome might be L3 disabled for everything, so what?
> >
> > Now that gets even more convoluted if CDP comes into play and we really
> > need to look at CDP right now. We might end up with something which looks
> > like this:
> >
> > 1 1 1 1 0 0 0 0 Code
> > 1 1 1 1 0 0 0 0 Data
> > 0 0 0 0 0 0 1 0 Code
> > 0 0 0 0 1 1 0 0 Data
> > 0 0 0 0 0 0 0 1 Code
> > 0 0 0 0 1 1 0 0 Data
> > or
> > 0 0 0 0 0 0 0 1 Code
> > 0 0 0 0 1 1 0 0 Data
> > 0 0 0 0 0 0 0 1 Code
> > 0 0 0 0 0 1 1 0 Data
> >
> > Let's look at partitioning itself. We have two options:
> >
> > 1) Per task partitioning
> >
> > 2) Per CPU partitioning
> >
> > So far we only talked about #1, but I think that #2 has a value as well. Let me
> > give you a simple example.
> >
> > Assume that you have isolated a CPU and run your important task on it. You
> > give that task a slice of cache. Now that task needs kernel services which run
> > in kernel threads on that CPU. We really don't want to (and cannot) hunt
> > down random kernel threads (think cpu bound worker threads, softirq
> > threads ....) and give them another slice of cache. What we really want is:
> >
> > 1 1 1 1 0 0 0 0 <- Default cache
> > 0 0 0 0 1 1 1 0 <- Cache for important task
> > 0 0 0 0 0 0 0 1 <- Cache for CPU of important task
> >
> > It would even be sufficient for particular use cases to just associate a piece of
> > cache to a given CPU and do not bother with tasks at all.
> >
> > We really need to make this as configurable as possible from userspace
> > without imposing random restrictions to it. I played around with it on my new
> > intel toy and the restriction to 16 COS ids (that's 8 with CDP
> > enabled) makes it really useless if we force the ids to have the same meaning
> > on all sockets and restrict it to per task partitioning.
> >
> > Even if next generation systems will have more COS ids available, there are
> > not going to be enough to have a system wide consistent view unless we
> > have COS ids > nr_cpus.
> >
> > Aside of that I don't think that a system wide consistent view is useful at all.
> >
> > - If a task migrates between sockets, it's going to suffer anyway.
> > Real sensitive applications will simply pin tasks on a socket to
> > avoid that in the first place. If we make the whole thing
> > configurable enough then the sysadmin can set it up to support
> > even the nonsensical case of identical cache partitions on all
> > sockets and let tasks use the corresponding partitions when
> > migrating.
> >
> > - The number of cache slices is going to be limited no matter what,
> > so one still has to come up with a sensible partitioning scheme.
> >
> > - Even if we have enough cos ids the system wide view will not make
> > the configuration problem any simpler as it remains per socket.
> >
> > It's hard. Policies are hard by definition, but this one is harder than most
> > other policies due to the inherent limitations.
> >
> > So now to the interface part. Unfortunately we need to expose this very
> > close to the hardware implementation as there are really no abstractions
> > which allow us to express the various bitmap combinations. Any abstraction I
> > tried to come up with renders that thing completely useless.
> >
> > I was not able to identify any existing infrastructure where this really fits in. I
> > chose a directory/file based representation. We certainly could do the same
>
> Would this be under /sys/devices/system/?
> Then we create a qos/cat directory. In the future, other directories may be
> created, e.g. qos/mbm?
>
> Thanks.
>
> -Fenghua

Fenghua,

I suppose Thomas is talking about the socketmask only, as discussed in
the call with Intel.

Thomas, is that correct? (if you want a change in directory structure,
please explain the whys, because we don't need that change in directory
structure).
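
For reference while discussing the interface, here is a rough sketch of the
two checks the quoted restrictions imply: that each mask is one consecutive
run of bits, and whether a set of masks is shared or isolated. It is only an
illustration in Python, not kernel code; the function names are made up for
this mail, and rejecting an all-zero mask is an assumption on my side rather
than something stated above.

```python
def parse_row(row: str) -> int:
    """Turn a row such as '0 0 1 1 1 1 1 1' into an integer bitmask."""
    return int(row.replace(" ", ""), 2)


def is_contiguous(mask: int) -> bool:
    """True if the set bits of mask form exactly one consecutive run.

    Assumption: an empty mask (0) is treated as invalid.
    """
    if mask <= 0:
        return False
    lowest = mask & -mask          # isolate the lowest set bit
    run = mask // lowest           # shift out the trailing zeros
    return (run & (run + 1)) == 0  # remainder must look like 0b0...01...1


def overlaps(masks) -> bool:
    """True if any two masks in the sequence share at least one bit."""
    seen = 0
    for m in masks:
        if seen & m:
            return True
        seen |= m
    return False


# The 'Shared' and 'Isolated' layouts from the quoted mail.
shared = [parse_row(r) for r in (
    "1 1 1 1 1 1 1 1",
    "0 0 1 1 1 1 1 1",
    "0 0 0 0 1 1 1 1",
    "0 0 0 0 0 0 1 1",
)]
isolated = [parse_row(r) for r in (
    "1 1 1 1 0 0 0 0",
    "0 0 0 0 1 1 0 0",
    "0 0 0 0 0 0 1 0",
    "0 0 0 0 0 0 0 1",
)]
```

Both layouts pass is_contiguous() for every row; overlaps() tells them apart,
since the shared masks intersect and the isolated ones do not.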


