Re: [RFC][PATCH] sched: Cache aware load-balancing
From: Peter Zijlstra
Date: Tue Mar 25 2025 - 14:49:42 EST
On Tue, Mar 25, 2025 at 11:19:52PM +0800, Chen, Yu C wrote:
>
> Hi Peter,
>
> Thanks for sending this out,
>
> On 3/25/2025 8:09 PM, Peter Zijlstra wrote:
> > Hi all,
> >
> > One of the many things on the eternal todo list has been finishing the
> > below hackery.
> >
> > It is an attempt at modelling cache affinity -- and while the patch
> > really only targets LLC, it could very well be extended to also apply to
> > clusters (L2). Specifically any case of multiple cache domains inside a
> > node.
> >
> > Anyway, I wrote this about a year ago, and I mentioned this at the
> > recent OSPM conf where Gautham and Prateek expressed interest in playing
> > with this code.
> >
> > So here goes, very rough and largely unproven code ahead :-)
> >
> > It applies to current tip/master, but I know it will fail the __percpu
> > validation that sits in -next, although that shouldn't be terribly hard
> > to fix up.
> >
> > As is, it only computes a CPU inside the LLC that has the highest recent
> > runtime; this CPU is then used in the wake-up path to steer wake-ups
> > towards this LLC, and in task_hot() to limit migrations away from it.
> >
> > More elaborate things could be done, notably there is an XXX in there
> > somewhere about finding the best LLC inside a NODE (interaction with
> > NUMA_BALANCING).
> >
>
> Besides the control provided by CONFIG_SCHED_CACHE, could we also introduce
> a sched_feat(SCHED_CACHE) to manage this feature, facilitating dynamic
> adjustment? Similarly, we could introduce other sched_feats for load
> balancing and NUMA balancing, for fine-grained control.
We can do all sorts, but the very first thing is determining whether this is
worth it at all. Because if we can't make this work, all those
things are a waste of time.
This patch is not meant to be merged; it is meant for testing and
development. We first need to make it actually improve workloads. If it
then turns out it also regresses other workloads (likely, things always do),
we can look at how best to control that.