Re: [PATCH v3 00/21] Cache Aware Scheduling

From: Tim Chen

Date: Fri Feb 20 2026 - 13:14:19 EST

On Fri, 2026-02-20 at 03:29 +0000, Qais Yousef wrote:
> On 02/19/26 10:11, Tim Chen wrote:
> > On Thu, 2026-02-19 at 23:07 +0800, Chen, Yu C wrote:
> > > Hi Peter, Qais,
> > >
> > > On 2/19/2026 10:41 PM, Peter Zijlstra wrote:
> > > > On Thu, Feb 19, 2026 at 02:08:28PM +0000, Qais Yousef wrote:
> > > > > On 02/10/26 14:18, Tim Chen wrote:
> > >
> > > [ ... ]
> > >
> > > > >
> > > > > > I admit I have yet to look fully at the series. But I must ask, why are you deferring
> > > > > > to load balance and not looking at the wake up path? LB should be for corrections.
> > > > > > When the wake up path is making the wrong decision all the time, LB (which is super slow
> > > > > > to react) is too late to start grouping tasks. What am I missing?
> > > >
> > > > There used to be wakeup steering, but I'm not sure that still exists in
> > > > this version (still need to read beyond the first few patches). It isn't
> > > > hard to add.
> > > >
> > >
> > > Please let me explain a little more about why we did this in the
> > > load balance path. Yes, the original version implemented cache-aware
> > > scheduling only in the wakeup path. According to our testing, this appeared
> > > to cause some task bouncing issues across LLCs. This was due to conflicts
> > > with the legacy load balancer, which tries to spread tasks to different
> > > LLCs.
> > > So as Peter said, the load balancer should be taken care of anyway. Later,
> > > we kept the cache-aware logic only in the load balancer, and the test results
> > > became much more stable, so we kept it as is. The wakeup fast path more or
> > > less aggregates the wakees (threads within the same process) within the LLC,
> > > so we have not changed it for now.
> > >
> > > Let me copy the changelog from the previous patch version:
> > >
> > > "
> > > In previous versions, aggregation of tasks was done in the
> > > wake up path, without making load balancing paths aware of
> > > LLC (Last-Level-Cache) preference. This led to the following
> > > problems:
> > >
> > > 1) Aggregation of tasks during wake up led to load imbalance
> > > between LLCs
> > > 2) Load balancing tried to even out the load between LLCs
> > > 3) Wake up tasks aggregation happened at a faster rate and
> > > load balancing moved tasks in opposite directions, leading
> > > to continuous and excessive task migrations and regressions
> > > in benchmarks like schbench.
> > >
> > > In this version, load balancing is made cache-aware. The main
> > > idea of cache-aware load balancing consists of two parts:
> > >
> > > 1) Identify tasks that prefer to run on their hottest LLC and
> > > move them there.
> > > 2) Prevent generic load balancing from moving a task out of
> > > its hottest LLC.
> > > "
> > >
> >
> > Another reason why we moved away from doing things in the wake up
> > path is load imbalance. The wake up path does not have load
> > information for the LLC sched domains that is as up to date as in
> > the load balance path. So you may actually have everyone rushed
>
> What's the reason wake up doesn't have the latest info? Is this a limitation of
> these large systems where stats updates are too expensive to do? Is it not
> fixable at all?

You would need to sum the load of every run queue in each LLC to get
an accurate picture. That is too expensive on the wake up path.
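
To make the cost concrete, here is a rough userspace sketch, not kernel
code. Every name in it (rq_load, llc_load, least_loaded_llc) and the
machine layout (16 CPUs, 4 CPUs per LLC) is hypothetical and chosen only
for illustration. It shows that comparing LLCs by load means reading
every run queue once per decision:

```c
#include <assert.h>

#define NR_CPUS      16   /* hypothetical machine size */
#define CPUS_PER_LLC 4    /* hypothetical: 4 CPUs share each LLC */
#define NR_LLCS      (NR_CPUS / CPUS_PER_LLC)

/* Hypothetical stand-in for each per-CPU run queue's load figure. */
static unsigned long rq_load[NR_CPUS];

/* Sum the load of every run queue in one LLC: O(CPUS_PER_LLC). */
static unsigned long llc_load(int llc)
{
	unsigned long sum = 0;
	int first = llc * CPUS_PER_LLC;

	assert(llc >= 0 && llc < NR_LLCS);

	for (int cpu = first; cpu < first + CPUS_PER_LLC; cpu++)
		sum += rq_load[cpu];

	return sum;
}

/*
 * Pick the least loaded LLC. This reads every run queue in the
 * system, so one call costs O(NR_CPUS).
 */
static int least_loaded_llc(void)
{
	int best = 0;
	unsigned long best_load = llc_load(0);

	for (int llc = 1; llc < NR_LLCS; llc++) {
		unsigned long load = llc_load(llc);

		if (load < best_load) {
			best = llc;
			best_load = load;
		}
	}

	return best;
}
```

Doing a walk like this on every wakeup would scale with the number of
CPUs, whereas the load balance path runs far less often and can better
afford to gather these sums.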

Tim

>
> > into their favorite LLC, causing LLC overload. Load balance then
> > has to undo this. This led to frequent task migrations that
> > hurt performance.
> >
> > It is better to consider LLC preference in the load balance path
> > so we can aggregate tasks while still keeping load imbalance under
> > control.
> >
> > Tim