Re: [PATCH v2 07/23] sched/cache: Introduce per runqueue task LLC preference counter

From: Tim Chen
Date: Tue Dec 16 2025 - 17:46:12 EST


On Thu, 2025-12-11 at 11:31 +0100, Peter Zijlstra wrote:
> On Wed, Dec 10, 2025 at 10:49:14AM -0800, Tim Chen wrote:
> > On Wed, 2025-12-10 at 13:51 +0100, Peter Zijlstra wrote:
> > > On Wed, Dec 03, 2025 at 03:07:26PM -0800, Tim Chen wrote:
>
> > > Would it perhaps be easier to stick this thing in rq->sd rather than in
> > > rq->nr_pref_llc. That way it automagically switches with the 'new'
> > > domain. And then, with a bit of care, a single load-balance pass should
> > > see a consistent view (there should not be reloads of rq->sd -- which
> > > will be a bit of an audit I suppose).
> >
> > We need nr_pref_llc information at the runqueue level because the load balancer 
> > must identify which specific rq has the largest number of tasks that 
> > prefer a given destination LLC. If we move the counter to the LLC’s sd 
> > level, we would only know the aggregate number of tasks in the entire LLC 
> > that prefer that destination—not which rq they reside on. Without per-rq 
> > counts, we would not be able to select the correct source rq to pull tasks from.
> >
> > The only way this could work at the LLC-sd level is if all CPUs within 
> > the LLC shared a single runqueue, which is not the case today.
> >
> > Let me know if I understand your comments correctly.
>
> So the sched_domain instances are per-cpu (hence the need for
> sched_domain_shared). So irrespective of what level you stick them at (I
> was thinking the bottom most, but it really doesn't matter) they will be
> per CPU.

One side effect of that is that whenever rebuild_sched_domains() is
triggered, every rq->sd gets reallocated. So we'll lose the old LLC
preferences until we have had time to re-sample process occupancy. I think
that is okay as long as rebuild_sched_domains() is not called too
frequently. Is this assumption correct?
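
To make the concern concrete, here is a rough user-space model, not kernel
code. struct model_sd, struct model_rq, MODEL_NR_LLC and both helpers are
made-up names for illustration only, and the rebuild is simplified to
"every rq gets a fresh zeroed domain". It only shows that source-rq
selection has nothing to go on right after a rebuild, until the counters
are re-populated.

```c
#include <stdlib.h>

#define MODEL_NR_LLC 2	/* hypothetical LLC count for this model */

/* Stand-in for a per-CPU sched_domain carrying the preference counters. */
struct model_sd {
	unsigned int nr_pref_llc[MODEL_NR_LLC];
};

/* Stand-in for a runqueue; only the domain pointer matters here. */
struct model_rq {
	struct model_sd *sd;
};

/*
 * Models a domain rebuild: every rq gets a freshly zeroed domain, so the
 * counters accumulated in the old one are dropped.
 * Returns 0 on success, -1 on allocation failure.
 */
static int model_rebuild_domains(struct model_rq *rqs, int nr_rq)
{
	for (int i = 0; i < nr_rq; i++) {
		struct model_sd *new_sd = calloc(1, sizeof(*new_sd));

		if (!new_sd)
			return -1;
		free(rqs[i].sd);	/* free(NULL) is a no-op */
		rqs[i].sd = new_sd;
	}
	return 0;
}

/*
 * Picks the rq with the most tasks preferring dst_llc.
 * Returns its index, or -1 if no rq has any such task recorded.
 */
static int model_pick_src_rq(const struct model_rq *rqs, int nr_rq, int dst_llc)
{
	unsigned int best = 0;
	int src = -1;

	for (int i = 0; i < nr_rq; i++) {
		unsigned int nr = rqs[i].sd ? rqs[i].sd->nr_pref_llc[dst_llc] : 0;

		if (nr > best) {
			best = nr;
			src = i;
		}
	}
	return src;
}
```

In this model, model_pick_src_rq() returns -1 immediately after
model_rebuild_domains(), i.e. the balancer would have no preferred source
rq until sampling catches up.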

Tim