Re: [Patch v4 17/22] sched/cache: Avoid cache-aware scheduling for memory-heavy processes

From: Peter Zijlstra

Date: Fri Apr 10 2026 - 05:21:39 EST

On Fri, Apr 10, 2026 at 04:59:19PM +0800, Chen, Yu C wrote:

> > This is pretty terrible. If you want LLC size, add it to the topology
> > information (and ideally integrate with RDT) and make proportional to
> > cpumask size, such that if someone cuts the domain in pieces, they get
> > proportional size etc.
> >
>
> If I understand correctly, do you mean the following:
>
> 1. Introduce a generic arch_get_llc_size() as a wrapper
> around the existing get_cpu_cacheinfo_level(), which
> returns the llc_size. Both the scheduler and RDT can
> use arch_get_llc_size().

The tie-in with RDT was more to affect the return value of
arch_get_llc_size(). E.g. when RDT takes away some ways for specific
tasks, the total effective size gets reduced for generic use.
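
A rough userspace sketch of that reduction, assuming equally sized
ways and an exclusive reservation expressed as a capacity bitmask.
llc_effective_size() and its parameters are hypothetical names for
illustration, not existing kernel or resctrl API:

```c
#include <stdint.h>

/* Count set bits in a capacity bitmask (one bit per cache way). */
static unsigned int cbm_weight(uint32_t cbm)
{
	unsigned int n = 0;

	while (cbm) {
		cbm &= cbm - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: LLC bytes left for generic use once the ways in
 * @reserved_cbm have been handed exclusively to specific tasks.
 * Assumes every way covers llc_size / total_ways bytes.
 */
static uint64_t llc_effective_size(uint64_t llc_size, unsigned int total_ways,
				   uint32_t reserved_cbm)
{
	unsigned int reserved = cbm_weight(reserved_cbm);

	if (total_ways == 0 || reserved >= total_ways)
		return 0;

	return llc_size * (total_ways - reserved) / total_ways;
}
```

With a 32 MiB LLC, 16 ways, and 4 ways reserved (mask 0xf), this
leaves 24 MiB for everything else.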

> 2. The sched domain stores llc_size in
> sd->res_size = llc_size * sd_span / arch_llc_span,
> and the cache_aware_scheduler uses sd->res_size for
> the comparison.

Just so.
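
As a minimal sketch of that proportional split (userspace C; the
function name sd_res_size() and the plain integer span counts are
assumptions for illustration, the real thing would work on cpumask
weights):

```c
#include <stdint.h>

/*
 * Hypothetical helper: share of the LLC attributed to a sched domain
 * that spans @sd_span of the @arch_llc_span CPUs sharing the cache.
 */
static uint64_t sd_res_size(uint64_t llc_size, unsigned int sd_span,
			    unsigned int arch_llc_span)
{
	if (arch_llc_span == 0)
		return 0;

	/* A domain cannot own more than the whole LLC. */
	if (sd_span > arch_llc_span)
		sd_span = arch_llc_span;

	/* Multiply before dividing so small spans do not round to zero. */
	return llc_size * sd_span / arch_llc_span;
}
```

So a domain cut down to 8 of the 16 CPUs behind a 32 MiB LLC would be
credited with 16 MiB.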

> We will adjust the code accordingly.

Thanks.

> > Also, if we have NUMA_BALANCING on, that can provide a much better
> > estimate for the actual size.
> >
> > Just using RSS seems like a very bad metric here.
> >
>
> Got it. Currently we lack accurate memory footprint metrics in
> the kernel. If we support user-provided hints in the future, we
> can leverage RDT llc_occupancy metrics. (Is it legal to use
> RDT's metrics directly in the kernel? It would switch from an
> MSR read to an MMIO read, and thus have less overhead.) For now,
> let me try to leverage NUMA fault-in stats. If NUMA balancing
> is off, I need to think more about how to avoid over-aggregation
> for memory-intensive workloads.

There are also things like this:

https://lkml.kernel.org/r/20260323095104.238982-1-bharata@xxxxxxx

But yeah, in an ideal world we could be looking at LLC cache hit/miss
information... streaming workloads would have very low hit rate.

But yes, prctl() controls could help; they would let us create tools
to disable things per program, etc.