Re: [RFC PATCH] sched/fair: dynamically scale the period of cache work

From: Tim Chen

Date: Wed Apr 15 2026 - 13:22:45 EST

On Mon, 2026-04-13 at 15:23 +0800, Jianyong Wu wrote:
> When a preferred LLC is selected and remains stable, task_cache_work does
> not need to run frequently. Because it scans all system CPUs for
> computation, high-frequency execution hurts performance. We thus reduce
> the scan rate in such cases.
>

Thanks for your patch proposal.

> On the other hand, if the preferred node becomes suboptimal, we should

You mean preferred LLC, right? "Preferred node" is a NUMA balancing term.

> increase the scan frequency to quickly find a better placement. The scan
> period is therefore dynamically adjusted.
>
> Signed-off-by: Jianyong Wu <wujianyong@xxxxxxxx>
>
> ---
> Hi ChenYu, Tim, Gengkun,
>
> I have another approach to address this issue, based on the observation
> that the scan work can be canceled if the preferred node is stable. This
> patch merely demonstrates the idea, but still needs more testing to
> verify its functionality. I'm sending it out early to gather feedback and
> opinions.
>
>

<...>

> @@ -1822,9 +1835,35 @@ static void task_cache_work(struct callback_head *work)
> * 3. 2X is chosen based on test results, as it delivers
> * the optimal performance gain so far.
> */
> - mm->sc_stat.cpu = m_a_cpu;
> + if (m_a_occ > (2 * curr_m_a_occ))
> + mm->sc_stat.cpu = m_a_cpu;
> +
> + if (!mm->sc_stat.last_reset_tick)
> + mm->sc_stat.last_reset_tick = now;
> +
> + /* Change scan_period when preferred LLC changed */
> + if (((mm->sc_stat.cpu != -1) && (m_a_cpu != -1)
> + && (llc_id(mm->sc_stat.cpu) != llc_id(m_a_cpu)))
> + || need_scan) {
> + if (!need_scan)
> + need_scan = 1;
> +
> + WRITE_ONCE(mm->sc_stat.scan_period,
> + max(mm->sc_stat.scan_period >> 1, llc_scan_period_min));
> + WRITE_ONCE(mm->sc_stat.last_reset_tick, now);
> + }
> + }
> +
> + if ((now - READ_ONCE(mm->sc_stat.last_reset_tick) > llc_scan_period_threshold)
> + && !need_scan) {
> + WRITE_ONCE(mm->sc_stat.scan_period, min(mm->sc_stat.scan_period << 1,
> + llc_scan_period_max));

I think llc_scan_period_max should be the same as llc_epoch_affinity_timeout.
We should not increase the scan period beyond that, since that is the time
scale over which we consider cache data relevant.
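
To make the capping concrete, here is a rough userspace sketch of the
halve-on-change / double-when-stable logic with the upper bound passed in as
a tunable. It is only a model of the idea, not kernel code: struct scan_state,
struct scan_tunables, clamp_ul() and update_scan_period() are hypothetical
names made up for illustration, and the READ_ONCE/WRITE_ONCE accesses from
the patch are left out. The intent is that period_max would be set to
llc_epoch_affinity_timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-mm scan state touched by the patch. */
struct scan_state {
	unsigned long scan_period;     /* current interval between scans */
	unsigned long last_reset_tick; /* tick of the last preferred-LLC change */
};

/* Hypothetical container for the tunables named in the patch. */
struct scan_tunables {
	unsigned long period_min; /* models llc_scan_period_min */
	unsigned long period_max; /* suggested: llc_epoch_affinity_timeout */
	unsigned long threshold;  /* models llc_scan_period_threshold */
};

/* Clamp v into [lo, hi]. */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	assert(lo <= hi);
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Halve the period when the preferred LLC changed, and double it once
 * the preferred LLC has been stable for longer than the threshold.
 * The result never leaves [period_min, period_max].
 */
static void update_scan_period(struct scan_state *s,
			       const struct scan_tunables *t,
			       unsigned long now, bool llc_changed)
{
	if (llc_changed) {
		s->scan_period = clamp_ul(s->scan_period >> 1,
					  t->period_min, t->period_max);
		s->last_reset_tick = now;
	} else if (now - s->last_reset_tick > t->threshold) {
		s->scan_period = clamp_ul(s->scan_period << 1,
					  t->period_min, t->period_max);
	}
}
```

With period_max tied to the epoch affinity timeout, a long stable phase
saturates at that bound instead of backing off further, so a scan still
happens within the window where the occupancy data is considered relevant.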

Tim