Re: [PATCH v3 0/2] governor: Fix races and stale baseline on prev_cpu_nice

From: Rafael J. Wysocki

Date: Tue May 26 2026 - 13:09:54 EST

On Sun, May 24, 2026 at 7:57 AM Zhongqiu Han
<zhongqiu.han@xxxxxxxxxxxxxxxx> wrote:
>
> On 5/22/2026 11:14 PM, Rafael J. Wysocki wrote:
> > On Sun, Apr 19, 2026 at 3:27 PM Zhongqiu Han
> > <zhongqiu.han@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> Patch 1 fixes two baseline-consistency issues in the DBS governor
> >> framework.
> >>
> >> gov_update_cpu_data() writes prev_cpu_idle and prev_cpu_nice while
> >> holding only attr_set->update_lock, whereas dbs_update() reads and
> >> writes the same fields while holding only policy_dbs->update_mutex.
> >> Because these are independent locks, the two paths are not mutually
> >> exclusive. The fix acquires policy_dbs->update_mutex inside
> >> gov_update_cpu_data() for each policy.
> >>
> >> Additionally, cpufreq_dbs_governor_start() reads io_is_busy before
> >> acquiring policy_dbs->update_mutex. A concurrent io_is_busy_store()
> >> can update io_is_busy and call gov_update_cpu_data(), which writes
> >> prev_cpu_idle with the new value under the mutex.
> >> cpufreq_dbs_governor_start() then overwrites prev_cpu_idle with the
> >> stale value. The fix reads io_is_busy inside the mutex.
> >>
> >> Patch 2 fixes a stale-baseline race on prev_cpu_nice when
> >> ignore_nice_load is enabled via sysfs. There is a race window
> >> between setting ignore_nice_load and gov_update_cpu_data() acquiring
> >> update_mutex: a concurrent dbs_update() can see the new ignore_nice_load
> >> value but read a stale prev_cpu_nice, producing an inflated idle_time
> >> for that sample.
> >>
> >> The fix unconditionally advances prev_cpu_nice on every dbs_update()
> >> call. With prev_cpu_nice always reflecting the most recent sample,
> >> enabling ignore_nice_load can never produce a stale-baseline spike.
> >> To keep prev_cpu_nice handling consistent, gov_update_cpu_data()
> >> resets prev_cpu_nice unconditionally alongside prev_cpu_idle, and
> >> cpufreq_dbs_governor_start() initializes prev_cpu_nice unconditionally.
> >>
> >> v3:
> >> - Update linux-next base
> >> - Fix two additional issues found during sashiko.dev review:
> >> - Patch 1: move io_busy read inside policy_dbs->update_mutex in
> >> cpufreq_dbs_governor_start() to close a TOCTOU window against
> >> concurrent io_is_busy_store().
> >> - Patch 2: restore unconditional prev_cpu_nice reset in
> >> gov_update_cpu_data() to keep both baselines consistent when
> >> io_is_busy changes with ignore_nice_load enabled; update comment.
> >> - Patch 2: keep gov_update_cpu_data() in ignore_nice_load_store() to
> >> prevent a stale-baseline spike in the governor-start window (when
> >> ignore_nice_load is set before the first dbs_update() call); only
> >> cpufreq_governor.c is modified. Update commit message and add one
> >> more fixes tag '326c86deaed54a ("[CPUFREQ] Remove unneeded locks").'
> >> - Update the commit messages.
> >> - Link to v2: https://lore.kernel.org/all/20260409111407.9775-1-zhongqiu.han@xxxxxxxxxxxxxxxx/
> >> - Sashiko v2 comments: https://sashiko.dev/#/patchset/20260409111407.9775-1-zhongqiu.han%40oss.qualcomm.com
> >
> > So sashiko.dev still has some comments on the first patch in v3:
> >
> > https://sashiko.dev/#/patchset/20260419132655.3800673-1-zhongqiu.han%40oss.qualcomm.com
> >
> > Can you please have a look at that and let me know what you think?
> >
>
>
> Hi Rafael,
> Thanks for pointing this out. I've looked at both comments:
>
> Comment 1 (io_is_busy transition window):
>
> The race is real but pre-existing. After Patch 1 the only remaining
> window is the gap between the io_is_busy write and gov_update_cpu_data()
> acquiring the lock:
>
> CPU0 (io_is_busy_store): CPU1 (dbs_update):
> io_is_busy = 1 mutex_lock()
> /* waiting for mutex */ io_busy = 1 /* sees new value */
> cur_idle = get_cpu_idle_time(io_busy=1)
> /* prev_cpu_idle still set with io_busy=0 */
> /* idle_time may be clamped to 0 */
> /* load inflated for this sample */
> mutex_unlock()
> mutex_lock()
> prev_cpu_idle = ../* reset */
> mutex_unlock()
>
> The mismatch between the two metrics causes a one-sample anomaly in
> either direction: 0→1 tends to inflate the load estimate (frequency
> pushed upward), while 1→0 tends to deflate it. The AI's "drives to
> minimum" conclusion is wrong for the 0→1 case.
>
> This window is inherent to the design. One possible fix is to add a per
> policy io_is_busy copy inside policy_dbs_info and update it under
> update_mutex in gov_update_cpu_data(), so dbs_update() always reads a
> value consistent with the current prev_cpu_idle baseline. The impact
> should be a single-sample frequency spike that self-corrects.
>
>
> Comment 2 (sampling_rate_store vs. gov_set_update_util):
>
> Also pre-existing. The race:
>
> cpufreq_dbs_governor_start(): sampling_rate_store():
> mutex_unlock() mutex_lock()
> gov->start() sample_delay_ns = 0
> gov_set_update_util() mutex_unlock()
> sample_delay_ns = rate <--->
>
> If needed, e possible approach would be to move both gov->start() and
> gov_set_update_util() inside the mutex would work.

Yeah.

So I've applied both patches in the series as 7.2 material.

Please feel free to send a follow-up patch making the change mentioned above.

Thanks!