Re: [External] Re: Fwd: WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160

From: Hao Jia
Date: Thu Sep 07 2023 - 12:30:07 EST

On 2023/9/5 Peter Zijlstra wrote:
> On Thu, Aug 31, 2023 at 04:48:29PM +0800, Hao Jia wrote:
>>
>> If I understand correctly, rq->clock_update_flags may be set to
>> RQCF_ACT_SKIP after __schedule() takes the rq lock, and sometimes the rq
>> lock may be released briefly in __schedule(), such as in newidle_balance().
>> During that window, another CPU can take this rq lock, and its call to
>> rq_clock_start_loop_update() may then trigger this warning.
>>
>> This warning check might be wrong. We need to add assert_clock_updated() to
>> check that the rq clock has been updated before calling
>> rq_clock_start_loop_update().
>>
>> Maybe something like this?
>
> Urgh, aside from it being white space mangled, I think this is entirely
> going in the wrong direction.
>
> Leaking ACT_SKIP is dodgy as heck.. it's entirely too late to think
> clearly though, I'll have to try again tomorrow.
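To make the window concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: `struct rq_model`, `schedule_promotes_skip()`, `start_loop_update_warns()`, `stop_loop_update()` and `unthrottle_in_window()` are hypothetical stand-ins. The RQCF_* values are assumed to match kernel/sched/sched.h. Locking is not modeled; the functions simply run in the order the two CPUs would.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the RQCF_* bits in kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for struct rq: only the field that matters here. */
struct rq_model {
	unsigned int clock_update_flags;
};

/* CPU A in __schedule(): promote REQ_SKIP to ACT_SKIP. The model then
 * "drops the lock" (as newidle_balance() does) with ACT_SKIP still set. */
static void schedule_promotes_skip(struct rq_model *rq)
{
	rq->clock_update_flags <<= 1;
}

/* Models rq_clock_start_loop_update(): returns true where
 * SCHED_WARN_ON(flags & RQCF_ACT_SKIP) would fire, then sets ACT_SKIP. */
static bool start_loop_update_warns(struct rq_model *rq)
{
	bool warn = (rq->clock_update_flags & RQCF_ACT_SKIP) != 0;

	rq->clock_update_flags |= RQCF_ACT_SKIP;
	return warn;
}

/* Models rq_clock_stop_loop_update(): clears ACT_SKIP. */
static void stop_loop_update(struct rq_model *rq)
{
	rq->clock_update_flags &= ~RQCF_ACT_SKIP;
}

/* CPU B takes the rq lock inside the window and runs the unthrottle loop.
 * Returns whether the warning would have fired. */
static bool unthrottle_in_window(struct rq_model *rq)
{
	bool warn = start_loop_update_warns(rq);

	/* ... each cfs_rq would be unthrottled here without a clock update ... */
	stop_loop_update(rq);
	return warn;
}
```

Running CPU A's step and then CPU B's step on a runqueue that started with RQCF_REQ_SKIP reproduces the condition. The unthrottle path sees the RQCF_ACT_SKIP left over from __schedule() and reports the warning. A runqueue whose flags hold only RQCF_UPDATED passes.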

Hi Peter,

Do you think this fix method is correct? Or should we go back to the beginning and move update_rq_clock() out of unthrottle_cfs_rq()?
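The following sketch compares the two checks, based on the description of the proposal above. The current SCHED_WARN_ON() in rq_clock_start_loop_update() fires whenever RQCF_ACT_SKIP is already set. assert_clock_updated() complains only when the clock has neither been updated nor is being skipped, which it expresses as `flags < RQCF_ACT_SKIP`. The helper names are hypothetical and the flag values are assumed. The proposed check is an interpretation of the prose, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the RQCF_* bits in kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Current check: SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP). */
static bool current_check_warns(unsigned int flags)
{
	return (flags & RQCF_ACT_SKIP) != 0;
}

/* Models assert_clock_updated(): the only acceptable reason for not seeing
 * a clock update since the lock was pinned is that updates are being skipped. */
static bool proposed_check_warns(unsigned int flags)
{
	return flags < RQCF_ACT_SKIP;
}
```

With a leaked RQCF_ACT_SKIP, only the current check fires. With RQCF_UPDATED, neither fires. With no update and no skip (0 or RQCF_REQ_SKIP alone), only the proposed check fires. This shows that the proposal tolerates the leak rather than removing it, which is the concern raised about leaking ACT_SKIP.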

Thanks,
Hao