Re: [PATCH] sched/pelt: Use rq_clock_task() for hw_pressure

From: Hongyan Xia
Date: Thu Jul 25 2024 - 10:19:05 EST


On 25/07/2024 15:00, Chen Yu wrote:
Hi Hongyan,

On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
On 25/07/2024 12:42, Chen Yu wrote:
commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
removed the decay_shift for hw_pressure. While looking at a related
bug report, it is found that this commit uses the sched_clock_task()
in sched_tick() while replaces the sched_clock_task() with rq_clock_pelt()
in __update_blocked_others(). This could bring inconsistence. One possible
scenario I can think of is in ___update_load_sum():

u64 delta = now - sa->last_update_time

'now' could be calculated by rq_clock_pelt() from
__update_blocked_others(), and last_update_time was calculated by
rq_clock_task() previously from sched_tick(). Usually the former chases
after the latter, it cause a very large 'delta' and brings unexpected
behavior. Although this should not impact x86 platform in the bug report,
it should be fixed for other platforms.

I agree with this patch but I'm a bit confused here. May I know what you
mean by 'should not impact x86 platform in the bug report'? But it closes a
bug report on qemu x86_64, so it does have an impact?


It should not have any impact on x86_64. I added the bug link here because I checked
the code while looking at that report. But that report might be false positve,
or at least not caused by this logic introduced by this commit, because
CONFIG_SCHED_HW_PRESSURE was not even set in the kernel config[1]. Maybe I should
remove the 'reported-by' and 'closes' tags?

[1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@xxxxxxxxx/config-6.9.0-rc1-00051-g97450eb90965


Yeah, it might be a good idea to remove the link to avoid confusion, like you said HW pressure is not compiled in.

Even if there is pressure support, before your patch the big 'delta' should only result in a HW pressure value that decays more than it should, and should not be able to block tasks like in that bug report, so it's very likely that it's unrelated.