Re: [REGRESSION][v6.17-rc1]sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails

From: Chris Mason

Date: Mon Oct 06 2025 - 17:24:37 EST


On 10/6/25 4:23 PM, Joseph Salisbury wrote:
> Hi Chris,
>
> During testing, we are seeing a ~6% performance regression with the
> upstream stable v6.12.43 kernel (and Oracle UEK
> 6.12.0-104.43.4.el9uek.x86_64 kernel) when running the Phoronix
> pts/apache benchmark with 100 concurrent requests [0].  The regression
> is seen with the following hardware:
>
> PROCESSOR: Intel Xeon Platinum 8167M
>   Core Count: 8
>   Thread Count: 16
>   Extensions: SSE 4.2 + AVX512CD + AVX2 + AVX + RDRAND + FSGSBASE
>   Cache Size: 16 MB
>   Microcode: 0x1
>   Core Family: Cascade Lake
>
> After performing a bisect, we found that the performance regression was
> introduced by the following commit:
>
> Stable v6.12.43: fc4289233e4b ("sched/fair: Bump sd->max_newidle_lb_cost
> when newidle balance fails")
> Mainline v6.17-rc1: 155213a2aed4 ("sched/fair: Bump
> sd->max_newidle_lb_cost when newidle balance fails")
>
> Reverting this commit eliminates the performance regression.
>
> I was hoping to get your feedback, since you are the patch author.  Do
> you think gathering any additional data would help diagnose this issue?

Hi everyone,

Peter, we've had several regression reports against this change, so it
sounds like we need to make the bump less aggressive, or maybe make the
decay of the cost value more aggressive?

Joe (and everyone else who has hit this), can I talk you into trying the
drgn script from
https://lore.kernel.org/lkml/2fbf24bc-e895-40de-9ff6-5c18b74b4300@xxxxxxxx/

I'm curious whether the cost decays at all or just gets stuck up high.
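To make the bump-versus-decay question concrete, here is a toy Python model of a single domain's tracked cost. It is a sketch under stated assumptions, not kernel code: it assumes a failed newidle pass sets the cost to 3/2 of the current maximum, that the cost decays by 253/256 at most once per second (in the spirit of the ~1%-per-second decay in update_newidle_cost()) with a bump restarting that timer, and that newidle balance is skipped while the rq's avg_idle is below the tracked cost. The helpers simulate() and seconds_to_undo_one_bump(), and all the numbers, are hypothetical.

```python
import math

# ASSUMED constants for this toy model (not quoted from the kernel tree):
# a failed newidle pass raises the tracked cost to 3/2 of its current
# maximum, and the periodic decay multiplies it by 253/256.
BUMP_NUM, BUMP_DEN = 3, 2
DECAY_NUM, DECAY_DEN = 253, 256


def simulate(seconds, attempts_per_second, avg_idle_ns, start_cost_ns, pulls_task):
    """Hypothetical model of one domain's max_newidle_lb_cost.

    Returns (final_cost_ns, newidle_passes_run). A successful pass is
    assumed to leave the tracked maximum unchanged.
    """
    cost = start_cost_ns
    last_update = 0  # second of the most recent bump or decay
    passes = 0
    for sec in range(seconds):
        for _ in range(attempts_per_second):
            # Newidle balance is skipped while it looks more expensive
            # than the expected idle time.
            if avg_idle_ns < cost:
                continue
            passes += 1
            if not pulls_task:
                # Failed pass: bump the maximum and restart the decay timer.
                cost = cost * BUMP_NUM // BUMP_DEN
                last_update = sec
        # Decay only if at least one full second has passed since the
        # last bump or decay.
        if sec - last_update >= 1:
            cost = cost * DECAY_NUM // DECAY_DEN
            last_update = sec
    return cost, passes


def seconds_to_undo_one_bump():
    """Seconds of once-per-second decay needed to cancel a single bump."""
    return math.log(BUMP_NUM / BUMP_DEN) / math.log(DECAY_DEN / DECAY_NUM)


if __name__ == "__main__":
    print(simulate(10, 100, 10_000, 1_000, pulls_task=False))  # (10239, 6)
    print(simulate(10, 100, 10_000, 1_000, pulls_task=True)[1])  # 1000
    print(round(seconds_to_undo_one_bump(), 1))  # 34.4
```

With these made-up numbers, the always-failing case runs six newidle passes in the first second and none for the rest of the ten seconds, and a single bump needs roughly 34 seconds of decay to undo. That is only what the toy model does; the drgn output is what would show whether real systems behave this way.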

-chris