[GIT PULL] Scheduler enhancements for v6.14

From: Ingo Molnar
Date: Mon Jan 20 2025 - 06:08:09 EST



Linus,

Please pull the latest sched/core Git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2025-01-20

# HEAD: 7d9da040575b343085287686fa902a5b2d43c7ca psi: Fix race when task wakes up before psi_sched_switch() adjusts flags

Scheduler enhancements for v6.14:

- Fair scheduler (SCHED_FAIR) enhancements:

- Behavioral improvements:
- Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)

- Delayed-dequeue enhancements & fixes: (Vincent Guittot)

- Rename h_nr_running into h_nr_queued
- Add new cfs_rq.h_nr_runnable
- Use the new cfs_rq.h_nr_runnable
- Remove unused cfs_rq.h_nr_delayed
- Rename cfs_rq.idle_h_nr_running into h_nr_idle
- Remove unused cfs_rq.idle_nr_running
- Rename cfs_rq.nr_running into nr_queued
- Do not try to migrate delayed-dequeue tasks
- Fix variable declaration position
- Encapsulate setting the custom slice in a __setparam_fair() function

- Fixes:
- Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
- Fix CPU bandwidth limit bypass during CPU hotplug (Vishal Chourasia)

- Cleanups:
- Clean up migrate_degrades_locality() to improve
readability (Peter Zijlstra)
- Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
- Update comments after sched_tick() rename (Sebastian Andrzej Siewior)
- Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
(Valentin Schneider)

- Deadline scheduler (SCHED_DL) enhancements:

- Restore dl_server bandwidth on non-destructive root domain
changes (Juri Lelli)

- Correctly account for allocated bandwidth during
hotplug (Juri Lelli)

- Check bandwidth overflow earlier for hotplug (Juri Lelli)

- Clean up goto label in pick_earliest_pushable_dl_task()
(John Stultz)

- Consolidate timer cancellation (Wander Lairson Costa)

- Load-balancer enhancements:

- Improve performance by prioritizing migrating eligible
tasks in sched_balance_rq() (Hao Jia)

- Do not compute NUMA Balancing stats unnecessarily during
load-balancing (K Prateek Nayak)

- Do not compute overloaded status unnecessarily during
load-balancing (K Prateek Nayak)

- Generic scheduling code enhancements:

- Use READ_ONCE() in task_on_rq_queued(), to consistently use
the WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)

- Isolated CPUs support enhancements: (Waiman Long)

- Make "isolcpus=nohz" equivalent to "nohz_full"
- Consolidate housekeeping cpumasks that are always identical
- Remove HK_TYPE_SCHED
- Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE

- RSEQ enhancements:

- Validate read-only fields under DEBUG_RSEQ config
(Mathieu Desnoyers)

- PSI enhancements:

- Fix race when task wakes up before psi_sched_switch()
adjusts flags (Chengming Zhou)

- IRQ time accounting performance enhancements: (Yafang Shao)

- Define sched_clock_irqtime as static key
- Don't account irq time if sched_clock_irqtime is disabled

- Virtual machine scheduling enhancements:

- Don't try to catch up excess steal time (Suleiman Souhlal)

- Heterogeneous x86 CPU scheduling enhancements: (K Prateek Nayak)

- Convert "sysctl_sched_itmt_enabled" to boolean
- Use guard() for itmt_update_mutex
- Move the "sched_itmt_enabled" sysctl to debugfs
- Remove x86_smt_flags and use cpu_smt_flags directly
- Use x86_sched_itmt_flags for PKG domain unconditionally

- Debugging code & instrumentation enhancements:

- Change need_resched warnings to pr_err() (David Rientjes)
- Print domain name in /proc/schedstat (K Prateek Nayak)
- Fix the value reported for hot tasks pulled in /proc/schedstat (Peter Zijlstra)
- Report the different kinds of imbalances in /proc/schedstat (Swapnil Sapkal)
- Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
- Update Schedstat version to 17 (Swapnil Sapkal)
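
As a rough illustration of the READ_ONCE() item under "Generic scheduling
code enhancements" above, here is a minimal userspace sketch in C of pairing
a WRITE_ONCE() store with a READ_ONCE() load on the same field. It is not the
kernel code: the two macros are simplified stand-ins that only model the
volatile access (and rely on the GCC/Clang __typeof__ extension), and
struct task, task_set_on_rq() and the constant values are hypothetical names
chosen for the example.

```c
#include <stdbool.h>

/*
 * Simplified userspace stand-ins for the kernel's READ_ONCE() and
 * WRITE_ONCE(). Assumption: the real macros do additional checking;
 * only the volatile access is modelled here.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical values and struct, for illustration only. */
#define TASK_ON_RQ_QUEUED    1
#define TASK_ON_RQ_MIGRATING 2

struct task {
	int on_rq;
};

/* Writer side: update the state through a volatile store. */
static void task_set_on_rq(struct task *p, int state)
{
	WRITE_ONCE(p->on_rq, state);
}

/* Reader side: load the field through a volatile access, matching the writer. */
static bool task_on_rq_queued(struct task *p)
{
	return READ_ONCE(p->on_rq) == TASK_ON_RQ_QUEUED;
}
```

The point of the pairing is that a field written with WRITE_ONCE() is also
read with READ_ONCE(), so the compiler cannot tear, fuse or re-load the
access on either side.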

Thanks,

Ingo

------------------>
Andy Shevchenko (1):
sched/fair: Mark m*_vruntime() with __maybe_unused

Chengming Zhou (1):
psi: Fix race when task wakes up before psi_sched_switch() adjusts flags

David Rientjes (1):
sched/debug: Change need_resched warnings to pr_err

Hao Jia (1):
sched/core: Prioritize migrating eligible tasks in sched_balance_rq()

Harshit Agarwal (1):
sched: add READ_ONCE to task_on_rq_queued

John Stultz (1):
sched: deadline: Cleanup goto label in pick_earliest_pushable_dl_task

Juri Lelli (3):
sched/deadline: Restore dl_server bandwidth on non-destructive root domain changes
sched/deadline: Correctly account for allocated bandwidth during hotplug
sched/deadline: Check bandwidth overflow earlier for hotplug

K Prateek Nayak (8):
sched/stats: Print domain name in /proc/schedstat
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
sched/fair: Do not compute NUMA Balancing stats unnecessarily during lb
sched/fair: Do not compute overloaded status unnecessarily during lb

Mathieu Desnoyers (1):
rseq: Validate read-only fields under DEBUG_RSEQ config

Peter Zijlstra (3):
sched/fair: Untangle NEXT_BUDDY and pick_next_task()
sched/fair: Fix value reported by hot tasks pulled in /proc/schedstat
sched/fair: Cleanup in migrate_degrades_locality() to improve readability

Sebastian Andrzej Siewior (1):
sched/fair: Update comments after sched_tick() rename.

Suleiman Souhlal (1):
sched: Don't try to catch up excess steal time.

Swapnil Sapkal (3):
sched: Report the different kinds of imbalances in /proc/schedstat
sched: Move sched domain name out of CONFIG_SCHED_DEBUG
docs: Update Schedstat version to 17

Tianchen Ding (1):
sched: Fix race between yield_to() and try_to_wake_up()

Valentin Schneider (1):
sched/fair: Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()

Vincent Guittot (10):
sched/fair: Rename h_nr_running into h_nr_queued
sched/fair: Add new cfs_rq.h_nr_runnable
sched/fair: Use the new cfs_rq.h_nr_runnable
sched/fair: Removed unsued cfs_rq.h_nr_delayed
sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
sched/fair: Remove unused cfs_rq.idle_nr_running
sched/fair: Rename cfs_rq.nr_running into nr_queued
sched/fair: Do not try to migrate delayed dequeue task
sched/fair: Fix variable declaration position
sched/fair: Encapsulate set custom slice in a __setparam_fair() function

Vishal Chourasia (1):
sched/fair: Fix CPU bandwidth limit bypass during CPU hotplug

Waiman Long (4):
sched/core: Remove HK_TYPE_SCHED
sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
sched/isolation: Consolidate housekeeping cpumasks that are always identical
sched: Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE

Wander Lairson Costa (1):
sched/deadline: Consolidate Timer Cancellation

Yafang Shao (3):
sched: Define sched_clock_irqtime as static key
sched: Don't account irq time if sched_clock_irqtime is disabled
sched, psi: Don't account irq time if sched_clock_irqtime is disabled


 Documentation/admin-guide/kernel-parameters.txt |   4 +-
 Documentation/scheduler/sched-stats.rst         | 126 ++++---
 arch/x86/include/asm/topology.h                 |   4 +-
 arch/x86/kernel/itmt.c                          |  81 ++---
 arch/x86/kernel/smpboot.c                       |  19 +-
 include/linux/sched.h                           |  10 +
 include/linux/sched/isolation.h                 |  21 +-
 include/linux/sched/topology.h                  |  13 +-
 kernel/rseq.c                                   |  98 ++++++
 kernel/sched/core.c                             |  94 +++--
 kernel/sched/cputime.c                          |  16 +-
 kernel/sched/deadline.c                         | 119 +++++--
 kernel/sched/debug.c                            |  25 +-
 kernel/sched/fair.c                             | 444 ++++++++++++++----------
 kernel/sched/features.h                         |   9 +
 kernel/sched/isolation.c                        |  22 +-
 kernel/sched/pelt.c                             |   4 +-
 kernel/sched/psi.c                              |   7 +-
 kernel/sched/sched.h                            |  37 +-
 kernel/sched/stats.c                            |  11 +-
 kernel/sched/stats.h                            |   4 +
 kernel/sched/syscalls.c                         |  18 +-
 kernel/sched/topology.c                         |  12 +-
23 files changed, 720 insertions(+), 478 deletions(-)