[GIT PULL] Scheduler changes for v6.12

From: Ingo Molnar
Date: Thu Sep 19 2024 - 06:00:41 EST


Linus,

Please pull the latest sched/core Git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2024-09-19

# HEAD: bc9057da1a220ff2cb6c8885fd5352558aceba2c sched/cpufreq: Use NSEC_PER_MSEC for deadline task

Merge conflict notes:

#
# The freshly merged timer tree changes cause direct conflicts in two files
# and semantic conflicts in three files due to API changes:
#
# Conflicts:
# fs/select.c
# kernel/time/hrtimer.c
#
# Semantic conflicts:
# fs/proc/base.c
# kernel/sched/syscalls.c
# kernel/sys.c
#
# To double check, see my tentative merge resolution in adf04642e625:
#
# git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/merge
#

In the v6.12 scheduler development cycle we had 63 commits from 18 contributors:

- Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot de Oliveira's
last major contribution to the kernel:

"SCHED_DEADLINE servers can help fixing starvation issues of low priority
tasks (e.g., SCHED_OTHER) when higher priority tasks monopolize CPU
cycles. Today we have RT Throttling; DEADLINE servers should be able to
replace and improve that."

(Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes,
Youssef Esmat, Huang Shijie)
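
As a rough illustration of the reservation idea in the quote above: a deadline server can be described by a runtime budget and a period, and the ratio of the two is the share of a CPU it can guarantee to the tasks it serves. The sketch below does only that arithmetic. The struct name, the function name and the 50 ms / 1 s figures used in the note afterwards are illustrative assumptions, not values taken from this pull request.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical description of a server reservation, in nanoseconds. */
struct server_params {
	uint64_t runtime_ns;	/* budget the server may consume per period */
	uint64_t period_ns;	/* interval after which the budget is refilled */
};

/*
 * Reserved CPU share in parts per million.
 * Returns 0 for an invalid reservation (zero period, or runtime > period).
 * The multiplication does not overflow for runtimes below roughly five hours.
 */
static uint64_t server_bandwidth_ppm(const struct server_params *sp)
{
	if (sp->period_ns == 0 || sp->runtime_ns > sp->period_ns)
		return 0;
	return sp->runtime_ns * 1000000ULL / sp->period_ns;
}
```

With an assumed budget of 50 * NSEC_PER_MSEC per 1000 * NSEC_PER_MSEC, the helper reports 50000 ppm, that is, 5% of one CPU kept available for lower priority tasks.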

- Preparatory changes for sched_ext integration:

  - Use set_next_task(.first) where required
  - Fix up set_next_task() implementations
  - Clean up DL server vs. core sched
  - Split up put_prev_task_balance()
  - Rework pick_next_task()
  - Combine the last put_prev_task() and the first set_next_task()
  - Rework dl_server
  - Add put_prev_task(.next)

(Peter Zijlstra, with a fix by Tejun Heo)

- Complete the EEVDF transition and refine EEVDF scheduling:

  - Implement delayed dequeue
  - Allow shorter slices to wakeup-preempt
  - Use sched_attr::sched_runtime to set request/slice suggestion
  - Document the new feature flags
  - Remove unused and duplicate-functionality fields
  - Simplify & unify pick_next_task_fair()
  - Misc debuggability enhancements

(Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann,
Valentin Schneider and Chuyi Zhou)
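
To make the sched_attr::sched_runtime item above more concrete, here is a minimal userspace sketch of how a fair task might pass a slice suggestion through the sched_setattr() system call. The struct is a local mirror of the uapi layout under a name of our own (slice_attr), and make_slice_attr() and request_slice() are hypothetical helpers, not kernel or libc APIs. Whether and how the kernel clamps the requested value is not covered here.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Local mirror of the uapi struct sched_attr layout. The name is ours,
 * so it cannot clash with a libc or kernel-header definition.
 */
struct slice_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* fair tasks: suggested slice, in ns */
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

_Static_assert(sizeof(struct slice_attr) == 56, "unexpected attr layout");

#define POLICY_NORMAL 0u	/* numeric value of SCHED_OTHER */

/* Build an attribute block that asks for a slice of slice_ns nanoseconds. */
static struct slice_attr make_slice_attr(uint64_t slice_ns)
{
	struct slice_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = POLICY_NORMAL;
	attr.sched_runtime = slice_ns;
	/* sched_nice stays 0, so this sketch also requests nice level 0. */
	return attr;
}

/* Apply the request to the calling thread; returns 0, or -1 with errno set. */
static int request_slice(uint64_t slice_ns)
{
	struct slice_attr attr = make_slice_attr(slice_ns);

	return (int)syscall(SYS_sched_setattr, 0, &attr, 0);
}
```

A caller would use something like request_slice(3000000) to suggest a 3 ms slice. On kernels without this change the field is presumably ignored for fair tasks, so the return value alone does not show that the suggestion took effect.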

- Initialize the vruntime of a new task when it is first enqueued,
resulting in a significant decrease in latency of newly woken tasks.
(Zhang Qiao)

- Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
(K Prateek Nayak, Peter Zijlstra)

- Clean up and clarify the usage of rt_task()
(Qais Yousef)

- Preempt SCHED_IDLE entities in strict cgroup hierarchies
(Tianchen Ding)

- Clarify the documentation of time units for deadline scheduler
parameters. (Christian Loehle)
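
Since the deadline parameters are expressed in nanoseconds, a small conversion helper can avoid unit mistakes when values are first written down in milliseconds. The sketch below is a hedged illustration: dl_params_ns, dl_params_from_ms() and dl_params_ordered() are hypothetical names, and the ordering check is a simplified form of the documented runtime <= deadline <= period rule (the kernel applies further checks, and this sketch does not model a period of 0).

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical container for SCHED_DEADLINE parameters, all in nanoseconds. */
struct dl_params_ns {
	uint64_t runtime;
	uint64_t deadline;
	uint64_t period;
};

/* Convert millisecond inputs to the nanosecond units the uapi expects. */
static struct dl_params_ns dl_params_from_ms(uint64_t runtime_ms,
					     uint64_t deadline_ms,
					     uint64_t period_ms)
{
	struct dl_params_ns p = {
		.runtime  = runtime_ms  * NSEC_PER_MSEC,
		.deadline = deadline_ms * NSEC_PER_MSEC,
		.period   = period_ms   * NSEC_PER_MSEC,
	};
	return p;
}

/* Simplified sanity check: 0 < runtime <= deadline <= period. */
static bool dl_params_ordered(const struct dl_params_ns *p)
{
	return p->runtime > 0 &&
	       p->runtime <= p->deadline &&
	       p->deadline <= p->period;
}
```

For example, dl_params_from_ms(1, 10, 10) yields a runtime of 1000000 ns with a deadline and period of 10000000 ns each, which passes the check; these figures are illustrative only.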

- Remove the HZ_BW chicken-bit feature flag introduced a year ago;
the original change seems to be working fine.
(Phil Auld)

- Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
Peilin He, Qais Yousef and Vincent Guittot)

Thanks,

Ingo

------------------>
Chen Yu (2):
sched/pelt: Use rq_clock_task() for hw_pressure
kthread: Fix task state in kthread worker if being frozen

Christian Loehle (4):
sched/deadline: Convert schedtool example to chrt
sched/deadline: Clarify nanoseconds in uapi
cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
sched/cpufreq: Use NSEC_PER_MSEC for deadline task

Chuyi Zhou (1):
sched/fair: Remove cfs_rq::nr_spread_over and cfs_rq::exec_clock

Dan Carpenter (1):
sched/debug: Fix fair_server_period_max value

Daniel Bristot de Oliveira (3):
sched/deadline: Comment sched_dl_entity::dl_server variable
sched/deadline: Deferrable dl server
sched/fair: Fair server interface

Dietmar Eggemann (1):
kernel/sched: Fix util_est accounting for DELAY_DEQUEUE

Huang Shijie (2):
sched/deadline: Fix schedstats vs deadline servers
sched/debug: Fix the runnable tasks output

Joel Fernandes (Google) (3):
sched/core: Add clearing of ->dl_server in put_prev_task_balance()
sched/core: Fix priority checking for DL server picks
sched/core: Fix picking of tasks for core scheduling with DL server

Peilin He (1):
sched/core: Add WARN_ON_ONCE() to check overflow for migrate_disable()

Peter Zijlstra (36):
sched/fair: Add trivial fair server
sched/rt: Remove default bandwidth control
sched/fair: Cleanup fair_server
sched/eevdf: Add feature comments
sched/eevdf: Remove min_vruntime_copy
sched/fair: Cleanup pick_task_fair() vs throttle
sched/fair: Cleanup pick_task_fair()'s curr
sched/fair: Unify pick_{,next_}_task_fair()
sched: Allow sched_class::dequeue_task() to fail
sched/fair: Re-organize dequeue_task_fair()
sched: Split DEQUEUE_SLEEP from deactivate_task()
sched: Prepare generic code for delayed dequeue
sched/uclamg: Handle delayed dequeue
sched/fair: Assert {set_next,put_prev}_entity() are properly balanced
sched/fair: Prepare exit/cleanup paths for delayed_dequeue
sched/fair: Prepare pick_next_task() for delayed dequeue
sched/fair: Implement ENQUEUE_DELAYED
sched,freezer: Mark TASK_FROZEN special
sched: Teach dequeue_task() about special task states
sched/fair: Implement delayed dequeue
sched/fair: Implement DELAY_ZERO
sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE
sched/fair: Avoid re-setting virtual deadline on 'migrations'
sched/eevdf: Allow shorter slices to wakeup-preempt
sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
sched/eevdf: Propagate min_slice up the cgroup hierarchy
sched: Use set_next_task(.first) where required
sched: Fixup set_next_task() implementations
sched: Clean up DL server vs core sched
sched: Split up put_prev_task_balance()
sched: Rework pick_next_task()
sched: Combine the last put_prev_task() and the first set_next_task()
sched: Rework dl_server
sched: Add put_prev_task(.next)
sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
sched: Fix sched_delayed vs sched_core

Phil Auld (1):
sched: remove HZ_BW feature hedge

Qais Yousef (3):
sched/rt: Clean up usage of rt_task()
sched/rt, dl: Convert functions to return bool
sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()

Tejun Heo (1):
sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running

Tianchen Ding (1):
sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy

Valentin Schneider (1):
sched/fair: Properly deactivate sched_delayed task upon class change

Vincent Guittot (1):
sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c

Youssef Esmat (1):
sched/core: Clear prev->dl_server in CFS pick fast path

Zhang Qiao (1):
sched: Initialize the vruntime of a new task when it is first enqueued


Documentation/scheduler/sched-deadline.rst | 14 +-
drivers/cpufreq/cppc_cpufreq.c | 6 +-
fs/bcachefs/six.c | 2 +-
fs/select.c | 2 +-
include/linux/ioprio.h | 2 +-
include/linux/sched.h | 28 +-
include/linux/sched/deadline.h | 14 +-
include/linux/sched/prio.h | 1 +
include/linux/sched/rt.h | 33 +-
include/uapi/linux/sched/types.h | 6 +-
kernel/freezer.c | 2 +-
kernel/kthread.c | 10 +-
kernel/locking/rtmutex.c | 4 +-
kernel/locking/rwsem.c | 4 +-
kernel/locking/ww_mutex.h | 2 +-
kernel/sched/core.c | 248 +++++++---
kernel/sched/cpufreq_schedutil.c | 6 +-
kernel/sched/deadline.c | 503 +++++++++++++++----
kernel/sched/debug.c | 198 +++++++-
kernel/sched/fair.c | 770 ++++++++++++++++++++++-------
kernel/sched/features.h | 30 +-
kernel/sched/idle.c | 23 +-
kernel/sched/rt.c | 261 +++++-----
kernel/sched/sched.h | 101 +++-
kernel/sched/stop_task.c | 18 +-
kernel/sched/syscalls.c | 132 +----
kernel/sched/topology.c | 8 +
kernel/time/hrtimer.c | 6 +-
kernel/trace/trace_sched_wakeup.c | 2 +-
mm/page-writeback.c | 4 +-
mm/page_alloc.c | 2 +-
31 files changed, 1695 insertions(+), 747 deletions(-)