[PATCH 0/3] sched/core: Fix PSI inconsistent task state splats with DELAY_DEQUEUE

From: K Prateek Nayak
Date: Thu Oct 10 2024 - 04:46:47 EST


After the introduction of DELAY_DEQUEUE, PSI started consistently
warning about inconsistent task state early in boot. This was
root-caused to three issues, which the three patches solve respectively:

o PSI signals not being dequeued when a task blocks but is also delayed.
psi_sched_switch() considered "!task_on_rq_queued()" to mean the task
is blocked, but a delayed task remains queued on the runqueue until it
is picked again and goes through a full dequeue.

o enqueue_task() not being passed ENQUEUE_WAKEUP alongside
ENQUEUE_DELAYED in ttwu_runnable(). In terms of enqueue flags,
psi_enqueue() only treats the following as a wakeup:

(flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)

The missing ENQUEUE_WAKEUP can therefore mislead psi_enqueue(), which
only clears the TSK_IOWAIT flag on wakeups.

o When a delayed task is migrated by the load balancer, the requeue or
wakeup context may be unaware that the task migrated between blocking
and waking up. This needs to be communicated to PSI, which forgoes
clearing TSK_IOWAIT on a migrated wakeup since it expects
psi_.*dequeue() to have cleared it during migration.

The series correctly communicates the blocked status of a delayed task
to psi_dequeue(), adds the ENQUEUE_WAKEUP flag during a requeue in
ttwu_runnable(), re-arranges psi_enqueue() to be called after
"p->sched_class->enqueue_task()", and notifies psi_enqueue() of a
migration in delayed state using "p->migration_flags" to keep the
task state consistent.

This series was previously posted as one large diff at
https://lore.kernel.org/lkml/f82def74-a64a-4a05-c8d4-4eeb3e03d0c0@xxxxxxx/
and was tested by Johannes. The tags on the diff have been carried over
to this series.

This series is based on:

git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

at commit 7266f0a6d3bb ("fs/bcachefs: Fix
__wait_on_freeing_inode() definition of waitqueue entry")

Any and all feedback is greatly appreciated.

--
K Prateek Nayak (2):
  sched/core: Add ENQUEUE_WAKEUP flag alongside ENQUEUE_DELAYED
  sched/core: Indicate a sched_delayed task was migrated before wakeup

Peter Zijlstra (1):
  sched/core: Dequeue PSI signals for blocked tasks that are delayed

 kernel/sched/core.c  | 25 ++++++++++++++++++++++---
 kernel/sched/sched.h |  1 +
 kernel/sched/stats.h | 10 ++++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)

--
2.34.1