Re: [PATCH 0/3] sched/core: Fix PSI inconsistent task state splats with DELAY_DEQUEUE

From: K Prateek Nayak
Date: Thu Oct 10 2024 - 06:57:42 EST


Hello Peter,

On 10/10/2024 4:17 PM, Peter Zijlstra wrote:
On Thu, Oct 10, 2024 at 08:28:35AM +0000, K Prateek Nayak wrote:
After the introduction of DELAY_DEQUEUE, PSI consistently started
warning about inconsistent task state early into the boot. This could be
root-caused to three issues that the three patches respectively solve:

o PSI signals not being dequeued when the task is blocked, but also
delayed since psi_sched_switch() considered "!task_on_rq_queued()" as
the task being blocked but a delayed task will remain queued on the
runqueue until it is picked again and goes through a full dequeue.

o enqueue_task() not using the ENQUEUE_WAKEUP alongside ENQUEUE_DELAYED
in ttwu_runnable(). Since psi_enqueue() only considers (in terms of
enqueue flags):

(flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)

... as a wakeup, the lack of ENQUEUE_WAKEUP can misguide psi_enqueue()
which only clears TSK_IOWAIT flag on wakeups.

o When a delayed task is migrated by the load balancer, the requeue or
the wakeup context may be aware that the task has migrated between it
blocking and it waking up. This is necessary to be communicated to PSI
which forgoes clearing TSK_IOWAIT since it expects the psi_.*dequeue()
to have cleared it during migration.

The series correctly communicates the blocked status of a delayed task
to psi_dequeue(), adds the ENQUEUE_WAKEUP flag during a requeue in
ttwu_runnable(), re-arranges the psi_enqueue() to be called after a
"p->sched_class->enqueue_task()", and notify psi_enqueue() of a
migration in delayed state using "p->migration_flags" to maintain the
task state consistently.

This series was previously posted as one large diff at
https://lore.kernel.org/lkml/f82def74-a64a-4a05-c8d4-4eeb3e03d0c0@xxxxxxx/
and was tested by Johannes. The tags on the diff have been carried
to this series.

Thanks!

I've renamed DELAYED_MIGRATED to MF_DELAYED, and made a note to go
rename the MDF_PUSH thing to something consistent.

Ack. DELAYED_MIGRATED was coined in spur of the moment; MF_DELAYED fits
the "migration_flags" bill well.


I've stuck then in queue.git sched/urgent along with a few other fixes
and I will hopefully push the lot into tip soon.

Thank you for picking up the patches! Let me know if there are any
surprises (there shouldn't be any after the thorough testing by
Johannes :) - thank you again for that)

--
Thanks and Regards,
Prateek