Re: [PATCH 0/3] sched/core: Fix PSI inconsistent task state splats with DELAY_DEQUEUE

From: Peter Zijlstra
Date: Thu Oct 10 2024 - 06:47:38 EST


On Thu, Oct 10, 2024 at 08:28:35AM +0000, K Prateek Nayak wrote:
> After the introduction of DELAY_DEQUEUE, PSI consistently started
> warning about inconsistent task state early into the boot. This could be
> root-caused to three issues that the three patches respectively solve:
>
> o PSI signals not being dequeued when the task is blocked, but also
> delayed since psi_sched_switch() considered "!task_on_rq_queued()" as
> the task being blocked but a delayed task will remain queued on the
> runqueue until it is picked again and goes through a full dequeue.
>
> o enqueue_task() not using the ENQUEUE_WAKEUP alongside ENQUEUE_DELAYED
> in ttwu_runnable(). Since psi_enqueue() only considers (in terms of
> enqueue flags):
>
> (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)
>
> ... as a wakeup, the lack of ENQUEUE_WAKEUP can misguide psi_enqueue()
> which only clears TSK_IOWAIT flag on wakeups.
>
> o When a delayed task is migrated by the load balancer, the requeue or
> the wakeup context may be aware that the task has migrated between it
> blocking and it waking up. This is necessary to be communicated to PSI
> which forgoes clearing TSK_IOWAIT since it expects the psi_.*dequeue()
> to have cleared it during migration.
>
> The series correctly communicates the blocked status of a delayed task
> to psi_dequeue(), adds the ENQUEUE_WAKEUP flag during a requeue in
> ttwu_runnable(), re-arranges the psi_enqueue() to be called after a
> "p->sched_class->enqueue_task()", and notifies psi_enqueue() of a
> migration in delayed state using "p->migration_flags" to maintain the
> task state consistently.
>
> This series was previously posted as one large diff at
> https://lore.kernel.org/lkml/f82def74-a64a-4a05-c8d4-4eeb3e03d0c0@xxxxxxx/
> and was tested by Johannes. The tags on the diff have been carried
> to this series.

Thanks!

I've renamed DELAYED_MIGRATED to MF_DELAYED, and made a note to go
rename the MDF_PUSH thing to something consistent.

I've stuck them in queue.git sched/urgent along with a few other fixes
and I will hopefully push the lot into tip soon.