On Tue, Oct 08, 2024 at 09:54:52PM +0530, K Prateek Nayak wrote:
From 2e15180e18b51e9a2bc0d7050e915a70d2673a06 Mon Sep 17 00:00:00 2001
From: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Date: Fri, 4 Oct 2024 15:24:35 +0000
Subject: [RFC PATCH] sched/psi: Fixup PSI accounting with DELAY_DEQUEUE
After the merge of DELAY_DEQUEUE, "psi: inconsistent task state" warnings
were seen early into the boot. The crux of the matter is the fact that
when a task is delayed, and the delayed task is then migrated, the
wakeup context may not have any idea that the task was moved from its
previous runqueue. This is the same reason psi_enqueue() considers
only ...
(flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)
... as a wakeup. In case of a wakeup with migration, PSI forgoes
clearing the TSK_IOWAIT flag which seems to be the issue I encountered
in my splat previously.
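
To make the condition above concrete, here is a minimal userspace sketch of
the predicate, not the kernel code itself. The flag values are illustrative
assumptions for this sketch (the real definitions live in
kernel/sched/sched.h), and psi_sees_wakeup() and psi_wakeup_selfcheck() are
hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, assumed for this sketch only. */
#define ENQUEUE_WAKEUP   0x01
#define ENQUEUE_MIGRATED 0x40

/*
 * Hypothetical helper mirroring the quoted condition: only a wakeup
 * that did not involve a migration counts as a wakeup for PSI.
 */
static inline bool psi_sees_wakeup(int flags)
{
	return (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED);
}

/* Hypothetical self-check covering the three interesting cases. */
static inline void psi_wakeup_selfcheck(void)
{
	/* Plain wakeup on the same runqueue: treated as a wakeup. */
	assert(psi_sees_wakeup(ENQUEUE_WAKEUP));
	/* Wakeup with migration: not treated as a wakeup. */
	assert(!psi_sees_wakeup(ENQUEUE_WAKEUP | ENQUEUE_MIGRATED));
	/* Enqueue without ENQUEUE_WAKEUP: not a wakeup either. */
	assert(!psi_sees_wakeup(0));
}
```

The middle case is the one that matters here: an enqueue carrying both
flags is not a wakeup as far as PSI is concerned, so the TSK_IOWAIT
clearing is skipped on that path.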
With that said, the below diff, based on Peter's original approach,
currently seems to work for me in the sense that I have not seen the
inconsistent state warning for a while now with my stress test.
Two key points of the approach are:
o It uses "p->migration_flags" to indicate a delayed entity has
migrated to another runqueue and convey the same during psi_enqueue().
o It adds the ENQUEUE_WAKEUP flag alongside ENQUEUE_DELAYED for
enqueue_task() in ttwu_runnable(), since psi_enqueue() needs to know of
a wakeup without migration to clear the TSK_IOWAIT flag it would have
set during psi_task_switch() for a blocking task. Going down the
stack for enqueue_task_fair(), there seems to be no other observer of
the ENQUEUE_WAKEUP flag than psi_enqueue() in the requeue path.
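
The two points above can be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions, not the diff itself: struct
sketch_task, the SKETCH_MF_DELAYED_MIGRATED bit, and all sketch_*() helpers
are hypothetical names, and the flag values are illustrative. It also does
not model what the migration path does with the PSI state of the old
runqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, assumed for this sketch only. */
#define ENQUEUE_WAKEUP    0x01
#define ENQUEUE_MIGRATED  0x40
#define ENQUEUE_DELAYED   0x200

#define TSK_IOWAIT        (1 << 0)
#define TSK_RUNNING       (1 << 2)

/* Hypothetical marker bit kept in the task's migration_flags. */
#define SKETCH_MF_DELAYED_MIGRATED 0x02

/* Hypothetical stand-in for the few task fields this sketch needs. */
struct sketch_task {
	unsigned int migration_flags;
	unsigned int psi_flags;
};

/* Point 1: remember that a delayed task moved to another runqueue. */
static inline void sketch_mark_delayed_migration(struct sketch_task *p)
{
	p->migration_flags |= SKETCH_MF_DELAYED_MIGRATED;
}

/* Point 2: the requeue path passes ENQUEUE_WAKEUP with ENQUEUE_DELAYED. */
static inline int sketch_ttwu_runnable_flags(void)
{
	return ENQUEUE_WAKEUP | ENQUEUE_DELAYED;
}

/* Model of the enqueue-side decision: clear TSK_IOWAIT only on a
 * wakeup that did not migrate, by either indication. */
static inline void sketch_psi_enqueue(struct sketch_task *p, int flags)
{
	bool migrated = (flags & ENQUEUE_MIGRATED) ||
			(p->migration_flags & SKETCH_MF_DELAYED_MIGRATED);
	bool wakeup = (flags & ENQUEUE_WAKEUP) && !migrated;

	/* The marker is consumed once it has been conveyed. */
	p->migration_flags &= ~SKETCH_MF_DELAYED_MIGRATED;

	if (wakeup)
		p->psi_flags &= ~TSK_IOWAIT;
	p->psi_flags |= TSK_RUNNING;
}
```

In this model, a task that blocked with TSK_IOWAIT set and is requeued via
sketch_ttwu_runnable_flags() has the flag cleared, whereas a task that was
first passed through sketch_mark_delayed_migration() is treated as migrated
and is not handled as a plain wakeup.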
Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
Tested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
It fixes the warning and bogus pressure values after stressing it for
an hour or so with tons of cpu contention and cgroup movements.