Re: [PATCH 3/3] sched/core: Indicate a sched_delayed task was migrated before wakeup

From: K Prateek Nayak
Date: Thu Oct 10 2024 - 23:31:42 EST


Hello Johannes,

On 10/11/2024 1:07 AM, Johannes Weiner wrote:
> On Thu, Oct 10, 2024 at 03:06:21PM +0200, Peter Zijlstra wrote:
>> On Thu, Oct 10, 2024 at 09:03:16AM -0400, Johannes Weiner wrote:
>>
>>> I'll try to come up with a suitable solution as well, please don't
>>> apply this one for now.
>>
>> I'll make sure it doesn't end up in tip as-is.
>
> Thanks.
>
> This would be a replacement patch for #2 and #3 that handles migration
> of delayed tasks. It's slightly more invasive on the psi callback
> side, but I think it keeps the sched core bits simpler. Thoughts?
>
> ---
>
> From d72a665d7c7c7d9c806424f473d13452754471d3 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Thu, 10 Oct 2024 14:37:43 -0400
> Subject: [PATCH] sched: psi: handle delayed-dequeue task migration
>
> Since sched_delayed tasks remain queued even after blocking, the load
> balancer can migrate them between runqueues while PSI considers them
> to be asleep. As a result, it misreads the migration requeue followed
> by a wakeup as a double queue:
>
>   psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=. set=4
>
> First, call psi_enqueue() after p->sched_class->enqueue_task(). A
> wakeup will clear p->se.sched_delayed while a migration will not, so
> psi can use that flag to tell them apart.
>
> Then teach psi to migrate any "sleep" state when delayed-dequeue tasks
> are being migrated.
>
> Delayed-dequeue tasks can be revived by ttwu_runnable(), which will
> call down with a new ENQUEUE_DELAYED. Instead of further complicating
> the wakeup conditional in enqueue_task(), identify migration contexts
> instead and default to wakeup handling for all other cases.
>
> Debugged-by-and-original-fix-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
> Closes: https://lore.kernel.org/lkml/20240830123458.3557-1-spasswolf@xxxxxx/
> Closes: https://lore.kernel.org/all/cd67fbcd-d659-4822-bb90-7e8fbb40a856@xxxxxxxxxxxxx/
> Link: https://lore.kernel.org/lkml/f82def74-a64a-4a05-c8d4-4eeb3e03d0c0@xxxxxxx/
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
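
[Editor's sketch, not part of the patch: the ordering argument in the
commit message can be modelled in userspace roughly as below. Once the
class enqueue has run, a task that still has sched_delayed set can only
be a migrated "sleeping" task, because a wakeup would have cleared the
flag. All names here (model_task, classify_enqueue, psi_bits_to_set, the
MODEL_* bits and ENQ_* kinds) are hypothetical and exist only for this
illustration; the real changes to psi_enqueue() are in the snipped diff
and may differ.]

```c
#include <stdbool.h>

/* Hypothetical PSI state bits for this model only. */
enum {
	MODEL_TSK_IOWAIT   = 1 << 0,
	MODEL_TSK_MEMSTALL = 1 << 1,
	MODEL_TSK_RUNNING  = 1 << 2,
};

/* Minimal stand-in for the task fields the commit message refers to. */
struct model_task {
	bool sched_delayed;	/* still queued after blocking */
	bool in_iowait;
	bool in_memstall;
};

enum enqueue_kind {
	ENQ_MIGRATE_SLEEPING,	/* delayed-dequeue task moved between runqueues */
	ENQ_MIGRATE_RUNNABLE,	/* ordinary migration of a runnable task */
	ENQ_WAKEUP,		/* default: new or sleeping task woken up */
};

/*
 * Assumed to be called after the class enqueue, so a wakeup has already
 * cleared sched_delayed. Migration contexts are identified explicitly;
 * everything else falls through to wakeup handling.
 */
static enum enqueue_kind classify_enqueue(const struct model_task *p,
					  bool migrated)
{
	if (p->sched_delayed)
		return ENQ_MIGRATE_SLEEPING;
	if (migrated)
		return ENQ_MIGRATE_RUNNABLE;
	return ENQ_WAKEUP;
}

/* Which model bits the enqueue would set on the destination runqueue. */
static unsigned int psi_bits_to_set(const struct model_task *p, bool migrated)
{
	unsigned int set = 0;

	switch (classify_enqueue(p, migrated)) {
	case ENQ_MIGRATE_SLEEPING:
		/* Carry the "sleep" state along; do not mark as running. */
		if (p->in_iowait)
			set |= MODEL_TSK_IOWAIT;
		if (p->in_memstall)
			set |= MODEL_TSK_MEMSTALL;
		break;
	case ENQ_MIGRATE_RUNNABLE:
	case ENQ_WAKEUP:
		set |= MODEL_TSK_RUNNING;
		if (p->in_memstall)
			set |= MODEL_TSK_MEMSTALL;
		break;
	}
	return set;
}
```

[In this model a delayed task migrated while in iowait only carries its
iowait bit to the new runqueue, so the later wakeup is the first and only
event that sets the running bit, which is the double-queue case the
warning above complains about.]
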

This approach looks good to me as well! Thank you. I added this on top
of Patch 1 and I haven't seen any PSI splats after my stress test. Feel
free to add:

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

--
Thanks and Regards,
Prateek

> ---
>  kernel/sched/core.c  | 12 +++++------
>  kernel/sched/stats.h | 48 ++++++++++++++++++++++++++++++--------------
>  2 files changed, 39 insertions(+), 21 deletions(-)
>
> [..snip..]