Re: [PATCH 17/24] sched/fair: Implement delayed dequeue

From: K Prateek Nayak
Date: Mon Nov 04 2024 - 23:24:42 EST


Hello Mike,

On 11/5/2024 9:35 AM, Mike Galbraith wrote:
> On Mon, 2024-11-04 at 08:05 -0500, Phil Auld wrote:
>> On Sat, Nov 02, 2024 at 05:32:14AM +0100 Mike Galbraith wrote:
>>>
>>> The buddy being preempted certainly won't be wakeup migrated...
>>
>> Not the waker who gets preempted but the wakee may be a bit more
>> sticky on his current cpu and thus stack more since he's still
>> in that runqueue.
>
> Ah, indeed, if wakees don't get scraped off before being awakened, they
> can and do miss chances at an idle CPU according to trace_printk().
>
> I'm undecided if overall it's boon, bane or even matters, as there is
> still an ample supply of wakeup migration, but it seems it can indeed
> inject wakeup latency needlessly, so <sharpens stick>...

I had tried this out a while back, but I was indiscriminately doing a
DEQUEUE_DELAYED and letting delayed tasks go through a full ttwu cycle,
which did not yield any improvement on hackbench. Your approach of doing
it selectively might indeed be better (more thoughts below).


> My box booted and neither became exceptionally noisy nor inexplicably
> silent in... oh, minutes now, so surely yours will be perfectly fine.
>
> After one minute of lightly loaded box browsing, trace_printk() said:
>
>   645 - racy peek says there is room available
>    11 - cool, reserved room is free
>   206 - no vacancy or wakee pinned
> 38807 - SIS accommodates room seeker
>
> The below should improve the odds, but a high return seems unlikely.
>
> ---
>  kernel/sched/core.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3790,7 +3790,13 @@ static int ttwu_runnable(struct task_str
>  	rq = __task_rq_lock(p, &rf);
>  	if (task_on_rq_queued(p)) {
>  		update_rq_clock(rq);
> -		if (p->se.sched_delayed)
> +		/*
> +		 * If wakee is mobile and the room it reserved is occupied, let it try to migrate.
> +		 */
> +		if (p->se.sched_delayed && rq->nr_running > 1 && cpumask_weight(p->cpus_ptr) > 1) {

Would checking "p->nr_cpus_allowed > 1" be enough instead of doing a
"cpumask_weight(p->cpus_ptr) > 1"?

I was thinking: since the task is indeed delayed, there has to be more
than one task on the runqueue, right? A single task by itself cannot be
ineligible and hence marked for delayed dequeue. The only time we
encounter a delayed task with "rq->nr_running == 1" is when the other
tasks have been fully dequeued and pick_next_task() is in the process of
picking off all the delayed tasks, but since that is done with the rq
lock held in schedule(), is it even possible for the
"rq->nr_running > 1" check to be false here?

> +			dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
> +			goto out_unlock;
> +		} else if (p->se.sched_delayed)
>  			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
>  		if (!task_on_cpu(rq, p)) {
>  			/*
> @@ -3802,6 +3808,7 @@ static int ttwu_runnable(struct task_str
>  			ttwu_do_wakeup(p);
>  			ret = 1;
>  		}
> +out_unlock:
>  	__task_rq_unlock(rq, &rf);
>
>  	return ret;



--
Thanks and Regards,
Prateek