Re: [PATCH] sched: Improve rq selection for a blocked task when its affinity changes

From: Ze Gao
Date: Sun Apr 14 2024 - 22:03:51 EST


On Sat, Apr 13, 2024 at 12:59 AM Madadi Vineeth Reddy
<vineethr@xxxxxxxxxxxxx> wrote:
>
> Hi Ze Gao,
>
> On 13/03/24 14:28, Ze Gao wrote:
> > We observed that select_idle_sibling() is likely to return the *target*
> > cpu early, which is often the cpu this task was previously running on,
> > even when that cpu is no longer in the newly set affinity mask. After
> > that, we can only rely on select_fallback_rq() to choose a cpu at its
> > discretion (mostly the first valid one, for now).
> >
> > However, the one chosen by select_fallback_rq() is often not a good
> > candidate; sometimes the task has to rely on the load balancer to kick
> > in and move it to a better cpu, which undoubtedly adds one or more
> > unnecessary migrations. For example, this is what I get when I move
> > task 3964 to cpus 23-24, where cpu 23 already has a cpu-bound task
> > pinned:
> >
> > swapper 0 [013] 959.791829: sched:sched_migrate_task: comm=stress-ng-cpu pid=3964 prio=120 orig_cpu=13 dest_cpu=23
> > kworker/24:2-mm 1014 [024] 959.806148: sched:sched_migrate_task: comm=stress-ng-cpu pid=3964 prio=120 orig_cpu=23 dest_cpu=24
> >
>
> I am able to reproduce this scenario of an extra migration through load balance:
> swapper 0 [031] 398764.057232: sched:sched_migrate_task: comm=loop pid=178687 prio=120 orig_cpu=31 dest_cpu=33
> ksoftirqd/0 13 [000] 398764.356138: sched:sched_migrate_task: comm=loop pid=178687 prio=120 orig_cpu=33 dest_cpu=34
>
> I wrote a simple C program that blocks for a few seconds; meanwhile, I taskset it to CPUs 33-34 while a
> busy task is already running on CPU 33.
>
> > The thing is, we can actually do better if we check early and take more
> > advantage of the *target* in select_idle_sibling(). That is, we continue
> > the idle cpu selection if *target* fails the cpumask_test_cpu(*target*,
> > p->cpus_ptr) check. By doing so, we are likely to pick a good candidate,
> > especially when the newly allowed cpu set shares some cpu resources with
> > *target*.
> >
> > And with this change, we clearly see the improvement when I move task 3964
> > to cpus 25-26, where cpu 25 already has a cpu-bound task pinned.
> >
> > swapper 0 [027] 4249.204658: sched:sched_migrate_task: comm=stress-ng-cpu pid=3964 prio=120 orig_cpu=27 dest_cpu=26
>
> But after applying this patch, the extra migration still happens, as CPU 33 is still chosen by try_to_wake_up().
>
> After placing some perf probes and testing:
> migration/57 304 [057] 12216.988491: sched:sched_migrate_task: comm=loop pid=11172 prio=120 orig_cpu=57 dest_cpu=4
> swapper 0 [004] 12226.989065: probe:select_idle_sibling_L124: (c0000000001bafc0) i=-1 recent_used_cpu=-1 prev_aff=-1
> swapper 0 [004] 12226.989071: probe:select_fallback_rq: (c0000000001a2e38) cpu=4
> swapper 0 [004] 12226.989074: sched:sched_migrate_task: comm=loop pid=11172 prio=120 orig_cpu=4 dest_cpu=33
> swapper 0 [000] 12227.007768: sched:sched_migrate_task: comm=loop pid=11172 prio=120 orig_cpu=33 dest_cpu=34
>
> It is observed that select_fallback_rq() is still taken in this scenario, as the default target (CPU 4)
> is returned at the end of select_idle_sibling().

My best guess is that you may have hit the code path for running tasks
(the taskset happens right after the task is woken up). Should that happen,
the cpu is picked via:

dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ctx->new_mask);

and it also makes sense that select_fallback_rq() returns 4, since that
call happens before you change the affinity.

You may need to rule out this case first :)

Regards,
Ze

> In most of my testing, the default target is returned at the end of the function due to the initial
> checks. There may be cases where we can find an optimal CPU before reaching the end of
> select_idle_sibling(), but it would be interesting to know whether the extra time spent finding an
> optimal cpu has an impact, compared to returning earlier, given that most of the time we end up
> returning the default target at the end anyway.
>
> Thanks and Regards
> Madadi Vineeth Reddy
>
> >
> > Note we do the same check for *prev* in select_idle_sibling() as well.
> >
> > Signed-off-by: Ze Gao <zegao@xxxxxxxxxxx>
> > ---
>