Re: [PATCH 3/4] sched/numa: Stop comparing tasks for NUMA placement after selecting an idle core
From: Mel Gorman
Date: Fri Sep 07 2018 - 10:20:07 EST
On Fri, Sep 07, 2018 at 06:35:53PM +0530, Srikar Dronamraju wrote:
> * Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> [2018-09-07 11:11:38]:
>
> > task_numa_migrate is responsible for finding a core on a preferred NUMA
> > node for a task. As part of this, task_numa_find_cpu iterates through
> > the CPUs of a node and evaluates CPUs, both idle and with running tasks,
> > as placement candidates. Generally though, any idle CPU is equivalent in
> > terms of improving imbalances and a search after finding one is pointless.
> > This patch stops examining CPUs on a node if an idle CPU is considered
> > suitable.
> >
>
> However there can be a thread on the destination node that might benefit
> from swapping with the current thread. Don't we lose that opportunity to
> swap if we skip checking for other threads?
>
> To articulate.
> Thread A currently running on node 0 wants to move to node 1.
> Thread B currently running on node 1 is better off if it ran on node 0.
>
> Thread A sees an idle CPU before seeing Thread B; skips and loses
> an opportunity to swap.
>
> Eventually thread B will get an opportunity to move to node 0, when thread B
> calls task_numa_placement but we are probably stopping it from achieving
> that earlier.
>
Potentially this opportunity is missed but I think the only case where
swapping is better than an idle CPU is when both tasks are not running
on their preferred node. For that to happen, it would likely require that
the machine be heavily saturated (or both would just find idle cores). I
would think that's the rare case and it's better just to save the cycles
searching through runqueues and examining tasks and just take the idle
CPU. Furthermore, swapping is guaranteed to disrupt two tasks as they
have to be dequeued, migrated and requeued for what may or may not be an
overall performance gain. Lastly, even if it's the case that there is a
swap candidate out there, that does not justify calling select_idle_sibling
for every idle CPU encountered which is what happens currently.
I think the patch I have is almost certainly a win (reduced search costs)
and continuing the search just in case there is a good swap candidate
out there is often going to cost more than it saves.
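
To make the trade-off concrete, here is a rough userspace sketch. It is
not the kernel code and not what the patch does line for line. The names
toy_cpu, toy_result, toy_find_cpu and the "gain" field are all
hypothetical, with "gain" standing in for whatever improvement the real
comparison would compute for a move or a swap. It only shows that
stopping at the first idle CPU bounds the number of CPUs examined, at
the risk of missing a better swap candidate further along the node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU on the target node; not a kernel struct. */
struct toy_cpu {
	bool idle;	/* true if nothing is running on this CPU */
	int gain;	/* made-up score for moving to (or swapping with) this CPU */
};

struct toy_result {
	int best_cpu;		/* index of the chosen CPU, or -1 if none */
	size_t examined;	/* number of CPUs inspected before deciding */
};

/*
 * Scan the CPUs of a node. With stop_on_idle set, the first idle CPU is
 * taken and the scan ends. Otherwise every CPU is examined and the one
 * with the highest positive gain wins.
 */
static struct toy_result toy_find_cpu(const struct toy_cpu *cpus, size_t n,
				      bool stop_on_idle)
{
	struct toy_result res = { .best_cpu = -1, .examined = 0 };
	int best_gain = 0;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		res.examined++;

		/* Track the best candidate seen so far. */
		if (cpus[i].gain > best_gain) {
			best_gain = cpus[i].gain;
			res.best_cpu = (int)i;
		}

		/* An idle CPU is good enough: take it and stop searching. */
		if (stop_on_idle && cpus[i].idle) {
			res.best_cpu = (int)i;
			break;
		}
	}

	return res;
}
```

With a node where CPU 1 is idle and CPU 2 runs a task that would be a
better swap partner, the early exit picks CPU 1 after looking at two
CPUs, while the full scan picks CPU 2 after looking at all of them.
Whether the extra gain ever pays for the extra search and the second
disrupted task is the open question.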
--
Mel Gorman
SUSE Labs