Re: [PATCH 00/10] steal tasks to improve CPU utilization

From: Steven Sistare
Date: Thu Nov 01 2018 - 07:57:26 EST


On 10/22/2018 10:59 AM, Steve Sistare wrote:
> When a CPU has no more CFS tasks to run, and idle_balance() fails to
> find a task, then attempt to steal a task from an overloaded CPU in the
> same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently
> identify candidates. To minimize search time, steal the first migratable
> task that is found when the bitmap is traversed. For fairness, search
> for migratable tasks on an overloaded CPU in order of next to run.
> [...]
> Steve Sistare (10):
>   sched: Provide sparsemask, a reduced contention bitmap
>   sched/topology: Provide hooks to allocate data shared per LLC
>   sched/topology: Provide cfs_overload_cpus bitmap
>   sched/fair: Dynamically update cfs_overload_cpus
>   sched/fair: Hoist idle_stamp up from idle_balance
>   sched/fair: Generalize the detach_task interface
>   sched/fair: Provide can_migrate_task_llc
>   sched/fair: Steal work from an overloaded CPU when CPU goes idle
>   sched/fair: disable stealing if too many NUMA nodes
>   sched/fair: Provide idle search schedstats
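
For readers who have not seen the patches, the idea in the quoted cover
letter can be sketched in userspace roughly as follows. This is only a toy
model under stated assumptions, not the kernel code: a plain 64-bit word
stands in for the cfs_overload_cpus sparsemask, there is no locking, every
queued task is treated as waiting (none is "currently running"), and the
names NR_CPUS, MAX_TASKS, struct rq, enqueue(), detach() and steal_task()
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS   8   /* hypothetical: CPUs sharing one LLC */
#define MAX_TASKS 4   /* hypothetical: run queue capacity in this model */

struct task {
    int      id;
    uint64_t allowed;               /* bit n set => task may run on CPU n */
};

struct rq {
    struct task *queue[MAX_TASKS];  /* index 0 is next to run */
    int          nr;
};

static struct rq runqueues[NR_CPUS];
static uint64_t  overload_mask;     /* bit n set => CPU n is overloaded */

/* Queue a task; a CPU counts as overloaded once it holds more than one. */
static bool enqueue(int cpu, struct task *t)
{
    struct rq *rq = &runqueues[cpu];

    if (rq->nr >= MAX_TASKS)
        return false;
    rq->queue[rq->nr++] = t;
    if (rq->nr > 1)
        overload_mask |= 1ULL << cpu;
    return true;
}

/* Remove queue[i], keep the remaining order, update the overload bit. */
static struct task *detach(int cpu, int i)
{
    struct rq *rq = &runqueues[cpu];
    struct task *t = rq->queue[i];

    for (int j = i; j + 1 < rq->nr; j++)
        rq->queue[j] = rq->queue[j + 1];
    rq->nr--;
    if (rq->nr <= 1)
        overload_mask &= ~(1ULL << cpu);
    return t;
}

/*
 * Walk the overload mask and take the first task that may run on dst_cpu,
 * scanning each overloaded queue in next-to-run order.
 */
static struct task *steal_task(int dst_cpu)
{
    if (runqueues[dst_cpu].nr >= MAX_TASKS)
        return NULL;

    for (int src = 0; src < NR_CPUS; src++) {
        struct rq *rq = &runqueues[src];

        if (src == dst_cpu || !(overload_mask & (1ULL << src)))
            continue;
        for (int i = 0; i < rq->nr; i++) {
            if (rq->queue[i]->allowed & (1ULL << dst_cpu)) {
                struct task *t = detach(src, i);

                enqueue(dst_cpu, t);
                return t;
            }
        }
    }
    return NULL;
}
```

The search stops at the first migratable task rather than looking for the
best candidate, which is the trade-off the cover letter describes: a short
search in exchange for a possibly suboptimal pick.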

(resend, reformatted)

Thanks very much to everyone who has commented on my patch series.
Here are the changes planned for V2 of the series, each with the person
who suggested it or raised the issue that led to it.

Changes for V2:
* Remove stray patch 10 hunk from patch 5 (Valentin)
* Fix "warning: label out defined but not used" for !CONFIG_SCHED_SMT
  (Valentin)
* Set SCHED_STEAL_NODE_LIMIT_DEFAULT to 2 (Steve)
* Call try_steal iff avg_idle exceeds some small threshold (Steve, Valentin)
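
The avg_idle gate in the last item could look roughly like the sketch
below. It is illustrative only: the threshold value and the names
STEAL_MIN_AVG_IDLE_NS and should_try_steal() are hypothetical, since no
value has been chosen yet, and avg_idle is modeled as a decaying average
in which each new sample carries a weight of 1/8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold in nanoseconds; no value is fixed yet. */
#define STEAL_MIN_AVG_IDLE_NS 50000

/* Decaying average: move avg one eighth of the way toward the sample. */
static void update_avg(int64_t *avg, int64_t sample)
{
    *avg += (sample - *avg) / 8;
}

/* Attempt a steal only if the CPU is expected to stay idle long enough. */
static bool should_try_steal(int64_t avg_idle_ns)
{
    return avg_idle_ns > STEAL_MIN_AVG_IDLE_NS;
}
```

The point of the gate is that a CPU which tends to be idle only briefly
is unlikely to recover the cost of searching for and migrating a task.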

Possible future work:
* Use sparsemask and stealing for RT (Steve, Peter)
* Remove the core and socket levels from idle_balance() and let stealing
  handle those levels (Steve, Peter)
* Delete idle_balance() and use stealing exclusively for handling new idle
  (Steve, Peter)
* Test specjbb multi-warehouse on 8-node systems when stealing for
  large NUMA systems is revisited (Peter)
* Enhance stealing to handle misfits (Valentin)
* Lower time threshold for task_hot within LLC (Valentin)

Dropped:
* Skip try_steal() if we bail out of idle_balance() because !this_rq->rd->overload
  (Valentin)
  I tried it and saw no difference. Dropped for simplicity.

Does anyone else plan to review the code? Please tell me now, even if your
review will be delayed. If so, I will wait for all comments before producing
V2. The code changes so far are small.

- Steve