Re: [PATCH 07/10] sched/fair: Provide can_migrate_task_llc
From: Valentin Schneider
Date: Fri Oct 26 2018 - 14:04:13 EST
Hi Steve,
On 22/10/2018 15:59, Steve Sistare wrote:
> Define a simpler version of can_migrate_task called can_migrate_task_llc
> which does not require a struct lb_env argument, and judges whether a
> migration from one CPU to another within the same LLC should be allowed.
>
> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4acdd8d..6548bed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7168,6 +7168,34 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> }
>
> /*
> + * Return true if task @p can migrate from @rq to @dst_rq in the same LLC.
> + * No need to test for co-locality, and no need to test task_hot(), as sharing
> + * LLC provides cache warmth at that level.
I was thinking that we could perhaps get into scenarios where some rqs keep
stealing tasks off of each other, so that we end up circulating tasks between
CPUs. That would only happen with a handful of tasks that have a very short
period, and I'm not aware of (real) workloads as hyperactive as the ones
hackbench generates where this could happen.
In short, I wonder if we should have a task_hot() check in there. Drawing a
parallel with load_balance(): even when load balancing happens between rqs of
the same LLC, we still check task_hot(). Have you already experimented with
adding such a check here?
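For concreteness, something along these lines is what I have in mind. It's
purely illustrative and untested: the in-tree task_hot() takes a struct
lb_env, so this assumes a hypothetical task_hot_rq() variant that keeps only
the exec_start vs. sysctl_sched_migration_cost part of the heuristic:

/*
 * Hypothetical helper, not part of the patch: the core cache-hotness
 * test from task_hot(), minus the lb_env plumbing.
 */
static bool task_hot_rq(struct task_struct *p, struct rq *src_rq)
{
	s64 delta;

	lockdep_assert_held(&src_rq->lock);

	if (sysctl_sched_migration_cost == -1)
		return true;
	if (sysctl_sched_migration_cost == 0)
		return false;

	delta = rq_clock_task(src_rq) - p->se.exec_start;
	return delta < (s64)sysctl_sched_migration_cost;
}

can_migrate_task_llc() could then additionally bail out when
task_hot_rq(p, rq) returns true.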
I've run some iterations of hackbench (hackbench 2 process 100000) to
investigate this task bouncing, but I didn't really see any of it. That was
just a 4+4 big.LITTLE system though; I'll try to get numbers on a system
with more CPUs.
----->8-----
activations: # of task activations (task starts running)
cpu_migrations: # of activations where cpu != prev_cpu
The % rows are percentiles. (A minimal sketch of this per-task accounting
follows the tables.)
- STEAL:
| stat | cpu_migrations | activations |
|-------+----------------+-------------|
| count | 2005.000000 | 2005.000000 |
| mean | 16.244888 | 290.608479 |
| std | 38.963138 | 253.003528 |
| min | 0.000000 | 3.000000 |
| 50% | 3.000000 | 239.000000 |
| 75% | 8.000000 | 436.000000 |
| 90% | 45.000000 | 626.000000 |
| 99% | 188.960000 | 1073.000000 |
| max | 369.000000 | 1417.000000 |
- NO_STEAL:
| stat | cpu_migrations | activations |
|-------+----------------+-------------|
| count | 2005.000000 | 2005.000000 |
| mean | 15.260848 | 297.860848 |
| std | 46.331890 | 253.210813 |
| min | 0.000000 | 3.000000 |
| 50% | 3.000000 | 252.000000 |
| 75% | 7.000000 | 444.000000 |
| 90% | 32.600000 | 643.600000 |
| 99% | 214.880000 | 1127.520000 |
| max | 467.000000 | 1547.000000 |
----->8-----
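For clarity, the two columns above boil down to the following per-task
accounting (a minimal sketch with made-up names, only meant to pin down the
definitions given before the tables):

struct task_wakeup_stats {
	unsigned long activations;	/* task started running */
	unsigned long cpu_migrations;	/* activations where cpu != prev_cpu */
};

static void account_activation(struct task_wakeup_stats *ts,
			       int cpu, int prev_cpu)
{
	ts->activations++;
	if (cpu != prev_cpu)
		ts->cpu_migrations++;
}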
Otherwise, my only remaining concern at the moment is that, since stealing
doesn't take load into account, we could steal a task that causes a big
imbalance, which wouldn't have happened with a call to load_balance().
I don't think this can be triggered with a symmetrical workload like
hackbench, so I'll go explore something else.
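If that does turn out to be a problem, I suppose the steal path could grow a
cheap guard along these lines (just a sketch; weighted_cpuload() and
task_h_load() stand in for whatever load metric the stealing code already has
at hand, and the exact condition would need more thought):

/*
 * Hypothetical guard, not in the patch: skip the steal if moving @p
 * would leave the destination with more load than remains on the source.
 */
static bool steal_would_imbalance(struct task_struct *p,
				  struct rq *src_rq, struct rq *dst_rq)
{
	unsigned long task_load = task_h_load(p);

	/* dst + task > src - task, rearranged to avoid unsigned underflow */
	return weighted_cpuload(dst_rq) + 2 * task_load >
	       weighted_cpuload(src_rq);
}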