Re: [PATCH] sched/fair: Skip wake_affine() for core siblings

From: Kirill Tkhai
Date: Tue Sep 29 2015 - 12:03:39 EST

On 29.09.2015 19:00, Kirill Tkhai wrote:
>
>
> On 29.09.2015 17:55, Mike Galbraith wrote:
>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>
>>> ---
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4df37a4..dfbe06b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>> int want_affine = 0;
>>> int sync = wake_flags & WF_SYNC;
>>>
>>> - if (sd_flag & SD_BALANCE_WAKE)
>>> - want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> + if (sd_flag & SD_BALANCE_WAKE) {
>>> + want_affine = 1;
>>> + if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>> + goto want_affine;
>>> + if (wake_wide(p))
>>> + goto want_affine;
>>> + }
>>
>> That blew wake_wide() right out of the water.
>>
>> It's not only about things like pgbench. Drive multiple tasks in a Xen
>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>> to save the day) via network, and watch workers fail to be all they can
>> be because they keep being stacked up on the irq source. Load balancing
>> yanks them apart, next irq stacks them right back up. I met that in
>> enterprise land, thought wake_wide() should cure it, and indeed it did.
>
> 1) Hm... The patch makes select_task_rq_fair() prefer the old cpu over the
> current one, doesn't it? More often we don't set affine_sd, so the part of
> the patch skipped in the quote selects prev_cpu.
>
> 2) I thought about wakeups done by irq handlers and was even going to ask
> why we use affine logic for them. Device handlers usually aren't bound, and
> timers may migrate now that the NO_HZ logic is present. The only explanation
> I found is that unbound timers are a very unlikely case (I added a
> statistics printk to my local sched_debug to check that). But if we have
> situations like the one you described above, don't we have to disable affine
> logic for in_interrupt() cases?
>
> 3) I ask just because (being outside of scheduler history) it's a little
> bit strange that we prefer smp_processor_id()'s sd_llc so much. The profit
> of a sync wakeup is more or less clear: smp_processor_id()'s sd_llc may
> contain some data that is interesting for the wakee, and this minimizes
> cache misses. But we do the same in other cases too, and at every migration
> we lose itlb, dtlb... Of course, this requires more accurate patches than
> the one posted

*** typo: "itlb, dtlb" above should read "instruction and data caches"

> (not such crude patches).
>
> Thanks,
> Kirill
>
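
To make the difference in the quoted hunk easier to see, here is a rough
user-space sketch of only the lines visible above. It assumes SD_BALANCE_WAKE
is set. The names struct wakeup, struct decision, decide_old() and
decide_patched() are hypothetical stand-ins, not kernel code, and the booleans
replace the real cpumask_test_cpu() and wake_wide() calls. What happens at the
goto target is in the part of the patch that is not quoted, so the sketch only
records whether the jump is taken.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the state the hunk reads. */
struct wakeup {
	int  cpu;         /* waker's cpu (smp_processor_id() in the kernel) */
	int  prev_cpu;    /* cpu the wakee last ran on */
	bool cpu_allowed; /* replaces cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) */
	bool wide;        /* replaces wake_wide(p) */
};

struct decision {
	int  want_affine; /* value of want_affine after the hunk */
	bool jump;        /* true if the patched code takes the goto */
};

/* Pre-patch lines: wake_wide() directly clears want_affine. */
static struct decision decide_old(const struct wakeup *w)
{
	struct decision d = { 0, false };

	d.want_affine = !w->wide && w->cpu_allowed;
	return d;
}

/* Quoted hunk: want_affine is set first, then three conditions jump away. */
static struct decision decide_patched(const struct wakeup *w)
{
	struct decision d = { 1, false };

	if (w->cpu == w->prev_cpu || !w->cpu_allowed || w->wide)
		d.jump = true;
	return d;
}
```

For a wide wakeup (wide == true) the old expression yields want_affine == 0,
while the visible part of the patch leaves want_affine == 1 and takes the
jump. Whether that matters depends on the code at the label, which this
sketch does not model.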