Re: [RFC PATCH 2/5] sched: Add NOHZ_STATS_KICK
From: Vincent Guittot
Date: Tue Feb 06 2018 - 04:23:01 EST
On 5 February 2018 at 23:18, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
> On 01/30/2018 11:41 AM, Valentin Schneider wrote:
>> [...]
>>> I have studied a way to keep track of how many cpus still have blocked
>>> load to try to minimize the number of useless ilb kick but this add
>>> more atomic operations which can impact the system throughput with
>>> heavy load and lot of very small wake up. that's why i have propose
>>> this solution which is more simple. But it's probably just a matter of
>>> where we want to "waste" time. Either we accept to spent a bit more
>>> time to check the state of idle CPUs or we accept to kick ilb from
>>> time to time for no good reason.
>>>
>>
>> Agreed. I have the feeling that spending more time doing atomic ops could be worth it - I'll try to test this out and see if it's actually relevant.
>>
>
> I gave this a spin, still using Vincent's patches with the included patch on top. Nothing too clever, just seeing how replacing nohz.stats_state with a cpumask would go.
>
> I've replaced nohz.stats_state by nohz.stale_cpus_mask. I kept changes minimal - there are some places where I think nohz.idle_cpus_mask could be substituted by nohz.stale_cpus_mask. Speaking about that, I was about to write a comment regarding the fact that nohz.stale_cpus_mask should be a subset of nohz.idle_cpus_mask, but I realized it's actually not true:
>
> In the current implementation (cpumask or stats_state flag), an idle CPU is defined as having blocked load as soon as it goes through nohz_balance_enter_idle(), and that flag is cleared when we go through _nohz_idle_balance() (and newly idle load balance in my cpumask implementation).
> However we can imagine a scenario where a CPU goes idle, is flagged as having blocked load, then it wakes up and goes through its periodic balance code and updates that load. Yet, the flag (or cpumask) won't have been updated.
> So I think there could be a discussion on whether the flag should be cleared on nohz_balance_exit_idle() if we assume periodic balance now takes care of this. It could cause issues if we have a weird scenario where a CPU keeps going online/idle but never stays online long enough for a tick though.
> Alternatively we could clear that flag when going through the first periodic balance after idling, but then that's a bit weird because we're using a nohz structure in a non-nohz context.
>
>
> Anyway, I tried to get some profiling done with the cpumask but there's something wrong with my setup, I would only get nonsense numbers (for both baseline & my patch), so I added start/end trace_printks to _nohz_idle_balance(). It's ugly, inaccurate and unorthodox but it still gives a rough idea of how the cpumask impacts stuff.
> I ran 20 iterations of my "nohz update" test case (a task accumulates load, goes to sleep, and another always-running task keeps kicking an ILB to decay that blocked load) and the time saved by skipping CPUs is in the ballpark of 20%. Notebook is at [1].
Even if saving time in the ILB is interesting, we have to check the
impact of bitmask operation on the wake up latency of the CPUs when
the system is heavily used and generate a lot of sleep/wakeup on CPUs
>
> I'll try to get a proper function profiling working for when Vincent posts his "v2".
>
> [1]: https://gist.github.com/valschneider/6f203143bee1e149f24c44e9582a9eff
>
> ---