Re: [PATCH v7] sched: fix init NOHZ_IDLE flag

From: Vincent Guittot
Date: Tue Apr 23 2013 - 03:53:08 EST


On 22 April 2013 22:10, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2013-04-22 at 17:38 +0200, Vincent Guittot wrote:
>
>> ---
>> include/linux/sched.h | 1 +
>> kernel/sched/fair.c | 34 ++++++++++++++++++++++++----------
>> 2 files changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index d35d2b6..cde4f7f 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -899,6 +899,7 @@ struct sched_domain {
>> unsigned int wake_idx;
>> unsigned int forkexec_idx;
>> unsigned int smt_gain;
>> + unsigned long nohz_flags; /* NOHZ_IDLE flag status */
>> int flags; /* See SD_* */
>> int level;
>
> There was a 4 byte hole here, I'm assuming you used unsigned long and
> not utilized the hole because of the whole atomic bitop interface
> taking unsigned long?
>
> Bit wasteful but ok..
>
> So we're only pulling NOHZ_IDLE out of nohz_flags()? Should we then not
> also amend the rq_nohz_flag_bits enum? And it seems pointless to me to
> set bit 2 our nohz_flags word if all other bits are unused.

Yes, i have been too quick to update the patchset. i'm going to remove
NOHZ_IDLE from the rq_nohz_flag_bits

>
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 7a33e59..09e440f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5394,14 +5394,21 @@ static inline void set_cpu_sd_state_busy(void)
>> {
>> struct sched_domain *sd;
>> int cpu = smp_processor_id();
>> -
>> - if (!test_bit(NOHZ_IDLE, nohz_flags(cpu)))
>> - return;
>> - clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>> + int first_nohz_idle = 1;
>>
>> rcu_read_lock();
>> - for_each_domain(cpu, sd)
>> + for_each_domain(cpu, sd) {
>> + if (first_nohz_idle) {
>> + if (!test_bit(NOHZ_IDLE, &sd->nohz_flags))
>> + goto unlock;
>> +
>> + clear_bit(NOHZ_IDLE, &sd->nohz_flags);
>> + first_nohz_idle = 0;
>> + }
>> +
>> atomic_inc(&sd->groups->sgp->nr_busy_cpus);
>> + }
>
> the mind boggles... what's wrong with something like:
>
> static inline unsigned long *rq_nohz_flags(int cpu)
> {
> return rcu_dereference(cpu_rq(cpu)->sd)->nohz_flags;
> }
>
> if (!test_bit(0, rq_nohz_flags(cpu)))
> return;
> clear_bit(0, rq_nohz_flags(cpu));
>

AFAICT, if we use different rcu_dereferences for modifying nohz_flags
and for updating the nr_busy_cpus, we open a time window for
desynchronization, isn't it ?
That why i'm doing the update of rq_nohz_flags with the same
rcu_dereference than nr_busy_cpus

>> +unlock:
>> rcu_read_unlock();
>> }
>>
>> @@ -5409,14 +5416,21 @@ void set_cpu_sd_state_idle(void)
>> {
>> struct sched_domain *sd;
>> int cpu = smp_processor_id();
>> -
>> - if (test_bit(NOHZ_IDLE, nohz_flags(cpu)))
>> - return;
>> - set_bit(NOHZ_IDLE, nohz_flags(cpu));
>> + int first_nohz_idle = 1;
>>
>> rcu_read_lock();
>> - for_each_domain(cpu, sd)
>> + for_each_domain(cpu, sd) {
>> + if (first_nohz_idle) {
>> + if (test_bit(NOHZ_IDLE, &sd->nohz_flags))
>> + goto unlock;
>> +
>> + set_bit(NOHZ_IDLE, &sd->nohz_flags);
>> + first_nohz_idle = 0;
>> + }
>> +
>> atomic_dec(&sd->groups->sgp->nr_busy_cpus);
>> + }
>> +unlock:
>
> Same here, .. why on earth do it for every sched_domain for that cpu?

The update of rq_nohz_flags is done only once on the top level
sched_domain and the nr_busy_cpus is updated on each sched_domain
level in order to have a quick access to the number of busy cpu when
we check for the need to kick an idle load balance

>
>> rcu_read_unlock();
>> }
>>
>
> And since its now only a single bit in a single word, we can easily
> change it to something like:
>
> if (*rq_nohz_idle(cpu))
> return;
> xchg(rq_nohz_idle(cpu), 1); /* set; use 0 for clear */
>
> which drops the unsigned long nonsense..

that's fair. i'm going to use xchg instead of atomic

>
> Or am I completely missing something obvious here?
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/