Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.

From: Rakib Mullick
Date: Tue Aug 28 2012 - 02:57:46 EST


Hello Paul,

On 8/28/12, Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Aug 20, 2012 at 09:26:57AM -0700, Paul E. McKenney wrote:
>> On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:
>
> How about the following updated patch?
>
Actually, I was waiting for Peter's update.

> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> sched: Fix load avg vs cpu-hotplug
>
> Rakib and Paul reported two different issues related to the same few
> lines of code.
>
> Rakib's issue is that the nr_uninterruptible migration code is wrong in
> that he sees artifacts due to this (Rakib please do expand in more
> detail).
>
> Paul's issue is that this code as it stands relies on us using
> stop_machine() for unplug, we all would like to remove this assumption
> so that eventually we can remove this stop_machine() usage altogether.
>
> The only reason we'd have to migrate nr_uninterruptible is so that we
> could use for_each_online_cpu() loops in favour of
> for_each_possible_cpu() loops, however since nr_uninterruptible() is the
> only such loop and it's using possible, let's not bother at all.
>
> The problem Rakib sees is (probably) caused by the fact that by
> migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
> involved.
>
> So don't bother with fancy migration schemes (meaning we now have to
> keep using for_each_possible_cpu()) and instead fold any nr_active delta
> after we migrate all tasks away to make sure we don't have any skewed
> nr_active accounting.
>
> [ paulmck: Move call to calc_load_migration to CPU_DEAD to avoid
> miscounting noted by Rakib. ]
>
> Reported-by: Rakib Mullick <rakib.mullick@xxxxxxxxx>
> Reported-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e841dfc..a8807f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5309,27 +5309,17 @@ void idle_task_exit(void)
> }
>
> /*
> - * While a dead CPU has no uninterruptible tasks queued at this point,
> - * it might still have a nonzero ->nr_uninterruptible counter, because
> - * for performance reasons the counter is not stricly tracking tasks to
> - * their home CPUs. So we just add the counter to another CPU's counter,
> - * to keep the global sum constant after CPU-down:
> - */
> -static void migrate_nr_uninterruptible(struct rq *rq_src)
> -{
> - struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
> -
> - rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
> - rq_src->nr_uninterruptible = 0;
> -}
> -
> -/*
> - * remove the tasks which were accounted by rq from calc_load_tasks.
> + * Since this CPU is going 'away' for a while, fold any nr_active delta
> + * we might have. Assumes we're called after migrate_tasks() so that the
> + * nr_active count is stable.
> + *
> + * Also see the comment "Global load-average calculations".
> */
> -static void calc_global_load_remove(struct rq *rq)
> +static void calc_load_migrate(struct rq *rq)
> {
> - atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
> - rq->calc_load_active = 0;
> + long delta = calc_load_fold_active(rq);
> + if (delta)
> + atomic_long_add(delta, &calc_load_tasks);
> }
>
> /*
> @@ -5622,9 +5612,18 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> migrate_tasks(cpu);
> BUG_ON(rq->nr_running != 1); /* the migration thread */
> raw_spin_unlock_irqrestore(&rq->lock, flags);
> + break;
>
> - migrate_nr_uninterruptible(rq);
> - calc_global_load_remove(rq);
> + case CPU_DEAD:
> + {
> + struct rq *dest_rq;
> +
> + local_irq_save(flags);
> + dest_rq = cpu_rq(smp_processor_id());

The use of smp_processor_id() as the destination CPU isn't clear to me.
This processor is about to go down, isn't it?

> + raw_spin_lock(&dest_rq->lock);
> + calc_load_migrate(rq);

Well, calc_load_migrate() has no effect, because rq->nr_running == 1 at
this point. This has already been pointed out previously.
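
To make the concern concrete, here is a minimal user-space sketch (not
kernel code). struct model_rq and the model_* helpers are hypothetical
names used only for illustration. The folding logic follows my reading of
calc_load_fold_active() and of the calc_load_migrate() hunk quoted above,
so treat it as an assumption rather than the exact kernel behaviour. It
shows that when the runqueue's active count has not changed since the last
fold, the delta is zero and calc_load_tasks is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct rq fields involved here. */
struct model_rq {
	long nr_running;         /* runnable tasks on this runqueue */
	long nr_uninterruptible; /* per-CPU counter, may be nonzero or negative */
	long calc_load_active;   /* what this rq last contributed globally */
};

/* Modeled on calc_load_fold_active(): return the change since the last fold. */
static long model_fold_active(struct model_rq *rq)
{
	long nr_active = rq->nr_running + rq->nr_uninterruptible;
	long delta = nr_active - rq->calc_load_active;

	rq->calc_load_active = nr_active;
	return delta;
}

/*
 * Modeled on calc_load_migrate() from the patch: fold any delta into the
 * global count. A plain long stands in for the atomic calc_load_tasks.
 */
static long model_calc_load_migrate(struct model_rq *rq, long *calc_load_tasks)
{
	long delta;

	assert(rq != NULL && calc_load_tasks != NULL);
	delta = model_fold_active(rq);
	if (delta)
		*calc_load_tasks += delta;
	return delta;
}
```

In this model, a runqueue with nr_running == 1, nr_uninterruptible == 0
and calc_load_active == 1 yields a delta of 0, so the call changes
nothing. The fold only has an effect once nr_running (or
nr_uninterruptible) differs from what was last accounted.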

Thanks,
Rakib