Re: [patch 06/16] sched: account for blocked load waking back up

From: Benjamin Segall
Date: Tue Sep 04 2012 - 13:29:37 EST


Preeti Murthy <preeti.lkml@xxxxxxxxx> writes:

> Hi Paul,
>
> @@ -1170,20 +1178,42 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>                                            struct sched_entity *se,
>                                            int wakeup)
>  {
> -       /* we track migrations using entity decay_count == 0 */
> -       if (unlikely(!se->avg.decay_count)) {
> +       /*
> +        * We track migrations using entity decay_count <= 0, on a wake-up
> +        * migration we use a negative decay count to track the remote decays
> +        * accumulated while sleeping.
> +        */
> +       if (unlikely(se->avg.decay_count <= 0)) {
>                 se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
> +               if (se->avg.decay_count) {
> +                       /*
> +                        * In a wake-up migration we have to approximate the
> +                        * time sleeping.  This is because we can't synchronize
> +                        * clock_task between the two cpus, and it is not
> +                        * guaranteed to be read-safe.  Instead, we can
> +                        * approximate this using our carried decays, which are
> +                        * explicitly atomically readable.
> +                        */
> +                       se->avg.last_runnable_update -= (-se->avg.decay_count)
> +                                                       << 20;
> +                       update_entity_load_avg(se, 0);
> +                       /* Indicate that we're now synchronized and on-rq */
> +                       se->avg.decay_count = 0;
> +               }
>                 wakeup = 0;
>         } else {
>                 __synchronize_entity_decay(se);
>
> Should not the last_runnable_update of se get updated in __synchronize_entity_decay()?
> Because it contains the value of the runnable update before going to sleep. If not updated, when
> update_entity_load_avg() is called below during a local wakeup, it will decay the runtime load
> for the duration including the time the sched entity has slept.

If you are asking if it should be updated in the else block (local
wakeup, no migration) here, no:

* __synchronize_entity_decay will decay load_avg_contrib to match the
decay that the cfs_rq has done, keeping those in sync, and ensuring we
don't subtract too much when we update our current load average.
* clock_task - last_runnable_update will be the amount of time that the
task has been blocked. update_entity_load_avg (below) and
__update_entity_runnable_avg will account this time as non-runnable
time into runnable_avg_sum/period, and from there onto the cfs_rq via
__update_entity_load_avg_contrib.
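
To make the second point concrete, here is a toy model of that accounting. It is only a sketch, not the kernel code: the struct and function names are made up for illustration, the field names merely echo runnable_avg_sum/period, and the geometric decay that the real __update_entity_runnable_avg applies is deliberately left out. The only thing it shows is that blocked time grows the period without growing the runnable sum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant part of se->avg. */
struct toy_runnable_avg {
	uint64_t runnable_avg_sum;     /* time spent runnable */
	uint64_t runnable_avg_period;  /* total time observed */
	uint64_t last_runnable_update; /* timestamp of the last update */
};

/*
 * Account the time since the last update. Runnable time counts toward
 * both the sum and the period; blocked (non-runnable) time counts only
 * toward the period. Decay is intentionally not modelled here.
 */
static void toy_update_runnable_avg(struct toy_runnable_avg *avg,
				    uint64_t now, int runnable)
{
	uint64_t delta;

	assert(now >= avg->last_runnable_update);
	delta = now - avg->last_runnable_update;

	if (runnable)
		avg->runnable_avg_sum += delta;
	avg->runnable_avg_period += delta;
	avg->last_runnable_update = now;
}
```

In this model, leaving last_runnable_update alone across the sleep is exactly what lets the wakeup-time update see the blocked interval as a single non-runnable delta.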

Both of these are necessary, and will happen. In the case of !wakeup,
the task is being moved between groups or is migrating between cpus, and
we pretend (to the best of our ability in the case of migrating between
cpus which may have different clock_tasks) that the task has been
runnable this entire time.

In the more general case, no: __synchronize_entity_decay is also called
from migrate_task_rq_fair, which doesn't hold the locks needed to read
clock_task.
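
For the wake-up migration branch in the quoted hunk, the back-dating can be sketched as below. This is a standalone illustration under stated assumptions, not the patch itself: the struct and function names are hypothetical, and the reading of "<< 20" as "one decay period is roughly 2^20 ns (about 1 ms)" is my interpretation of the shift in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two se->avg fields used in the hunk. */
struct toy_avg {
	uint64_t last_runnable_update; /* ns, destination cpu's clock_task */
	int64_t decay_count;           /* < 0: decays carried from remote cpu */
};

/*
 * Mirror the migration branch: stamp the entity with the destination
 * clock, then move the stamp back by the carried decay periods, each
 * approximated as 2^20 ns, so the sleep is later accounted as blocked.
 */
static void toy_backdate_on_wake_migration(struct toy_avg *avg,
					   uint64_t clock_task_ns)
{
	assert(avg->decay_count <= 0);

	avg->last_runnable_update = clock_task_ns;
	if (avg->decay_count) {
		avg->last_runnable_update -=
			(uint64_t)(-avg->decay_count) << 20;
		/* now synchronized with the destination runqueue */
		avg->decay_count = 0;
	}
}
```

With decay_count == -3 the stamp ends up 3 << 20 ns (about 3.1 ms) behind the destination clock_task, which approximates the sleep time without ever reading the source cpu's clock.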

>
> This also means that during dequeue_entity_load_avg(), update_entity_load_avg() needs to be
> called to keep the runnable_avg_sum of the sched entity updated till
> before sleep.

Yes, this happens first thing in dequeue_entity_load_avg.
>
>         }
>
> -       if (wakeup)
> +       /* migrated tasks did not contribute to our blocked load */
> +       if (wakeup) {
>                 subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
> +               update_entity_load_avg(se, 0);
> +       }
>
> -       update_entity_load_avg(se, 0);
>         cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
> -       update_cfs_rq_blocked_load(cfs_rq);
> +       /* we force update consideration on load-balancer moves */
> +       update_cfs_rq_blocked_load(cfs_rq, !wakeup);
>  }
>
> Regards
> Preeti