Re: [patch 09/16] sched: unthrottle cfs_rq(s) who ran out of quota at period refresh

From: Paul Turner
Date: Tue Jun 28 2011 - 00:46:54 EST


On Wed, Jun 22, 2011 at 10:29 AM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> On Tue, 2011-06-21 at 00:16 -0700, Paul Turner wrote:
>>  static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
>>  {
>> -       int idle = 1;
>> +       int idle = 1, throttled = 0;
>> +       u64 runtime, runtime_expires;
>> +
>>
>>         raw_spin_lock(&cfs_b->lock);
>>         if (cfs_b->quota != RUNTIME_INF) {
>> -               idle = cfs_b->idle;
>> -               /* If we're going idle then defer handle the refill */
>> +               /* idle depends on !throttled in the case of a large deficit */
>> +               throttled = !list_empty(&cfs_b->throttled_cfs_rq);
>> +               idle = cfs_b->idle && !throttled;
>> +
>> +               /* If we're going idle then defer the refill */
>>                 if (!idle)
>>                         __refill_cfs_bandwidth_runtime(cfs_b);
>> +               if (throttled) {
>> +                       runtime = cfs_b->runtime;
>> +                       runtime_expires = cfs_b->runtime_expires;
>> +
>> +                       /* we must first distribute to throttled entities */
>> +                       cfs_b->runtime = 0;
>> +               }
>
> Why, what's so bad about letting someone take some concurrently and not
> getting throttled meanwhile? Starvation considerations? If so, that
> wants mentioning.

Yes, starvation is the concern. We also particularly want to pay down all
deficits first, in case some cfs_rq has accumulated *large* arrears (e.g.
under !CONFIG_PREEMPT).

Will expand the comment here.
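
To illustrate the ordering concern, here is a rough user-space sketch (not
the kernel code) of paying down deficits before anything else may draw on
the refreshed quota. The names struct entity and pay_down_deficits(), and
the "deficit plus one unit" grant rule, are assumptions made up for this
example; only the idea of distributing to throttled entities first comes
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a throttled cfs_rq; not the kernel structure. */
struct entity {
	int64_t runtime_remaining;	/* <= 0 while throttled; negative is a deficit */
	int throttled;
};

/*
 * Hand out `remaining` to throttled entities in array order. Each one is
 * offered enough to clear its deficit plus one unit, so that it ends up
 * positive and may run again. Returns what is left for the global pool.
 */
static uint64_t pay_down_deficits(struct entity *e, size_t n, uint64_t remaining)
{
	for (size_t i = 0; i < n && remaining > 0; i++) {
		if (!e[i].throttled)
			continue;

		/* a throttled entity never holds positive runtime in this model */
		assert(e[i].runtime_remaining <= 0);

		uint64_t want = (uint64_t)(-e[i].runtime_remaining) + 1;
		uint64_t grant = want < remaining ? want : remaining;

		remaining -= grant;
		e[i].runtime_remaining += (int64_t)grant;

		/* only unthrottle once the deficit is fully covered */
		if (e[i].runtime_remaining > 0)
			e[i].throttled = 0;
	}
	return remaining;
}
```

With a large deficit in the list, a single refresh may be swallowed whole
without unthrottling that entity, which is why nothing is left in the
global pool for concurrent takers until the distribution pass is done.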

>
>>
>>                 /*
>> -                * mark this bandwidth pool as idle so that we may deactivate
>> -                * the timer at the next expiration if there is no usage.
>> +                * conditionally mark this bandwidth pool as idle so that we may
>> +                * deactivate the timer at the next expiration if there is no
>> +                * usage.
>>                  */
>> -               cfs_b->idle = 1;
>> +               cfs_b->idle = !throttled;
>>         }
>>
>> -       if (idle)
>> +       if (idle) {
>>                 cfs_b->timer_active = 0;
>> +               goto out_unlock;
>> +       }
>> +       raw_spin_unlock(&cfs_b->lock);
>> +
>> +retry:
>> +       runtime = distribute_cfs_runtime(cfs_b, runtime, runtime_expires);
>> +
>> +       raw_spin_lock(&cfs_b->lock);
>> +       /* new bandwidth specification may exist */
>> +       if (unlikely(runtime_expires != cfs_b->runtime_expires))
>> +               goto out_unlock;
>
> it might help to explain how, runtime_expires is taken from cfs_b after
> calling __refill_cfs_bandwidth_runtime, and we're in the replenishment
> timer, so nobody is going to be adding new runtime.
>

Good idea -- thanks

>> +       /* ensure no-one was throttled while we were unthrottling */
>> +       if (unlikely(!list_empty(&cfs_b->throttled_cfs_rq)) && runtime > 0) {
>> +               raw_spin_unlock(&cfs_b->lock);
>> +               goto retry;
>> +       }
>
> OK, I can see that.
>
>> +
>> +       /* return remaining runtime */
>> +       cfs_b->runtime = runtime;
>> +out_unlock:
>>         raw_spin_unlock(&cfs_b->lock);
>>
>>         return idle;
>
> This function hurts my brain, code flow is horrid.

Yeah.. I don't know why I didn't just make it a while loop, will fix.
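
For reference, one possible shape of the loop, as a user-space sketch only.
struct pool, distribute(), UNIT and late_throttled are hypothetical
stand-ins invented for this example (late_throttled simulates entities that
get throttled while the lock is dropped); locking is shown only as
comments. This is not the reworked patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the bandwidth pool. */
struct pool {
	uint64_t runtime;		/* runtime left in the global pool */
	uint64_t runtime_expires;	/* changes when a new bandwidth spec is set */
	int nr_throttled;		/* stand-in for !list_empty(&throttled_cfs_rq) */
	int late_throttled;		/* entities throttled while we were unlocked */
};

#define UNIT 10	/* assumed cost of unthrottling one entity, demo only */

/* Mock of distribute_cfs_runtime(): unthrottle while runtime lasts. */
static uint64_t distribute(struct pool *p, uint64_t runtime)
{
	assert(runtime > 0);	/* caller only distributes when there is runtime */

	while (p->nr_throttled > 0 && runtime > 0) {
		if (runtime >= UNIT) {
			p->nr_throttled--;
			runtime -= UNIT;
		} else {
			runtime = 0;	/* partial payment, entity stays throttled */
		}
	}

	/* simulate entities that were throttled concurrently */
	p->nr_throttled += p->late_throttled;
	p->late_throttled = 0;
	return runtime;
}

static void distribute_until_done(struct pool *p)
{
	/* the pool lock is held on entry */
	uint64_t runtime = p->runtime;
	uint64_t expires = p->runtime_expires;

	p->runtime = 0;	/* throttled entities get first claim */

	while (p->nr_throttled > 0 && runtime > 0) {
		/* drop the lock here */
		runtime = distribute(p, runtime);
		/* retake the lock here */

		/* new bandwidth specification: our runtime is stale, drop it */
		if (expires != p->runtime_expires)
			return;
	}

	p->runtime = runtime;	/* return remaining runtime */
}
```

The loop condition replaces both the retry label and the "was anyone
throttled while we were unthrottling" check, so the only early exit left is
the stale-expiration case.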

>