Re: [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth burst

From: changhuaixin
Date: Wed Mar 17 2021 - 03:15:11 EST




> On Mar 16, 2021, at 5:54 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 16, 2021 at 12:49:28PM +0800, Huaixin Chang wrote:
>> @@ -8982,6 +8983,12 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>> if (quota != RUNTIME_INF && quota > max_cfs_runtime)
>> return -EINVAL;
>>
>> + /*
>> + * Bound burst to defend burst against overflow during bandwidth shift.
>> + */
>> + if (burst > max_cfs_runtime)
>> + return -EINVAL;
>
> Why do you allow such a large burst? I would expect something like:
>
> if (burst > quote)
> return -EINVAL;
>
> That limits the variance in the system. Allowing super long bursts seems
> to defeat the entire purpose of bandwidth control.

I understand your concern. Surely large burst value might allow super long bursts
thus preventing bandwidth control entirely for a long time.

However, I am afraid it is hard to decide what the maximum burst should be from
the bandwidth control mechanism itself. Allowing some burst to the maximum of
quota is helpful, but not enough. There are cases where workloads are bursty
that they need many times more than quota in a single period. In such cases, limiting
burst to the maximum of quota fails to meet the needs.

Thus, I wonder whether is it acceptable to leave the maximum burst to users. If the desired
behavior is to allow some burst, configure burst accordingly. If that is causing variance, use share
or other fairness mechanism. And if fairness mechanism still fails to coordinate, do not use
burst maybe.

In this way, cfs_b->buffer can be removed while cfs_b->max_overrun is still needed maybe.