Re: [PATCH] sched/fair: don't assign runtime for throttled cfs_rq

From: Liangyan
Date: Mon Aug 26 2019 - 22:45:15 EST

On 19/8/27 1:38 AM, bsegall@xxxxxxxxxx wrote:
Valentin Schneider <valentin.schneider@xxxxxxx> writes:

On 23/08/2019 21:00, bsegall@xxxxxxxxxx wrote:
Could you mention in the message that a throttled cfs_rq can have
account_cfs_rq_runtime called on it because it is throttled before
idle_balance, and the idle_balance calls update_rq_clock to add time
that is accounted to the task.

Mayhaps even a comment for the extra condition.

I think this solution is less risky than unthrottling
in this area, so other than that:

Reviewed-by: Ben Segall <bsegall@xxxxxxxxxx>

If you don't mind squashing this in:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b1d9cec9b1ed..b47b0bcf56bc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4630,6 +4630,10 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining)
 		if (!cfs_rq_throttled(cfs_rq))
 			goto next;
 
+		/* By the above check, this should never be true */
+		WARN_ON(cfs_rq->runtime_remaining > 0);
+
+		/* Pick the minimum amount to return to a positive quota state */
 		runtime = -cfs_rq->runtime_remaining + 1;
 		if (runtime > remaining)
 			runtime = remaining;

I'm not adamant about the extra comment, but the WARN_ON would be nice IMO.
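
To see what the check protects, the arithmetic in the hunk above can be tried in userspace. This is only a sketch, not kernel code: demo_runtime_to_give() is a hypothetical name, and it assumes that runtime_remaining is a signed 64-bit value while runtime and remaining are unsigned 64-bit, which is my reading of fair.c rather than something stated in this thread.

```c
#include <stdint.h>

/*
 * Userspace sketch of "runtime = -runtime_remaining + 1" followed by the
 * clamp to 'remaining'. Assumption: runtime_remaining is signed 64-bit,
 * runtime and remaining are unsigned 64-bit. Not valid for INT64_MIN.
 */
static uint64_t demo_runtime_to_give(int64_t runtime_remaining,
				     uint64_t remaining)
{
	/* A negative signed result wraps to a huge value when made unsigned. */
	uint64_t runtime = (uint64_t)(-runtime_remaining + 1);

	if (runtime > remaining)
		runtime = remaining;
	return runtime;
}
```

Under those assumptions, a deficit of 10 with 100 left in the pool yields 11, just enough to go positive. A positive runtime_remaining of 5 makes the signed result -4, which wraps on conversion and is clamped to the whole pool of 100, so a single cfs_rq would take everything that is left.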

@Ben, do you reckon we want to strap

Cc: <stable@xxxxxxxxxxxxxxx>
Fixes: ec12cb7f31e2 ("sched: Accumulate per-cfs_rq cpu usage and charge against bandwidth")

to the thing? AFAICT the pick_next_task_fair() + idle_balance() dance you
described should still be possible on that commit.

I'm not sure about stable policy in general, but it seems reasonable.
The WARN_ON might want to be WARN_ON_ONCE, and it seems fine to have it
or not.

Thanks, Ben and Valentin, for all the comments. Per Xunlei's suggestion, I used SCHED_WARN_ON instead in v3. As for whether to Cc stable, I'm not sure either.
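
The difference between warning on every hit and warning once, raised above, can be illustrated with a userspace stand-in. DEMO_WARN_ON_ONCE, demo_warnings and demo_check() are hypothetical names made up for this sketch; it does not reproduce how the kernel implements WARN_ON_ONCE or SCHED_WARN_ON, and it only assumes that the "once" variants report a given call site a single time.

```c
#include <stdbool.h>
#include <stdio.h>

static int demo_warnings;	/* counts emitted warnings, for the demo only */

/* Hypothetical stand-in: report a true condition once per call site. */
#define DEMO_WARN_ON_ONCE(cond)						\
	do {								\
		static bool warned_;					\
		if ((cond) && !warned_) {				\
			warned_ = true;					\
			demo_warnings++;				\
			fprintf(stderr, "warning: %s\n", #cond);	\
		}							\
	} while (0)

/* Mirrors the shape of the proposed check on a throttled cfs_rq. */
static void demo_check(long long runtime_remaining)
{
	DEMO_WARN_ON_ONCE(runtime_remaining > 0);
}
```

Calling demo_check() repeatedly with a positive value prints one message and leaves demo_warnings at 1, which is the property that keeps a recurring condition from flooding the log.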

Other than that,

Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>