Re: [PATCH v3 03/14] sched/core: uclamp: add CPU's clamp groups accounting

From: Dietmar Eggemann
Date: Thu Aug 16 2018 - 11:01:00 EST


On 08/16/2018 04:21 PM, Quentin Perret wrote:
On Thursday 16 Aug 2018 at 15:45:45 (+0200), Dietmar Eggemann wrote:
On 08/16/2018 03:37 PM, Quentin Perret wrote:
IMHO, if this is something which should not happen at all, a BUG_ON() is the
right thing to do here.

I don't agree on that. I agree it should not happen but since it's a
recoverable error I think we should not panic.

FWIW, if this is a recoverable error, I think Linus will agree with
Patrick on this one :-)

https://lkml.org/lkml/2016/10/4/1

Yeah, not really agreeing here that this is a recoverable error.

A non-recoverable scenario could be, for example, if you corrupt your
stack and there is absolutely _nothing_ you can do to keep the system up
and running, because it's just too broken. I don't feel like we're
talking about such an extreme case here ...

Yeah, that's the extreme. But what about this lovely BUG_ON(busiest == env.dst_rq) in fair.c's load_balance()?

We could recover by just bailing out ;-)
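
To make the "just bail out" option concrete, here is a minimal user-space sketch. It is not the real load_balance(): struct rq, balance_one() and warn_count are hypothetical stand-ins made up for illustration, and fprintf() plays the role of WARN(). It only shows the shape of warning and returning early where a BUG_ON() would stop the system.

```c
#include <stdio.h>

/* Hypothetical miniature run queue; not the kernel's struct rq. */
struct rq {
	int nr_running;
};

/* Counts how often the "should never happen" case was reported. */
static int warn_count;

/*
 * Move one task from the busiest queue to the destination queue.
 * Returns the number of tasks moved (0 or 1).
 */
static int balance_one(struct rq *busiest, struct rq *dst)
{
	if (busiest == dst) {
		/* Assumption: warn and bail out instead of halting. */
		warn_count++;
		fprintf(stderr, "WARNING: busiest == dst, skipping balance\n");
		return 0;
	}

	if (busiest->nr_running == 0)
		return 0;	/* nothing to pull */

	busiest->nr_running--;
	dst->nr_running++;
	return 1;
}
```

Calling balance_one(&a, &a) leaves the queue untouched and bumps warn_count, so the inconsistency is still visible in the log while the caller keeps running.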

I guess we know by now that there are different opinions here.


Besides, we only consider under-run here, what about over-run?

The important thing is to also detect the over-run, i.e. we add the first task and the task counter is already > 0.
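
As a rough illustration of checking both directions, a user-space sketch could look like the following. The names (struct clamp_group, group_get(), group_put(), the 'first' flag) are hypothetical and not taken from the patch; how the caller knows it is adding the first task, and what the recovery policy should be, are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-CPU clamp group; not the patch's struct. */
struct clamp_group {
	unsigned int tasks;	/* tasks currently accounted to the group */
	unsigned int warnings;	/* inconsistencies reported so far */
};

/* User-space stand-in for WARN(): report the problem and carry on. */
static void warn_group(struct clamp_group *g, const char *what)
{
	g->warnings++;
	fprintf(stderr, "WARNING: clamp group refcount %s (tasks=%u)\n",
		what, g->tasks);
}

/*
 * Account one more task. 'first' is the caller's belief that the group
 * was empty; a non-zero counter in that case is an over-run.
 */
static void group_get(struct clamp_group *g, bool first)
{
	if (first && g->tasks > 0)
		warn_group(g, "over-run");
	g->tasks++;
}

/* Drop one task. Decrementing an empty group is an under-run. */
static void group_put(struct clamp_group *g)
{
	if (g->tasks == 0) {
		warn_group(g, "under-run");
		return;		/* assumption: recover by not wrapping around */
	}
	g->tasks--;
}
```

With this shape, a put on an empty group is reported and ignored, and a get that claims to be the first one on a non-empty group is reported but still counted.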


Currently this warning doesn't hit, and if the code gets changed and it
hits, I still find a BUG_ON more appealing here ...

So this error scenario can happen over and over again and we always recover
from it? The important thing is that we find the culprit for this behaviour as
fast as possible ...

Agreed, we want to debug that ASAP, but WARN should let us do that just
fine, I think.

+1.