Re: [PATCH v4 3/7] regulator: IRQ based event/error notification helpers

From: Vaittinen, Matti
Date: Fri Apr 09 2021 - 03:08:15 EST



On Thu, 2021-04-08 at 20:20 -0700, Kees Cook wrote:
> On Wed, Apr 07, 2021 at 03:50:15PM +0300, Andy Shevchenko wrote:
> > On Wed, Apr 7, 2021 at 12:49 PM Vaittinen, Matti
> > <Matti.Vaittinen@xxxxxxxxxxxxxxxxx> wrote:
> > > On Wed, 2021-04-07 at 12:10 +0300, Andy Shevchenko wrote:
> > > > On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
> > > > <matti.vaittinen@xxxxxxxxxxxxxxxxx> wrote:
> > > > > On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > > > > > On Tuesday, April 6, 2021, Matti Vaittinen <
> > > > > > matti.vaittinen@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > > + BUG();
> > > > > > > +}
>
> This, though, are you sure you want to use BUG()? Linus gets upset about
> such things:
> https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
>

I see. I am unsure what the best action would be in the regulator
case we are handling here. To give some context, we assume a
situation where power has gone out of regulation and the hardware is
probably failing. The first countermeasure to protect what is left of
the HW is to shut down the failing regulator. BUG() was called here as
a last resort if shutting down the power via the regulator interface
was not implemented or did not work.
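
The fallback order described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the code from
the patch: struct fake_regulator, protect_hw(), handle_failure() and the
two disable_*() callbacks are hypothetical names, and abort() merely
stands in for the kernel's BUG().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a regulator; NOT the kernel's regulator_dev. */
struct fake_regulator {
	const char *name;
	/* Optional callback: 0 on success, negative value on failure. */
	int (*disable)(struct fake_regulator *r);
};

enum protect_result {
	PROTECT_DISABLED,	/* the failing regulator was shut down */
	PROTECT_LAST_RESORT,	/* disable missing or failed */
};

/* First countermeasure: try to shut down the failing regulator. */
static enum protect_result protect_hw(struct fake_regulator *r)
{
	if (r->disable && r->disable(r) == 0)
		return PROTECT_DISABLED;

	return PROTECT_LAST_RESORT;
}

/* Last resort only when the disable path is unavailable or fails. */
static void handle_failure(struct fake_regulator *r)
{
	if (protect_hw(r) == PROTECT_LAST_RESORT) {
		fprintf(stderr, "%s: could not be disabled, giving up\n",
			r->name);
		abort();	/* userspace stand-in for BUG() in this sketch */
	}
}

/* Hypothetical callbacks used to exercise both paths. */
static int disable_ok(struct fake_regulator *r) { (void)r; return 0; }
static int disable_fail(struct fake_regulator *r) { (void)r; return -5; }
```

Keeping the decision (protect_hw()) separate from the action
(handle_failure()) means the last-resort step could later be swapped for
something gentler than BUG() without touching the disable logic.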

I.e., we try to take whatever last measure we can to minimize the HW
damage - and BUG() was used for this in the qcom driver where I stole
the idea. Judging by the comment related to BUG() in asm-generic/bug.h

/*
 * Don't use BUG() or BUG_ON() unless there's really no way out; one
 * example might be detecting data structure corruption in the middle
 * of an operation that can't be backed out of. If the (sub)system
 * can somehow continue operating, perhaps with reduced functionality,
 * it's probably not BUG-worthy.
 *
 * If you're tempted to BUG(), think again: is completely giving up
 * really the *only* solution? There are usually better options, where
 * users don't need to reboot ASAP and can mostly shut down cleanly.
 */
https://elixir.bootlin.com/linux/v5.12-rc6/source/include/asm-generic/bug.h#L55

this really might be a valid use-case.

To me the real question is what happens after the BUG() - is there
any generic handling, or is it platform/board specific? Does it
actually have any chance to save the HW?

Mark already pointed out that we might need to figure out a way to punt
a "failing event" to the user-space to initiate a better "safety
shutdown". Such an event does not currently exist, so I think the main
use-case here is to do logging and potentially prevent enabling any
further actions in the failing HW.

So - any better suggestions?

Best Regards
Matti Vaittinen