Re: Kernel panic on Google Pixel devices due to regulator patch

From: Greg KH
Date: Wed Dec 18 2019 - 07:22:02 EST


On Wed, Dec 18, 2019 at 11:34:58AM +0000, Mark Brown wrote:
> On Tue, Dec 17, 2019 at 11:51:55PM +0800, Siddharth Kapoor wrote:
>
> > I would like to share a concern with the regulator patch which is part of
> > 4.9.196 LTS kernel.
>
> That's an *extremely* old kernel.

It is, but it's the latest stable kernel (well, close to it), and your patch
was tagged by you to be backported to here, so if there's a problem with
a stable branch, I want to know about it as I don't want to see
regressions happen in it.

> > https://lore.kernel.org/lkml/20190904124250.25844-1-broonie@xxxxxxxxxx/
>
> That's the patch "[PATCH] regulator: Defer init completion for a while
> after late_initcall" which defers disabling of idle regulators for a
> while.
>
> Please include human readable descriptions of things like commits and
> issues being discussed in e-mail in your mails, this makes them much
> easier for humans to read especially when they have no internet access.
> I do frequently catch up on my mail on flights or while otherwise
> travelling so this is even more pressing for me than just being about
> making things a bit easier to read.
>
> > We have reverted the patch in Pixel kernels and would like you to look into
> > this and consider reverting it upstream as well.
>
> I've got nothing to do with the stable kernels so there's nothing I can
> do here, sorry.

Should I revert it everywhere? This patch reads as if it should be fixing
problems, not causing them :)

> However, if this is triggering anything it's almost
> certainly some kind of timing issue (this code isn't new, it's just
> being run a bit later) and is only currently working through luck so I
> do strongly recommend trying to figure out the actual problem since it's
> liable to come back and bite you later - we did find one buggy driver in
> mainline as a result of this change, it's possible you've got another
> one.
>
> Possibly your GPU supplies need to be flagged as always on, possibly
> your GPU driver is forgetting to enable some supplies it needs, or
> possibly there's a missing always-on constraint on one of the regulators
> depending on how the driver expects this to work (if it's a proprietary
> driver it shouldn't be using the regulator API itself). I'm quite
> surprised you've not seen any issue before, given that the supplies would
> still have been disabled before this change, just earlier.

Timing "luck" is probably something we shouldn't be messing with in
stable kernels. How about I revert this for the 4.14 and older releases
and let new devices deal with the timing issues when they are brought up
on new hardware?

thanks,

greg k-h