Re: next/master bisection: baseline.login on rk3288-rock2-square

From: Ard Biesheuvel
Date: Thu Feb 04 2021 - 05:30:47 EST


On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
> > On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
> > <guillaume.tucker@xxxxxxxxxxxxx> wrote:
> > >
> > > Hi Ard,
> > >
> > > Please see the bisection report below about a boot failure on
> > > rk3288 with next-20210203. It was also bisected on
> > > imx6q-var-dt6customboard with next-20210202.
> > >
> > > Reports aren't automatically sent to the public while we're
> > > trialing new bisection features on kernelci.org but this one
> > > looks valid.
> > >
> > > The kernel is most likely crashing very early on, so there's
> > > nothing in the logs. Please let us know if you need some help
> > > with debugging or trying a fix on these platforms.
> > >
> >
> > Thanks for the report.
>
> Ard,
>
> I want to send my fixes branch today which includes your regression
> fix that caused this regression.
>
> As this is proving difficult to fix, I can only drop your fix from
> my fixes branch - and given that this seems to be problematical, I'm
> tempted to revert the original change at this point which should fix
> both of these regressions - and then we have another go at getting rid
> of the set/way instructions during the next cycle.
>
> Thoughts?
>

Hi Russell,

If Guillaume is willing to do the experiment, and it fixes the issue,
it proves that rk3288 is relying on the flush before the MMU is
disabled, and so in that case, the fix is trivial, and we can just
apply it.

If the experiment fails (which would mean rk3288 does not tolerate the
cache maintenance being performed after cache off), it is going to be
hairy, and so it will definitely take more time.

So in the latter case (or if Guillaume does not get back to us), I
think reverting my queued fix is the only sane option. But in that
case, may I suggest that we queue the revert of the original by-VA
change for v5.12 so it gets lots of coverage in -next, and allows us
an opportunity to come up with a proper fix in the same timeframe, and
backport the revert and the subsequent fix as a pair? Otherwise, we'll
end up in the situation where v5.10.x until today has by-VA, v5.10.x-y
has set/way, and v5.10.y+ has by-VA again. (I don't think we care about
anything before that, given that v5.4 predates any of this.)

But in the end, I'm happy to go along with whatever works best for you.