Re: next/master bisection: baseline.login on rk3288-rock2-square

From: Guillaume Tucker
Date: Thu Feb 04 2021 - 05:36:02 EST


On 04/02/2021 10:27, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
> <linux@xxxxxxxxxxxxxxx> wrote:
>>
>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>> <guillaume.tucker@xxxxxxxxxxxxx> wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>> Please see the bisection report below about a boot failure on
>>>> rk3288 with next-20210203. It was also bisected on
>>>> imx6q-var-dt6customboard with next-20210202.
>>>>
>>>> Reports aren't automatically sent to the public while we're
>>>> trialing new bisection features on kernelci.org but this one
>>>> looks valid.
>>>>
>>>> The kernel is most likely crashing very early on, so there's
>>>> nothing in the logs. Please let us know if you need some help
>>>> with debugging or trying a fix on these platforms.
>>>>
>>>
>>> Thanks for the report.
>>
>> Ard,
>>
>> I want to send my fixes branch today which includes your regression
>> fix that caused this regression.
>>
>> As this is proving difficult to fix, I can only drop your fix from
>> my fixes branch - and given that this seems to be problematical, I'm
>> tempted to revert the original change at this point which should fix
>> both of these regressions - and then we have another go at getting rid
>> of the set/way instructions during the next cycle.
>>
>> Thoughts?
>>
>
> Hi Russell,
>
> If Guillaume is willing to do the experiment, and it fixes the issue,

Yes, I'm running some tests with that fix now and should have
some results shortly.

> it proves that rk3288 is relying on the flush before the MMU is
> disabled, and so in that case, the fix is trivial, and we can just
> apply it.
>
> If the experiment fails (which would mean rk3288 does not tolerate the
> cache maintenance being performed after cache off), it is going to be
> hairy, and so it will definitely take more time.
>
> So in the latter case (or if Guillaume does not get back to us), I
> think reverting my queued fix is the only sane option. But in that
> case, may I suggest that we queue the revert of the original by-VA
> change for v5.12 so it gets lots of coverage in -next, and allows us
> an opportunity to come up with a proper fix in the same timeframe, and
> backport the revert and the subsequent fix as a pair? Otherwise, we'll
> end up in the situation where v5.10.x until today has by-VA, v5.10.x-y
> has set/way, and v5.10.y+ has by-VA again. (I don't think we care about
> anything before that, given that v5.4 predates any of this.)
>
> But in the end, I'm happy to go along with whatever works best for you.

Thanks,
Guillaume