Re: next/master bisection: baseline.login on rk3288-rock2-square

From: Marc Zyngier
Date: Thu Feb 04 2021 - 07:27:30 EST


On 2021-02-04 10:55, Ard Biesheuvel wrote:
(cc Marc)

On Thu, 4 Feb 2021 at 11:48, Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:

On Thu, Feb 04, 2021 at 11:27:16AM +0100, Ard Biesheuvel wrote:
> Hi Russell,
>
> If Guillaume is willing to do the experiment, and it fixes the issue,
> it proves that rk3288 is relying on the flush before the MMU is
> disabled, and so in that case, the fix is trivial, and we can just
> apply it.
>
> If the experiment fails (which would mean rk3288 does not tolerate the
> cache maintenance being performed after cache off), it is going to be
> hairy, and so it will definitely take more time.
>
> So in the latter case (or if Guillaume does not get back to us), I
> think reverting my queued fix is the only sane option. But in that
> case, may I suggest that we queue the revert of the original by-VA
> change for v5.12 so it gets lots of coverage in -next, and allows us
> an opportunity to come up with a proper fix in the same timeframe, and
> backport the revert and the subsequent fix as a pair? Otherwise, we'll
> end up in the situation where v5.10.x until today has by-va, v5.10.x-y
> has set/way, and v5.10y+ has by-va again. (I don't think we care about
> anything before that, given that v5.4 predates any of this)

I'm suggesting dropping your fix (9052/1) and reverting
"ARM: decompressor: switch to by-VA cache maintenance for v7 cores"
which gets us to a point where _both_ regressions are fixed.


I understand, but we don't know whether doing so might regress other
platforms that were added in the mean time.

I'm of the opinion that the by-VA patch was incorrect when it was
merged (it caused a regression), and it's only a performance
improvement.

It is a correctness improvement, not a performance improvement.

Without that change, the 32-bit ARM kernel cannot boot bare metal on
platforms with a system cache such as 8040 or SynQuacer, and can only
boot under KVM on such systems because of the special handling of
set/way instructions by the host.

I agree. With set/way CMOs, there is no way to reach the PoC if
it beyond the system cache, leading to an unbootable kernel.
This is actually pretty well documented in the architecture,
and it did bite us for the first time on XGene-1, 7 years ago.

In retrospect, having KVM to handle set/way CMOs in was a mistake,
as it just papered over the problem for the sake of running older
32bit guests. It violated the principle of KVM/arm being strictly
architectural and provided unrealistic expectations. I'll take the
blame for this.

Thanks,

M.
--
Jazz is not dead. It just smells funny...