Re: Regression: Linux v5.15+ does not boot on Freescale P2020
From: Pali Rohár
Date: Tue Jul 26 2022 - 10:01:09 EST
On Tuesday 26 July 2022 08:44:05 Segher Boessenkool wrote:
> On Tue, Jul 26, 2022 at 11:02:59AM +0200, Arnd Bergmann wrote:
> > On Tue, Jul 26, 2022 at 10:34 AM Pali Rohár <pali@xxxxxxxxxx> wrote:
> > > On Monday 25 July 2022 16:54:16 Segher Boessenkool wrote:
> > > > The EH field in larx insns is new since ISA 2.05, and some ISA 1.x cpu
> > > > implementations actually raise an illegal insn exception on EH=1. It
> > > > appears P2020 is one of those.
> > >
> > > P2020 has e500 cores. e500 cores uses ISA 2.03. So this may be reason.
> > > But in official Freescale/NXP documentation for e500 is documented that
> > > lwarx supports also eh=1. Maybe it is not really supported.
> > > https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf (page 562)
>
> (page 6-186)
>
> > > At least there is NOTE:
> > > Some older processors may treat EH=1 as an illegal instruction.
>
> And the architecture says
> Programming Note
> Warning: On some processors that comply with versions of the
> architecture that precede Version 2.00
But e500v2 implements ISA 2.03, which is not older than 2.00...
> executing a Load And Reserve
> instruction in which EH = 1 will cause the illegal instruction error
> handler to be invoked.
>
> > In commit d6ccb1f55ddf ("powerpc/85xx: Make sure lwarx hint isn't set on ppc32")
> > this was clarified to affect (all?) e500v1/v2,
>
> e500v1/v2 based chips will treat any reserved field being set in an
> opcode as illegal.
>
> while the architecture says
>
> Reserved fields in instructions are ignored by the processor.
>
> Whoops :-) We need fixes for processor implementation bugs all the
> time of course, but this is a massive *design* bug.
I also looked in the e500v2 and P2020 errata documents; nothing there
mentions the EH flag. But it does look like a bug.
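
For reference, a minimal sketch of where the EH bit sits in the lwarx
encoding. lwarx is an X-form instruction with primary opcode 31 and
extended opcode 20, and EH is the lowest bit of the instruction word.
The helper names encode_lwarx() and lwarx_eh_set() are made up for this
illustration; they are not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: build the 32-bit
 * encoding of "lwarx RT,RA,RB,EH".
 *   bits 0-5   primary opcode 31
 *   bits 6-10  RT
 *   bits 11-15 RA
 *   bits 16-20 RB
 *   bits 21-30 extended opcode 20
 *   bit  31    EH (least significant bit of the word)
 */
static uint32_t encode_lwarx(unsigned rt, unsigned ra, unsigned rb,
			     unsigned eh)
{
	return (31u << 26) | ((rt & 0x1fu) << 21) | ((ra & 0x1fu) << 16) |
	       ((rb & 0x1fu) << 11) | (20u << 1) | (eh & 0x1u);
}

/* Hypothetical helper: nonzero if the EH hint bit is set in an
 * already encoded lwarx instruction word. */
static int lwarx_eh_set(uint32_t insn)
{
	return (int)(insn & 0x1u);
}
```

So "lwarx r9,0,r3" encodes as 0x7d201828 with EH=0 and as 0x7d201829
with EH=1. The two forms differ only in that last bit, which a core
ignores if it treats the field as a hint and rejects if it treats a set
reserved bit as illegal.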
> I'm surprised this
> CPU still works as well as it does!
>
> Even the venerable PEM (last updated in 1997) shows the EH field as
> reserved, always treated as 0.
>
> > this one apparently
> > fixed it before,
> > but Christophe's commit effectively reverted that change.
> >
> > I think only the simple_spinlock.h file actually uses EH=1
>
> That's right afaics.
>
> > and this is not
> > included in non-SMP kernels, so presumably the only affected machines were
> > the rare dual-core e500v2 ones (p2020, MPC8572, bsc9132), which would
> > explain why nobody noticed for the past 9 months.
>
> Also people using an SMP kernel on older cores should see the problem,
> no?
Probably yes.
But most people on these machines use stable LTS kernels and do
not upgrade very often.
So it takes longer before people start reporting such
bugs. We need to wait at least until the v4.14 and v4.19 LTS versions stop
receiving updates. v4.19 is used in Debian 10 (oldstable) and v5.4 is
used by current OpenWrt. Both distributions are still supported, so
users have not yet migrated to the problematic v5.15 kernel.
> Or is that patched out? Or does this use case never happen :-)
>
>
> Segher