Re: [BUG] arm64/m1: Accessing SYS_ID_AA64ISAR2_EL1 causes early boot failure on 5.15.28, 5.16.14, 5.17

From: A. Wilcox
Date: Mon Mar 14 2022 - 06:03:14 EST


On Mar 14, 2022, at 4:08 AM, Marc Zyngier <maz@xxxxxxxxxxxxxxx> wrote:
> On 2022-03-14 06:35, Greg KH wrote:
>> On Sun, Mar 13, 2022 at 10:59:01PM -0500, A. Wilcox wrote:
>>> Hello,
>>> I’ve been testing kernel updates for the Adélie Linux distribution’s
>>> ARM64 port using a Parallels VM on a MacBook Pro (13-inch, M1, 2020).
>>> When the kernel attempts to access SYS_ID_AA64ISAR2_EL1, it causes a
>>> fault as seen here booting 5.17.0-rc8:
>
> […]
>
>>> This is because detection of the clearbhb instruction support requires
>>> accessing SYS_ID_AA64ISAR2_EL1. Commenting out the two uses of
>>> supports_clearbhb in the kernel now yields a successful boot.
>>> Qemu developers seem to have found this issue as well[1] when trying to
>>> boot 5.17 using HVF, the Apple Hypervisor Framework. This seems to be
>>> some sort of platform quirk on M1, or at least in HVF on M1. I’m not
>>> sure what the best workaround would be for this. SYS_ID_AA64ISAR2_EL1
>>> seems to be something added in ARMv8.7, so perhaps access to it could be
>>> gated on that.
>>> Unfortunately, this code was just added to 5.15.28 and 5.16.14, so
>>> stable no longer boots on Parallels VM on M1. I am unsure if this
>>> affects physical boot on Apple M1 or not.
>> What commit causes this problem? It sounds like you narrowed this down
>> already, right?
>
> This really is a Parallels bug. These kernels run fine on bare metal
> M1 and in KVM. QEMU was affected as well, and that was fixed in their
> HVF handling. HVF itself is fine.
>
> So this should be punted back to the hypervisor vendor for not properly
> implementing the architecture (no ID register is allowed to UNDEF).
>
> M.
> --
> Who you jivin' with that Cosmik Debris?

Thanks, I wasn’t able to test native boot. Since this is a bug in the hypervisor, I’ll notify them in the morning.

For those of us stuck with Parallels, I’ll assume that reverting these three commits in my own build is the best way forward until it’s fixed. The M1 isn’t going to grow new instruction support in the meantime, so I don’t see a whole lot of harm in it - but the other mitigations in 5.15.28 seem useful.
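
For anyone following along: as far as I can tell, the check itself is only a 4-bit field extraction on the register value, and it is the register read that faults under Parallels. Below is a minimal user-space sketch of that extraction, not the kernel's actual code. It assumes the CLRBHB field sits in bits [31:28] of ID_AA64ISAR2_EL1, and the names id_field and isar2_has_clearbhb are mine, purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the CLRBHB field occupies bits [31:28] of ID_AA64ISAR2_EL1. */
#define ISAR2_CLRBHB_SHIFT 28u

/* Extract an unsigned 4-bit ID register field that starts at bit `shift`. */
static inline unsigned int id_field(uint64_t reg, unsigned int shift)
{
    return (unsigned int)((reg >> shift) & 0xfu);
}

/*
 * Hypothetical helper: a nonzero CLRBHB field is taken to mean the
 * clearbhb instruction is implemented. The caller passes in the raw
 * register value, so this sketch never touches the system register.
 */
static inline bool isar2_has_clearbhb(uint64_t isar2)
{
    return id_field(isar2, ISAR2_CLRBHB_SHIFT) != 0;
}
```

With this shape, a hypervisor that returned zero for the register instead of faulting would simply report the instruction as absent, which matches what the M1 provides anyway.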

Best,
-A.