Re: [PATCH RFC] arm64: dts: qcom: hamoa: Drop cluster_cl5 idle state from CPU clusters
From: Marc Zyngier
Date: Mon Jun 08 2026 - 08:24:00 EST

On Mon, 08 Jun 2026 12:40:02 +0100,
Konrad Dybcio <konrad.dybcio@xxxxxxxxxxxxxxxx> wrote:
>
> On 6/5/26 10:09 AM, Marc Zyngier wrote:
> > Hi Jens,
> >
> > Thanks for sending this.
>
> [...]
>
> > It may be worth adding a comment somewhere in the DTS file, as
> > cluster_cl5 is not referenced anymore.
> >
> > Ideally we'd simply mark cluster-sleep-1 with 'status = "disabled"',
> > but I'm not sure Linux (and other OSs that consume this) actively
> > parse this property.
> >
> > Overall, I'd like clarity from the vendor on what can be done to
> > better mitigate issues like this. So far, we have been randomly
> > disabling features and CPU capabilities each and every time we find
> > something broken on these machines, and the list is getting long.
> >
> > I don't think such course of action is sustainable, and maybe we
> > should simply consider marking the full X1 platform as BROKEN so that
> > people know what to expect.
>
> Many "Linux-facing" people have been OoO and/or attending various
> conferences and an internal sprint for the past 2-3 weeks in a row,
> so there weren't a lot of eyes on this.. We're looking into it now.

This isn't about "Linux-facing" people's availability this month or even
over the past 18 months. This doesn't even have anything to do with
Linux at all. This is about an ongoing stream of issues that have been
reported and constantly ignored since this HW made it into the wild.

For example, two of my personal favourites:

- accesses to CNTPOFF_EL2 reset the machine. This has been documented
  since 358dd4a9bdac6 ("arm64: Add command-line override for
  ID_AA64MMFR0_EL1.ECV").

- CNTVOFF_EL2 not being consistently honoured results in screaming
  timer interrupts. This has been documented since 0bc9a9e85fcf4
  ("KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity").

I don't mind broken HW. It's almost a reassuring invariant. But this
level of brokenness, without any form of acknowledgement of the issues
or any proposed workarounds, is not acceptable. Especially when
people's data is at stake.

Which is why I think CONFIG_BROKEN=y is the right form of safety guard
against all of this until someone gets to the bottom of these things.

M.

--
Without deviation from the norm, progress is not possible.