Re: [PATCH 1/6] dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/ broken FW

From: Marc Zyngier
Date: Fri May 12 2023 - 04:02:45 EST


On Thu, 11 May 2023 23:05:35 +0100,
Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> When trying to turn on the "pseudo NMI" kernel feature in Linux, it
> was discovered that all Mediatek-based Chromebooks that ever shipped
> (at least ones with GICv3) had a firmware bug where they wouldn't save
> certain GIC "GICR" registers properly. If a processor ever entered a
> suspend/idle mode where the GICR registers lost state then they'd be
> reset to their default state.
>
> As a result of the bug, if you try to enable "pseudo NMIs" on the
> affected devices then certain interrupts will unexpectedly get
> promoted to be "pseudo NMIs" and cause crashes / freezes / general
> mayhem.
>
> ChromeOS is looking to start turning on "pseudo NMIs" in production to
> make crash reports more actionable. To do so, we will release firmware
> updates for at least some of the affected Mediatek Chromebooks.
> However, even when we update the firmware of a Chromebook it's always
> possible that a user will end up booting with old firmware. We need to
> be able to detect when we're running with firmware that will crash and
> burn if pseudo NMIs are enabled.
>
> The current plan is:
> * Update the device trees of all affected Chromebooks to include the
> 'mediatek,gicr-save-quirk' property. The kernel can use this to know
> not to enable certain features like "pseudo NMI". NOTE: device trees
> for Chromebooks are never baked into the firmware but are bundled
> with the kernel. A kernel will never be configured to use "pseudo
> NMIs" and be bundled with an old device tree.
> * When we get a fixed firmware for one of these Chromebooks, it will
> patch the device tree to remove this property.

Since you're in control of distributing the FW together with the
kernel, I assume you're also in control of the command line. Why can't
that firmware pass the option enabling the pseudo-NMI support,
dispensing ourselves from all of this?

>
> For some details, you can also see the public bug
> <https://issuetracker.google.com/281831288>
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
>
> .../bindings/interrupt-controller/arm,gic-v3.yaml | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
> index 92117261e1e1..8c251caae537 100644
> --- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
> +++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
> @@ -166,6 +166,12 @@ properties:
> resets:
> maxItems: 1
>
> + mediatek,gicr-save-quirk:

I think this deserves something *much* stronger that outlines what is
wrong, because this is not just a quirk. This is a failure to even
remotely grasp the requirements of the architecture (and to use
standard, public code that would have done it correctly). Something
like "mediatek,broken-save-restore-fw" would be more adequate.

> + type: boolean
> + description:
> + Asserts that the firmware on this device has issues saving and restoring
> + GICR registers when CPUs are powered off.

Nit: not the the CPUs, but the GIC redistributors.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.