Re: [PATCH 11/13] arm: npcm: drop selecting non-existing ARM_ERRATA_794072

From: Avi Fishman
Date: Tue Nov 02 2021 - 04:22:41 EST


Hi,

At Nuvoton we implied this WA in the past, not because we encountered
a specific problem but since the errata says so and we saw this in
other patches like:
https://patchwork.kernel.org/project/linux-arm-kernel/patch/1396298072-13254-2-git-send-email-nitin.garg@xxxxxxxxxxxxx/
But we didn't upstream the arch/arm/mm/proc-v7.S

>From the errata document.
"794072 A short loop including a DMB instruction might cause a denial
of service on
another processor which executes a CP15 broadcast operation
Status
Affects:
Fault Type:
Fault Status: Cortex-A9 MPCore.
Programmer Category B
Present in: All r1, r2, r3 and r4 revisions Open
Description
A processor which continuously executes a short loop containing a DMB
instruction might prevent a CP15
operation broadcast by another processor making further progress,
causing a denial of service.
Configurations affected
This erratum affects all Cortex-A9 MPCore processors with two or more
processors.
Conditions
The erratum requires the following conditions:
- Two or more processors are working in SMP mode (ACTLR.SMP=1)
- One of the processors continuously executes a short loop containing
at least one DMB instruction.
- Another processor executes a CP15 maintenance operation that is
broadcast. This requires that this
processor has enabled the broadcasting of CP15 operations (ACTLR.FW=1)
For the erratum to occur, the short loop containing the DMB
instruction must meet all of the following additional
conditions:
- No more than 10 instructions other than the DMB are executed between each DMB
- No non-conditional Load or Store, or conditional Load or Store which
pass the condition code check, are
executed between each DMB
When all the conditions for the erratum are met, the short loop
creates a continuous stream of DMB instructions.
This might cause a denial of service, by preventing the processor
executing the short loop from executing the
received broadcast CP15 operation. As a result, the processor that
originally executed the broadcast CP15
operation is stalled until the execution of the loop is interrupted.
Note that because the process issuing the CP15 broadcast operation
cannot complete operation, it cannot enter
any debug-mode, and cannot take any interrupt. If the processor
executing the short loop also cannot be
interrupted, for example if it has disabled its interrupts, or if no
interrupts are routed to this processor, this
erratum might cause a system livelock.
Implications
The erratum might create performance issues, or in the worst case it
might cause a system livelock if the
processor executing the DMB is in an infinite loop that cannot be interrupted.
Workaround
This erratum can be worked round by setting bit[4] of the undocumented
Diagnostic Control Register to 1. This
register is encoded as CP15 c15 0 c0 1.
This bit can be written in Secure state only, with the following
Read/Modify/Write code sequence:
MRC p15,0,rt,c15,c0,1
ORR rt,rt,#0x10
MCR p15,0,rt,c15,c0,1
When it is set, this bit causes the DMB instruction to be decoded and
executed like a DSB.
Using this software workaround is not expected to have any impact on
the overall performance of the processor
on a typical code base.
Other workarounds are also available for this erratum, to either
prevent or interrupt the continuous stream of
DMB instructions that causes the deadlock. For example:
- Inserting a non-conditional Load or Store instruction in the loop
between each DMB
- Inserting additional instructions in the loop, such as NOPs, to
avoid the processor seeing back to back
DMB instructions.
- Making the processor executing the short loop take regular interrupts."

Avi

On Tue, Nov 2, 2021 at 9:31 AM Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx> wrote:
>
> On Fri, Oct 29, 2021 at 8:36 AM Joel Stanley <joel@xxxxxxxxx> wrote:
> >
> > On Thu, 28 Oct 2021 at 14:57, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 4:19 PM Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx> wrote:
> > > >
> > > > There is no and never was a Kconfig for ARM_ERRATA_794072 in the kernel
> > > > tree. So, there is no need to select ARM_ERRATA_794072 in
> > > > ./arch/arm/mach-npcm/Kconfig.
> > > >
> > > > Simply drop selecting the non-existing ARM_ERRATA_794072.
> > > >
> > > > This issue was discovered with ./scripts/checkkconfigsymbols.py.
> > > >
> > > > Signed-off-by: Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx>
> > > > ---
> > >
> > > Could this be a typo? Maybe we need to enable a different errata workaround
> > > here, or maybe that code is actually needed and has to get sent.
> >
> > Doing some searching, u-boot had a workaround for something called
> > ARM_ERRATA_794072.
> >
> > https://github.com/u-boot/u-boot/commit/f71cbfe3ca5d2ad20159871700e8e248c8818ba8
> >
> > Lore has the review history for that patch:
> >
> > https://lore.kernel.org/all/6be32e0b5b454ed7b609317266a8e798@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> >
> > It looks like it's the same workaround as ARM_ERRATA_742230, which the
> > kernel does implement.
> >
> > It would be good to hear from the Nuvoton people, or an Arm person.
> >
>
> I will happily update the patch to select ARM_ERRATA_742230 instead of
> the dead non-existing ARM_ERRATA_794072.
>
> In contrast to the current patch that basically only cleans up "dead
> config" and has no effective functional change, the new patch would
> change the behaviour. I cannot test this patch (beyond some basic
> compile test) on the hardware; so, we certainly need someone to have
> that hardware, knows how to test it or confirm otherwise that we
> should select the ARM_ERRATA_742230 fix for this hardware.
>
> The current patch should be subsumed by the new patch; the submission
> of the new patch is deferred until that person shows up. Let's see.
>
> Lukas



--
Regards,
Avi