Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures

From: Suzuki K Poulose
Date: Wed Sep 22 2021 - 08:03:47 EST


Hi Anshuman

On 22/09/2021 08:39, Anshuman Khandual wrote:


On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
from errata, where a TSB (trace synchronization barrier)
fails to flush the trace data completely, when executed from
a trace prohibited region. In Linux we always execute it
after we have moved the PE to trace prohibited region. So,
we can apply the workaround everytime a TSB is executed.

s/everytime/every time

Ack



The work around is to issue two TSB consecutively.

NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
that a late CPU could be blocked from booting if it is the
first CPU that requires the workaround. This is because we
do not allow setting a cpu_hwcaps after the SMP boot. The
other alternative is to use "this_cpu_has_cap()" instead
of the faster system wide check, which may be a bit of an
overhead, given we may have to do this in nvhe KVM host
before a guest entry.

Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx>
Cc: Mike Leach <mike.leach@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
---

...

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eac4030322df..0764774e12bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
If unsure, say Y.
+config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+ bool
+
+config ARM64_ERRATUM_2054223
+ bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
+ default y
+ help
+ Enable workaround for ARM Cortex-A710 erratum 2054223
+
+ Affected cores may fail to flush the trace data on a TSB instruction, when
+ the PE is in trace prohibited state. This will cause losing a few bytes
+ of the trace cached.
+
+ Workaround is to issue two TSB consecutively on affected cores.
+
+ If unsure, say Y.
+
+config ARM64_ERRATUM_2067961
+ bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
+ default y
+ help
+ Enable workaround for ARM Neoverse-N2 erratum 2067961
+
+ Affected cores may fail to flush the trace data on a TSB instruction, when
+ the PE is in trace prohibited state. This will cause losing a few bytes
+ of the trace cached.
+
+ Workaround is to issue two TSB consecutively on affected cores.

Like I had mentioned in the previous patch, these descriptions here could
be just factored out inside ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.

Please see my response there.


+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 451e11e5fd23..1c5a00598458 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -23,7 +23,7 @@
#define dsb(opt) asm volatile("dsb " #opt : : : "memory")
#define psb_csync() asm volatile("hint #17" : : : "memory")
-#define tsb_csync() asm volatile("hint #18" : : : "memory")
+#define __tsb_csync() asm volatile("hint #18" : : : "memory")
#define csdb() asm volatile("hint #20" : : : "memory")
#ifdef CONFIG_ARM64_PSEUDO_NMI
@@ -46,6 +46,20 @@
#define dma_rmb() dmb(oshld)
#define dma_wmb() dmb(oshst)
+
+#define tsb_csync() \
+ do { \
+ /* \
+ * CPUs affected by Arm Erratum 2054223 or 2067961 needs \
+ * another TSB to ensure the trace is flushed. The barriers \
+ * don't have to be strictly back to back, as long as the \
+ * CPU is in trace prohibited state. \
+ */ \
+ if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE)) \
+ __tsb_csync(); \
+ __tsb_csync(); \
+ } while (0)
+
/*
* Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
* and 0 otherwise.
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ccd757373f36..bdbeac75ead6 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
};
#endif /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+static const struct midr_range tsb_flush_fail_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2067961
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2054223
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+ {},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
+
const struct arm64_cpu_capabilities arm64_errata[] = {
#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
{
@@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILRE
+ {
+ .desc = "ARM erratum 2067961 or 2054223",
+ .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
+ ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
+ },
#endif
{
}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1ccb92165bd8..2102e15af43d 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -54,6 +54,7 @@ WORKAROUND_1463225
WORKAROUND_1508412
WORKAROUND_1542419
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+WORKAROUND_TSB_FLUSH_FAILURE
WORKAROUND_CAVIUM_23154
WORKAROUND_CAVIUM_27456
WORKAROUND_CAVIUM_30115


This adds all the required bits of these erratas in a single patch,
where as the previous work around had split all the required pieces
into multiple patches. Could we instead follow the same standard in
both the places ?

We could do this for this particular erratum as the work around is
within the arm64 kernel code, unlike the other ones - where the TRBE
driver needs a change.

So, there is a kind of dependency for the other two, which we don't
in this particular case.

i.e, TRBE driver needs a cpucap number to implement the work around ->
The arm64 kernel must define one, which we cant advertise yet until
we have a TRBE work around.

Thus, they follow a 3 step model.

- Define CPUCAP erratum
- TRBE driver work around
- Finally advertise to the user.

I don't think this one needs that.

Suzuki