Re: [PATCH 1/2] dt-bindings: edac: Add DT bindings for Kryo EDAC

From: Sai Prakash Ranjan
Date: Fri Jan 24 2020 - 09:25:09 EST


Hi James,

On 2020-01-16 00:18, James Morse wrote:
Hi Sai,

(CC: +Tyler)

On 05/12/2019 09:53, Sai Prakash Ranjan wrote:
This adds DT bindings for Kryo EDAC implemented with RAS
extensions on KRYO{3,4}XX CPU cores for reporting of cache
errors.

KRYO{3,4}XX isn't the only SoC with the RAS extensions. The DT needs to
convey the range of ways this armv8 RAS extensions stuff can be wired up.


Right, but I was going for a Kryo-specific implementation, hence the binding as such.

The folk who look after the ACPI specs have made a start:
https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf

(I suspect that isn't the latest version, I'll try and find out)


That would be helpful, thanks.

I'd like the ACPI table and DT to convey the same information so that we
don't need to convert or infer things in the driver. If something is
missing, we should get it added!


Sure, I think it is now decided that the kernel-first RAS implementation will be generic.


diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
new file mode 100644
index 000000000000..1a39429a73b4
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Kryo Error Detection and Correction(EDAC)
+
+maintainers:
+ - Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
+
+description: |
+ Kryo EDAC is defined to describe on-chip error detection and correction
+ for the Kryo CPU cores which implement RAS extensions.

Please don't make this Kryo specific, otherwise this binding becomes an
extra thing we need to support with a 'v8.2 RAS' driver.

What I'd like is a single 'armv82_ras' edac driver that handles faults and
errors reported by interrupts, and interacts with the arch code's handling
of 'external aborts'. This should work for all platforms using v8.2 RAS
and later.



Ok sure.

+ It will report
+ all Single Bit Errors and Double Bit Errors found in L1/L2 caches in
+ two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors
+ are reported in ERR1STATUS and ERR1MISC0 registers.
+ ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1
+ ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1
+ ERR1STATUS - Error Record Primary Status Register
+ ERR1MISC0 - Error Record Miscellaneous Register 0
+ Current implementation of Kryo ECC(Error Correcting Code) mechanism is
+ based on interrupts.

Your SoC picked the system registers as the interface to these components'
registers. The binding would need to specify which index the 'l1-l2'
records start at, and how many there are. The same for the 'l3-scu'. You
can't hard code these, they are different on other platforms.


Ok will keep this in mind for the next version.
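
To illustrate the point about not hard-coding record indices, here is a
rough C sketch of how a driver might consume a per-component start index
and record count once a binding supplies them. It is not taken from the
patch: the struct, its fields and the function names are all hypothetical,
and the lookup is plain user-space C rather than kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one component's error records. */
struct ras_record_range {
	const char *name;     /* component label, e.g. "l1-l2" or "l3-scu" */
	uint32_t start_index; /* index of the first error record */
	uint32_t num_records; /* number of consecutive records */
};

/* Return true if error record index idx falls inside range r. */
static bool ras_range_contains(const struct ras_record_range *r, uint32_t idx)
{
	return idx >= r->start_index && idx - r->start_index < r->num_records;
}

/* Find the component that owns record idx, or NULL if none does. */
static const struct ras_record_range *
ras_find_range(const struct ras_record_range *ranges, size_t n, uint32_t idx)
{
	assert(ranges != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (ras_range_contains(&ranges[i], idx))
			return &ranges[i];
	return NULL;
}
```

With the ranges filled in from firmware data instead of constants, the same
lookup would work on platforms where the records are laid out differently.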

There is also an MMIO interface which needs a base address, along with the
index and ranges (which may be different). The same component may use both
the system register and the MMIO interface.


I have some doubts here. Where do I get this info? Will this be implementation specific?

This stuff is likely to vary on big/little systems, so you need a way of
describing which CPUs the settings refer to. This probably isn't something
the ACPI tables capture as ACPI machines are typically homogeneous.


Our SoCs are based on big.LITTLE arch, so this will be needed.
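
As a rough sketch of what "describing which CPUs the settings refer to"
could mean on the driver side, the C fragment below attaches a CPU mask to
each set of record settings and looks up the entry for a given CPU. All
names here are hypothetical, the 64-CPU bitmask is an assumption made only
to keep the example short, and none of this comes from the patch or an
accepted binding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cluster settings; bit N of cpu_mask means CPU N. */
struct ras_cluster_desc {
	uint64_t cpu_mask;    /* CPUs these settings apply to */
	uint32_t start_index; /* first error record index on those CPUs */
	uint32_t num_records; /* number of records on those CPUs */
};

/* Return the settings that apply to cpu, or NULL if none match. */
static const struct ras_cluster_desc *
ras_desc_for_cpu(const struct ras_cluster_desc *descs, size_t n,
		 unsigned int cpu)
{
	assert(descs != NULL || n == 0);

	if (cpu >= 64) /* outside the range this sketch can represent */
		return NULL;

	for (size_t i = 0; i < n; i++)
		if (descs[i].cpu_mask & (UINT64_C(1) << cpu))
			return &descs[i];
	return NULL;
}
```

A real binding would more likely reference CPU nodes by phandle than carry
a raw mask; the mask only stands in for that association here.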

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation