Re: [PATCH 1/2] dt-bindings: edac: Add DT bindings for Kryo EDAC

From: James Morse
Date: Wed Jan 15 2020 - 13:48:09 EST


Hi Sai,

(CC: +Tyler)

On 05/12/2019 09:53, Sai Prakash Ranjan wrote:
> This adds DT bindings for Kryo EDAC implemented with RAS
> extensions on KRYO{3,4}XX CPU cores for reporting of cache
> errors.

KRYO{3,4}XX isn't the only SoC with the RAS extensions. The DT needs to convey the range
of ways this armv8 RAS extensions stuff can be wired up.

The folk who look after the ACPI specs have made a start:
https://static.docs.arm.com/den0085/a/DEN0085_RAS_ACPI_1.0_BETA_1.pdf

(I suspect that isn't the latest version, I'll try and find out)

I'd like the ACPI table and DT to convey the same information so that we don't need to
convert or infer things in the driver. If something is missing, we should get it added!


> diff --git a/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> new file mode 100644
> index 000000000000..1a39429a73b4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/qcom-kryo-edac.yaml
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/edac/qcom-kryo-edac.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Kryo Error Detection and Correction(EDAC)
> +
> +maintainers:
> + - Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
> +
> +description: |
> + Kryo EDAC is defined to describe on-chip error detection and correction
> + for the Kryo CPU cores which implement RAS extensions.

Please don't make this Kryo specific, otherwise this binding becomes an extra thing we
need to support with a 'v8.2 RAS' driver.

What I'd like is a single 'armv82_ras' edac driver that handles faults and errors reported
by interrupts, and interacts with the arch code's handling of 'external aborts'. This
should work for all platforms using v8.2 RAS and later.


> + It will report
> + all Single Bit Errors and Double Bit Errors found in L1/L2 caches in
> + in two registers ERXSTATUS_EL1 and ERXMISC0_EL1. L3-SCU cache errors
> + are reported in ERR1STATUS and ERR1MISC0 registers.
> + ERXSTATUS_EL1 - Selected Error Record Primary Status Register, EL1
> + ERXMISC0_EL1 - Selected Error Record Miscellaneous Register 0, EL1
> + ERR1STATUS - Error Record Primary Status Register
> + ERR1MISC0 - Error Record Miscellaneous Register 0
> + Current implementation of Kryo ECC(Error Correcting Code) mechanism is
> + based on interrupts.

Your SoC picked the system registers as the interface to these component's registers.
The binding would need to specify which index the 'l1-l2' records start at, and how many
there are. The same for the 'l3-scu'. You can't hard code these, they are different on
other platforms.

There is also an MMIO interface which needs a base address, along with the index and
ranges. (which may be different). The same component may use both the system register and
the MMIO interface.

This stuff is likely to vary on big/little systems, so you need a way of describing which
CPUs the settings refer to. This probably isn't something the ACPI tables capture as ACPI
machines are typically homogenous.


Thanks,

James