Re: [RFC 0/5] riscv: initial support for Generic Hardware Error Source (GHES)
From: Himanshu Chauhan
Date: Fri Feb 07 2025 - 09:18:32 EST
Sorry for the previous email composed in HTML.
On Thu, Feb 6, 2025 at 6:52 PM Rui Qi <qirui.001@xxxxxxxxxxxxx> wrote:
>
> From: Rui Qi <qirui.001@xxxxxxxxxxxxx>
>
> NOTE: Before compiling the kernel, enable ACPI, APEI and GHES in menuconfig
> The following options must be enabled in .config file.
> CONFIG_HAVE_ACPI_APEI=y
> CONFIG_ACPI_APEI=y
> CONFIG_ACPI_APEI_GHES=y
Hi Rui Qi,
RERI/RAS support is already under active development. I have presented
the RISC-V RAS software architecture in various RISC-V and RISE meetings.
You can listen to the RISE recording (link below) of the software
architecture presentation (given on June 26, 2023).
https://zoom.us/rec/share/HVCY1QmKVhVa32qQiSIOc2OM3vgL9Vg5wlOFKZHGOuci8zBHVkigGNJuaiWEBlsL.vLYGlGyhQq26TCfU
I also posted an initial PoC for RERI emulation in QEMU and RAS
handling in the Linux kernel in 2023.
Details of the initial RERI/RAS PoC can be found at the links below:
Initial RERI/RAS PoC:
https://lists.riscv.org/g/tech-ras-eri/topic/reri_ras_poc_for_risc_v/102224101
RERI/RAS PoC update:
https://lists.riscv.org/g/tech-prs/topic/update_on_reri_ras_poc_for/106737135
It is a comprehensive design that includes a proposal for
highest-priority Supervisor Software Events (SSE).
Chapter 17 of the SBI specification covers the design of SSE.
The link to the specification is below.
SBI specification (v3.0-rc3): https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/v3.0-rc3/riscv-sbi.pdf
My patch set is waiting for the release of SBI specification v3.0, which
is at rc3 right now, and for pending ECR changes for processor type and
ISA strings, among other RISC-V specific changes in ACPI.
I will send my patches as soon as SBI specification v3.0 is released.
Meanwhile, I will send the RFC patches for everyone.
Until then, please go through the proposed design.
Thanks
Regards
Himanshu
>
> Through fault injection, we can see the following example output
>
> [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [Hardware Error]: event severity info
> [Hardware Error]: Error 0, type: corrected
> [Hardware Error]: section_type: memory error
> [Hardware Error]: error_status: Storage error in DRAM memory (0x018d8480019304f0)
> [Hardware Error]: node:0 card:0 module:0 rank:0 bank:0 device:0 row:0 column:0
> [Hardware Error]: error_type: 2, single-bit ECC
> [Hardware Error]: Error 1, type: corrected
> [Hardware Error]: section_type: PCIe error
> [Hardware Error]: port_type: 4, root port
> [Hardware Error]: version: 3.0
> [Hardware Error]: command: 0x0146, status: 0x0011
> [Hardware Error]: device_id: 0000:00:00.0
> [Hardware Error]: slot: 0
> [Hardware Error]: secondary_bus: 0x01
> [Hardware Error]: vendor_id: 0x1e93, device_id: 0x1010
> [Hardware Error]: class_code: 060400
> [Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
> [Hardware Error]: aer_cor_status: 0x00001000, aer_cor_mask: 0x0000000
> [Hardware Error]: aer_uncor_status: 0x00000000, aer_uncor_mask: 0x04000000
> [Hardware Error]: aer_uncor_severity: 0x004e3030
> [Hardware Error]: TLP Header: 000000000 000000000 0000000000000000
>
>
> Rui Qi (5):
> riscv: select HAVE_ACPI_APEI
> efi: add riscv APEI generic processor error printing support
> riscv: add fix map index for GHES IRQ
> RISC-V: ACPI: define arch_apei_get_mem_attribute
> RISC-V: define ioremap_cache
>
> arch/riscv/Kconfig | 1 +
> arch/riscv/include/asm/acpi.h | 18 ++++++++++++++++++
> arch/riscv/include/asm/fixmap.h | 3 +++
> arch/riscv/include/asm/io.h | 5 +++++
> drivers/firmware/efi/cper.c | 4 ++++
> 5 files changed, 31 insertions(+)
>
> --
> 2.20.1