Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER reporting paths

From: Karolina Stolarek
Date: Thu Apr 24 2025 - 05:03:11 EST


On 23/04/2025 22:31, Bjorn Helgaas wrote:
> On Wed, Apr 23, 2025 at 03:52:27PM +0200, Karolina Stolarek wrote:

>> I wasn't able to produce logs for the CXL path (that is, Restricted CXL
>> Device, as CXL 1.1 devices are not supported by the driver due to missing
>> functionality; confirmed by Terry) and faced issues when trying to inject
>> errors via GHES. Is the lack of logs a blocker for this patch? I tested
>> other CXL scenarios and my changes didn't cause any regressions, as far
>> as I know.

> Yes, I do think we need to say something about the output format
> changes.

I understand.

> I assume you're trying GHES on machines that actually do
> firmware-first error handling, right? I found several GHES logs by
> searching the web for "APEI Generic Hardware Error Source" "PCIe
> error". The majority were Dell boxes.

The only way to inject GHES errors that I'm aware of is Mauro's patch for QEMU [1], so I went down the virtualization path. As for working with actual hardware, I'd need to ask around and learn more about the platform.

> If you can't produce actual logs for comparison, I think we can take
> info from a sample log somebody has posted and synthesize what the
> changes would be after this patch.

I also found some logs at some point, mostly from 2021 and 2023, but I felt bad about mocking up the messages, so I tried to produce actual logs instead. If I can't find a way to get this working within two weeks, I'll revisit this idea.

All the best,
Karolina

-------------------------------------------------------------
[1] - https://lore.kernel.org/lkml/76824dfc6bb5dd23a9f04607a907ac4ccf7cb147.1740653898.git.mchehab+huawei@xxxxxxxxxx/