Re: [PATCH v7 9/9] docs: ABI: testing: Document the Ampere Altra Family's SMpro sysfs interfaces

From: Quan Nguyen
Date: Mon Mar 21 2022 - 05:46:54 EST




On 21/03/2022 15:23, Greg Kroah-Hartman wrote:
On Mon, Mar 21, 2022 at 03:13:55PM +0700, Quan Nguyen wrote:
Add documentation for the Ampere(R)'s Altra(R) SMpro sysfs interfaces

Signed-off-by: Quan Nguyen <quan@xxxxxxxxxxxxxxxxxxxxxx>
---
Changes in v7:
+ First introduced in v7 [Greg]

.../sysfs-bus-platform-devices-ampere-smpro | 133 ++++++++++++++++++
1 file changed, 133 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
new file mode 100644
index 000000000000..9bfd8d6d0f71
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -0,0 +1,133 @@
+What: /sys/bus/platform/devices/smpro-errmon.*/errors_[core|mem|pcie|other]_[ce|ue]

Please split this out as one entry per file.


These sysfs files share the same format (the 48-byte Arm vendor-specific HW error record) but cover separate HW domains: Core, PCIe, Mem, etc.
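
For reference, a small illustrative sketch of what "one entry per file" expands to here. The bracketed pattern errors_[core|mem|pcie|other]_[ce|ue] covers eight files. The helper name errmon_entries and the instance name smpro-errmon.0 are hypothetical, used only to show the expansion:

```python
from itertools import product
from pathlib import PurePosixPath

# Domains and error classes taken from the
# errors_[core|mem|pcie|other]_[ce|ue] pattern in the patch.
DOMAINS = ("core", "mem", "pcie", "other")
CLASSES = ("ce", "ue")  # correctable / uncorrectable


def errmon_entries(dev_dir: str) -> list[str]:
    """Expand the bracketed pattern into one sysfs path per file."""
    base = PurePosixPath(dev_dir)
    return [str(base / f"errors_{d}_{c}") for d, c in product(DOMAINS, CLASSES)]


if __name__ == "__main__":
    # "smpro-errmon.0" is a hypothetical instance name for illustration.
    for path in errmon_entries("/sys/bus/platform/devices/smpro-errmon.0"):
        print(path)
```

Each of the eight files would then get its own What: entry while sharing the same Description text.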

+KernelVersion: 5.14

5.14 is a long time ago.

+Contact: quan@xxxxxxxxxxxxxxxxxxxxxx
+Description:
+ (RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record, see [1]
+ printed in hex format as below:
+
+ AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD \
+ DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDD
+ Where:
+ AA : Error Type
+ BB : Subtype
+ CCCC : Instance
+ DDD...DDD: Similar to the Arm RAS standard error record

No, this is not a valid sysfs file, sorry. This should just be one
value per file.


This 48-byte value cannot be split into smaller values because it contains all the information needed to describe a single HW error, as per the Arm RAS supplement document [1]. The spaced format is only there to make it more readable than a single 48-byte hex value.

[1] https://developer.arm.com/documentation/ddi0587/latest/
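
To illustrate why the record is read as a unit, here is a rough userspace sketch that parses the "AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD ..." layout from the patch. The field widths add up to 96 hex digits, i.e. the 48 bytes. The class and function names are hypothetical, and it assumes the trailing backslash in the documented layout is only a line continuation, not a character in the file:

```python
from dataclasses import dataclass

# Hex-digit widths of each space-separated field, following the
# "AA BB CCCC DDDDDDDD DDDDDDDDDDDDDDDD x5" layout in the patch.
FIELD_WIDTHS = (2, 2, 4, 8, 16, 16, 16, 16, 16)


@dataclass
class AmpereErrorRecord:
    error_type: int              # AA
    subtype: int                 # BB
    instance: int                # CCCC
    ras_words: tuple[int, ...]   # the "DDD...DDD" part, kept as raw words


def parse_record(line: str) -> AmpereErrorRecord:
    # Assumption: tolerate a literal backslash continuation if one appears.
    fields = line.replace("\\", " ").split()
    if len(fields) != len(FIELD_WIDTHS):
        raise ValueError(f"expected {len(FIELD_WIDTHS)} fields, got {len(fields)}")
    for text, width in zip(fields, FIELD_WIDTHS):
        if len(text) != width:
            raise ValueError(f"field {text!r} should be {width} hex digits")
    values = [int(text, 16) for text in fields]  # raises ValueError on non-hex
    return AmpereErrorRecord(values[0], values[1], values[2], tuple(values[3:]))
```

The Error Type, Subtype and Instance fields only make sense together with the RAS words that follow them, which is the point being made above.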


+
+ See [1] below for the format details.
+
+ The details of each sysfs entry are as below:
+ +-------------+---------------------------------------------------------+
+ | Error | Sysfs entry |
+ +-------------+---------------------------------------------------------+
+ | Core's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ce |
+ | Core's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_core_ue |
+ | Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ce |
+ | Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_mem_ue |
+ | PCIe's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ce |
+ | PCIe's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_pcie_ue |
+ | Other's CE | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ce|
+ | Other's UE | /sys/bus/platform/devices/smpro-errmon.*/errors_other_ue|
+ +-------------+---------------------------------------------------------+
+ UE: Uncorrectable Error
+ CE: Correctable Error
+
+ [1] Section 3.3 Ampere (Vendor-Specific) Error Record Formats,
+ Altra Family RAS Supplement.
+
+
+What: /sys/bus/platform/devices/smpro-errmon.*/errors_[smpro|pmpro]
+KernelVersion: 5.14
+Contact: quan@xxxxxxxxxxxxxxxxxxxxxx
+Description:
+ (RO) Contains the internal firmware error record printed in hex format
+ as below:
+
+ A BB C DD EEEE FFFFFFFF

Again this isn't a good sysfs entry. You should never have to parse a
sysfs file except for a single value.

thanks,

greg k-h

This error record cannot be split further either.
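
As a rough illustration that the spacing is purely presentational, the sketch below converts the "A BB C DD EEEE FFFFFFFF" text into one 72-bit value (18 hex digits) and back. The field meanings were trimmed from the quote, so the letters A..F are placeholder names and the helper functions are hypothetical, not a proposed ABI:

```python
# Hex-digit widths of the "A BB C DD EEEE FFFFFFFF" layout quoted above.
# Field meanings were trimmed from the quote; A..F are placeholder names.
FW_WIDTHS = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 4, "F": 8}


def fw_record_to_int(line: str) -> int:
    """Join the spaced fields into a single 72-bit value (18 hex digits)."""
    fields = line.split()
    if [len(f) for f in fields] != list(FW_WIDTHS.values()):
        raise ValueError(f"unexpected layout: {line!r}")
    return int("".join(fields), 16)


def int_to_fw_fields(value: int) -> dict[str, int]:
    """Split a single value back into the A..F fields, most significant first."""
    total = sum(FW_WIDTHS.values())
    digits = f"{value:0{total}x}"
    if value < 0 or len(digits) != total:
        raise ValueError("value does not fit the record layout")
    out, pos = {}, 0
    for name, width in FW_WIDTHS.items():
        out[name] = int(digits[pos:pos + width], 16)
        pos += width
    return out
```

Either form carries the same content; the fields still have to be decoded together to describe one firmware error.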

Thanks Greg for the review.
- Quan