Re: [PATCH] EDAC/device: Add sysfs notification for UE,CE count change

From: Trilok Soni
Date: Wed Sep 13 2023 - 13:22:50 EST


On 8/1/2023 3:37 PM, Deepti Jaggi wrote:
> On 7/31/2023 10:48 PM, Trilok Soni wrote:
>> On 7/31/2023 3:40 PM, Trilok Soni wrote:
>>> On 7/31/2023 3:00 PM, Deepti Jaggi wrote:
>>>> A daemon running in user space collects information on correctable
>>>> and uncorrectable errors from EDAC driver by reading corresponding
>>>> sysfs entries and takes appropriate action.
>>>
>>> Which daemon are we referring to here? Can you please provide a link to the project?
>>>
>>> Are you using this daemon?
>>>
>>> https://mcelog.org/ - It is for x86, but is your daemon project different?
>>>
>
> No, that daemon is not used. Our daemon is under development and is more specific to Qualcomm use cases.
> Based on my limited understanding of mcelog, it handles errors in an architecture-specific way.
> By adding support for sysfs notification to the EDAC framework, drivers that do not use any custom sysfs attributes can take advantage of this change to notify a user space daemon polling on the ue_count and/or ce_count attributes.


Did you look at the rasdaemon then?

https://github.com/mchehab/rasdaemon - rasdaemon is also used on more than one architecture including ARM.


>
>>>> This patch adds support for user space daemon to wait on poll() until
>>>> the sysfs entries for UE count and CE count change and then read updated
>>>> counts instead of continuously monitoring the sysfs entries for
>>>> any changes.
>>>
>>> The modifications below are architecture agnostic so I really want to know what exactly we are fixing and if there is a problem.
>>
>
> This change set adds support for user space to poll on the ue_count and/or ce_count sysfs attributes.
> When ue_count or ce_count changes, the EDAC driver framework unblocks the user space poll, and user space can then read the updated ce_count and ue_count.
>
> As an example, user space performs the following steps:
>     1. Open the sysfs attribute files for UE count and CE count.
>     2. Read the initial CE count and UE count.
>     3. Poll for changes on the CE count and UE count fds.
>     4. Once poll unblocks, read the updated count.
>     5. Take appropriate action on the changed counts.
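
For reference, the steps above could be sketched roughly as below. This is
not the snippet from the original mail; it is a minimal illustration that
assumes the kernel side calls sysfs_notify() on the attribute when the count
changes, which is what the patch proposes. The helper names read_count() and
wait_for_change() are hypothetical. sysfs reports a notified attribute as
POLLPRI | POLLERR, and the file has to be read once before poll() is armed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Re-read a counter attribute from offset 0 and parse it as an unsigned
 * decimal number. Returns 0 on success, -1 on error.
 * (Hypothetical helper, not part of any existing daemon.)
 */
int read_count(int fd, unsigned long *out)
{
	char buf[32];
	char *end;
	unsigned long val;
	ssize_t n;

	assert(out != NULL);

	/* sysfs attributes must be rewound before every re-read. */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;

	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;
	buf[n] = '\0';

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;

	*out = val;
	return 0;
}

/*
 * Block until the attribute is notified or the timeout expires.
 * Returns >0 if notified, 0 on timeout, -1 on error.
 * Assumes the driver calls sysfs_notify() for this attribute.
 */
int wait_for_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	int rc;

	/* Retry if a signal interrupts the wait. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	return rc;
}
```

A daemon would open() the ce_count or ue_count attribute with O_RDONLY, call
read_count() once for the initial value, then loop on wait_for_change(fd, -1)
followed by read_count() to pick up each new value.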
>
> #####################################################################
> Example Simple User space code Snippet:

As I understand it, all of this is already handled in the EDAC framework through tracing. If any changes are
required, should we extend rasdaemon and show the use case to explain it better?

This is a very old link, but if you follow this patch series you will understand the tracing events in EDAC;
the latest EDAC framework code will help as well.

https://lkml.indiana.edu/hypermail/linux/kernel/1205.1/01751.html

--
---Trilok Soni