Re: [PATCH 5/9] HWPoison: add memory_failure_queue()

From: Huang Ying
Date: Mon May 23 2011 - 22:10:39 EST

On 05/23/2011 07:01 PM, Ingo Molnar wrote:
>> If my understanding as above is correct, I think this is a general and
>> complex solution. It is a little hard for user to understand which 'active
>> filters' are in effect. He may need some runtime assistant to understand the
>> code (maybe /sys/events/active_filters, which list all filters in effect
>> now), because that is hard only by reading the source code. Anyway, this is
>> a design style choice.
> I don't think it's complex: the built-in rules are in plain sight (can be in
> the source code or can even be explicitly registered callbacks), the
> configuration/tooling installed rules will be as complex as the admin or tool
> wants them to be.
>> There are still some issues, I don't know how to solve in above framework.
>> - If there are two processes request the same type of hardware error
>> events. One hardware error event will be copied to two ring buffers (each
>> for one process), but the 'active filters' should be run only once for each
>> hardware error event.
> With persistent events 'active filters' should only be attached to the central
> persistent event.

OK. I see.

>> - How to deal with ring-buffer overflow? For example, there is full of
>> corrected memory error in ring-buffer, and now a recoverable memory error
>> occurs but it can not be put into perf ring buffer because of ring-buffer
>> overflow, how to deal with the recoverable memory error?
> The solution is to make it large enough. With *every* queueing solution there
> will be some sort of queue size limit.

Another solution could be:

Create two ring buffers. One is for logging and will be read by the RAS
daemon; the other is for recovery, and an event record is removed from
it once all 'active filters' have been run on that record. Even if the
RAS daemon is restarted or hangs, recoverable errors can still be taken
care of.
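
To make the idea concrete, here is a rough userspace sketch of the
two-buffer scheme. It is not kernel code and not part of this patch
set; every name in it (hw_err_report(), hw_err_run_recovery(),
RING_SIZE, the struct layouts) is made up for illustration, and a real
implementation would need NMI-safe, lock-less buffers instead of this
simple array ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical; tiny so that overflow is easy to hit */

enum hw_err_severity { HW_ERR_CORRECTED, HW_ERR_RECOVERABLE };

struct hw_err_event {
	enum hw_err_severity severity;
	unsigned long pfn;	/* page frame the error was reported on */
};

struct ring {
	struct hw_err_event slot[RING_SIZE];
	size_t head;	/* index of the next slot to write */
	size_t tail;	/* index of the oldest unread record */
	size_t count;	/* number of records currently queued */
};

/* Append a record; returns false (record dropped) when the ring is full. */
bool ring_put(struct ring *r, struct hw_err_event ev)
{
	assert(r != NULL);
	if (r->count == RING_SIZE)
		return false;
	r->slot[r->head] = ev;
	r->head = (r->head + 1) % RING_SIZE;
	r->count++;
	return true;
}

/* Return the oldest record without removing it, or NULL if empty. */
const struct hw_err_event *ring_peek(const struct ring *r)
{
	return r->count ? &r->slot[r->tail] : NULL;
}

/* Remove the oldest record; the caller must have checked it exists. */
void ring_pop(struct ring *r)
{
	assert(r->count > 0);
	r->tail = (r->tail + 1) % RING_SIZE;
	r->count--;
}

struct hw_err_queues {
	struct ring log;	/* drained by the RAS daemon */
	struct ring recover;	/* drained by the recovery path only */
};

/*
 * Every event is logged on a best-effort basis. Recoverable events are
 * additionally queued for recovery, so a full log ring cannot block them.
 * Returns false only if a recoverable event could not be queued.
 */
bool hw_err_report(struct hw_err_queues *q, struct hw_err_event ev)
{
	(void)ring_put(&q->log, ev);	/* may drop if the daemon is stuck */
	if (ev.severity != HW_ERR_RECOVERABLE)
		return true;
	return ring_put(&q->recover, ev);
}

typedef void (*hw_err_filter_fn)(const struct hw_err_event *ev);

/*
 * Run all active filters on each pending record, and remove the record
 * only after the last filter has seen it. Returns the number handled.
 */
size_t hw_err_run_recovery(struct hw_err_queues *q,
			   const hw_err_filter_fn *filters, size_t nfilters)
{
	const struct hw_err_event *ev;
	size_t handled = 0;

	while ((ev = ring_peek(&q->recover)) != NULL) {
		for (size_t i = 0; i < nfilters; i++)
			filters[i](ev);
		ring_pop(&q->recover);
		handled++;
	}
	return handled;
}
```

With this split, a flood of corrected errors can only fill (and
overflow) the log ring. The recovery ring still has its own size limit,
but it is drained inside the kernel and so does not depend on the RAS
daemon making progress.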

Best Regards,
Huang Ying