Re: [RFC][PATCH] irq_work

From: Andi Kleen
Date: Thu Jun 24 2010 - 10:01:50 EST

> Please, as Peter and Boris asked you already, quote a concrete, specific
> example:

It was already in my answer to Peter.

> 'Specific event X occurs, kernel wants/needs to do Y. This cannot be done
> via the suggested method due to Z.'
> Your generic arguments look wrong (to the extent they are specified) and it
> makes it much easier and faster to address your points if you don't blur them
> by vagaries.

It's one of the fundamental properties of recoverable errors.

Error happens.
Machine check or NMI or other exception happens.
That exception runs on the exception stack.
The error is not fatal, but recoverable.
For example, you want to kill a process or call hwpoison or do some other
recovery action. These generally have to sleep to do anything.
You cannot sleep on the exception stack, so you push the work to
another context.

Now just because an error is recoverable doesn't mean it's not critical
(I think that was the mistake Boris made). If you don't do something
(like killing or recovery) you could end up in a loop or consume
corrupted data or something else bad.

So the error has to have a fail-safe path from detection to handling.

That's quite different from logging or performance counting etc.
where dropping events on overload is normal and expected.

Normally that can only be done by using dedicated resources.
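
To make the "dedicated resources" point concrete, here is a minimal
user-space sketch using C11 atomics. It is not the kernel's machine check
code and not the irq_work API from the patch; every name in it
(recovery_queue, recovery_next, NR_RECOVERY_SLOTS, bad_addr) is made up for
illustration. The idea it tries to show: the exception side claims a slot
from a preallocated pool without sleeping, allocating or locking, and a
full pool is reported to the caller instead of being silently dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool size; a real one would be sized for the worst case. */
#define NR_RECOVERY_SLOTS 4

enum slot_state { SLOT_FREE = 0, SLOT_FILLING, SLOT_READY };

struct recovery_slot {
	atomic_int state;        /* one of enum slot_state */
	unsigned long bad_addr;  /* hypothetical payload: the failing address */
};

/* Dedicated, statically allocated storage: zero-initialized, so all FREE. */
static struct recovery_slot recovery_slots[NR_RECOVERY_SLOTS];

/*
 * Stand-in for the exception-context side. It only does atomic operations
 * on preallocated memory. Returns false when no slot is left, so the
 * caller can escalate rather than lose the event.
 */
bool recovery_queue(unsigned long bad_addr)
{
	for (size_t i = 0; i < NR_RECOVERY_SLOTS; i++) {
		int expected = SLOT_FREE;

		if (atomic_compare_exchange_strong(&recovery_slots[i].state,
						   &expected, SLOT_FILLING)) {
			recovery_slots[i].bad_addr = bad_addr;
			/* Publish the payload only after it is fully written. */
			atomic_store(&recovery_slots[i].state, SLOT_READY);
			return true;
		}
	}
	return false;
}

/*
 * Stand-in for the sleepable side. Fetches one pending event and frees
 * its slot. Returns false when nothing is pending.
 */
bool recovery_next(unsigned long *bad_addr)
{
	assert(bad_addr != NULL);

	for (size_t i = 0; i < NR_RECOVERY_SLOTS; i++) {
		if (atomic_load(&recovery_slots[i].state) != SLOT_READY)
			continue;
		*bad_addr = recovery_slots[i].bad_addr;
		atomic_store(&recovery_slots[i].state, SLOT_FREE);
		return true;
	}
	return false;
}
```

The difference from a logging or perf style ring buffer is in the return
value of recovery_queue(): a false there is not "event dropped, carry on"
but a signal that the caller has to take the fatal path.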


