Re: [RFC][PATCH] irq_work

From: Borislav Petkov
Date: Thu Jun 24 2010 - 11:40:01 EST

From: Andi Kleen <andi@xxxxxxxxxxxxxx>
Date: Thu, Jun 24, 2010 at 10:01:43AM -0400

> > Please, as Peter and Boris asked you already, quote a concrete, specific
> > example:
> It was already in my answer to Peter.
> >
> > 'Specific event X occurs, kernel wants/needs to do Y. This cannot be done
> > via the suggested method due to Z.'
> >
> > Your generic arguments look wrong (to the extent they are specified) and it
> > makes it much easier and faster to address your points if you dont blur them
> > by vagaries.
> It's one of the fundamental properties of recoverable errors.
> Error happens.
> Machine check or NMI or other exception happens.
> That exception runs on the exception stack
> The error is not fatal, but recoverable.
> For example you want to kill a process or call hwpoison or do some other
> recovery action. These generally have to sleep to do anything
> interesting.
> You cannot do the sleeping on the exception stack, so you push it to
> another context.
> Now just because an error is recoverable doesn't mean it's not critical
> (I think that was the mistake Boris made).

It wasn't a mistake - I was simply trying to lure you into giving a more
concrete example so that we all land on the same page and we know what
the heck you/we/all are talking about.

> If you don't do something
> (like killing or recovery) you could end up in a loop or consume
> corrupted data or something else bad.
> So the error has to have a fail safe path from detection to handling.

So we are talking about a more involved and "could-sleep" error handling.

> That's quite different from logging or performance counting etc.
> where dropping events on overload is normal and expected.

So I went back and reread the whole thread, and correct me if I'm
wrong, but the whole run-softirq-after-NMI scheme has one use case for
now - "could-sleep" error handling for MCEs _only_ on x86. So you're
changing a bunch of generic and x86 kernel code just for error handling.
Hmm, that's a kinda big hammer in my book.

A slimmer solution is a much better way to go, IMHO. I think Peter said
something about irq_exit(), which should be just fine.

But AFAICT an arch-specific solution would be even better, e.g.
if you call into your deferred work helper from paranoid_exit in
<arch/x86/kernel/entry_64.S>. I.e., something like

#ifdef CONFIG_X86_MCE
	testl $_TIF_NEED_POST_NMI,%ebx
	jnz do_post_nmi_work
#endif

Or even slimmer, rewrite paranoidzeroentry into an MCE-specific variant
which adds that functionality. But that wouldn't be extensible if
other entities want post-NMI work later.
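
To make the flag-check idea a bit more concrete, here is a rough
user-space model in C of what I mean: the handler only marks that work
is pending, and the exit path runs it once we are off the exception
stack. This is a sketch only, not kernel code. TIF_NEED_POST_NMI,
thread_flags, fake_nmi_handler() and fake_paranoid_exit() are made-up
names for illustration, and a real implementation would have to deal
with per-task flags, atomicity and the actual entry code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a thread-info flag like _TIF_NEED_POST_NMI. */
#define TIF_NEED_POST_NMI (1u << 0)

static unsigned int thread_flags;	/* models the per-task flags word */
static int recovered;			/* number of deferred recovery runs */

/* Models the MCE/NMI handler: it must not sleep, so it only sets a flag. */
static void fake_nmi_handler(bool recoverable_error)
{
	if (recoverable_error)
		thread_flags |= TIF_NEED_POST_NMI;
}

/* Models the "could-sleep" recovery work, run outside exception context. */
static void do_post_nmi_work(void)
{
	/* The exit path clears the flag before calling us. */
	assert(!(thread_flags & TIF_NEED_POST_NMI));
	recovered++;
}

/* Models the check on the exit path (the testl/jnz pair above). */
static void fake_paranoid_exit(void)
{
	if (thread_flags & TIF_NEED_POST_NMI) {
		thread_flags &= ~TIF_NEED_POST_NMI;
		do_post_nmi_work();
	}
}
```

Calling fake_nmi_handler(true) followed by fake_paranoid_exit() runs
the recovery work exactly once; a second fake_paranoid_exit() without a
new error does nothing, because the flag was cleared first.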


Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632