The error context is in the behavior of the hw. If the error is fatal, you
won't see it - the machine will panic or do something else to prevent error
propagation. It definitely won't run any software anymore.
If you see the error getting logged, it means it is not fatal enough to kill
the machine.
One place in the fatal case where I would like to see more information is the
"Action required: data load in error *UN*recoverable area of kernel"
case [emphasis on the "UN" added]. We have a few places where the kernel does
recover, and in most places we crash. Our code for the recoverable cases is
fragile. Most of this series is about repairing regressions in paths where we
used to recover: places where the kernel is doing get_user() or
copy_from_user(). Those can be recovered if the access gets an error return
and the kernel kills the process instead of crashing.
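To make the "error return instead of crash" idea concrete, here is a rough
userspace sketch. It is not kernel code. Everything prefixed mock_ is
hypothetical, and the poison_off field just stands in for a poisoned cache
line. The only thing borrowed from the real API is the copy_from_user()
return convention (number of bytes NOT copied, so 0 means success).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NO_POISON ((size_t)-1)

/* Hypothetical model of a user buffer; poison_off simulates a bad location. */
struct mock_user_buf {
	const unsigned char *data;
	size_t len;
	size_t poison_off;	/* offset of simulated poison, or NO_POISON */
};

/*
 * Same return convention as copy_from_user(): the number of bytes that
 * could NOT be copied. The copy stops at the simulated poison instead
 * of taking down the "machine".
 */
static size_t mock_copy_from_user(void *dst, const struct mock_user_buf *src,
				  size_t n)
{
	size_t ok = n;

	if (ok > src->len)
		ok = src->len;
	if (src->poison_off != NO_POISON && src->poison_off < ok)
		ok = src->poison_off;

	memcpy(dst, src->data, ok);
	return n - ok;
}

/*
 * Hypothetical caller: a short copy becomes -EFAULT for this one request,
 * which is the recoverable outcome (fail the caller, keep running).
 */
static int mock_syscall_read_arg(void *dst, const struct mock_user_buf *src,
				 size_t n)
{
	if (mock_copy_from_user(dst, src, n) != 0)
		return -EFAULT;
	return 0;
}
```

With a clean buffer the caller returns 0. With poison at offset 3 of an
8-byte copy, mock_copy_from_user() reports 5 bytes not copied and the caller
returns -EFAULT. In the real kernel the equivalent step relies on the fixup
path being reachable from the machine check handler, which is the fragile
part.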
A long time ago I posted some patches to include a stack trace for this type
of crash. They didn't make it into the kernel, and I got distracted by other
things.
If we had that, it would have been easier to diagnose this regression (Shaui
Xie would have seen crashes with a stack trace pointing to code that used
to recover in older kernels). Folks with big clusters would also be able to
point out other places where the kernel crashes often enough that additional
EXTABLE recovery paths would be worth investigating.
So:
1) We need to fix the regressions. That just needs new commit messages
for these patches that explain the issue better.
2) I'd like to see a patch for a stack trace for the unrecoverable case.
3) I don't see much value in a message that reports the recoverable case.