Re: kexec on panic
From: Denys Fedoryshchenko
Date: Sat Feb 18 2017 - 03:16:34 EST
On 2017-02-18 09:42, Jon Masters wrote:
> Hi Denys,
>
> On 02/10/2017 03:14 AM, Denys Fedoryshchenko wrote:
>
> > After years of using kexec and recent unpleasant experience with
> > modern (supposed to be blazing fast to boot) hardware that need 5-10
> > minutes just to pass POST tests, one question came up to me:
> > Is it possible anyhow to execute regular (not special "panic" one to
> > capture crash data) kexec on panic to reduce reboot time?
>
> Generally, you don't want to do this, because various platform hardware
> might be in non-quiescent states (still doing DMA to random memory, etc.)
> and other nastiness that means you don't want to do more than the minimal
> amount in a kexec on panic (crash). We've seen no end of fun and games
> even with just regular crash dumps while hardware is busily writing to
> memory that it shouldn't be. An IOMMU helps, but isn't a cure-all.
>
> Jon.
Well, I have to try. I sometimes run into hardware that fails to boot
even after a regular kexec, but one small customer has an HP server that
needs almost 6 minutes to boot, with no hot spare (which is hard to
arrange for many reasons: no spare 10G ports, cost of hardware, etc.)
and some nasty bugs that are not resolved yet. That forces me to look
for a way to reduce reboot time.
If I can find a way to save the backtrace and reboot fast, it will help
a lot in debugging kernels with minimal downtime when a bug is
reproducible only on a live system.
What I did for now, which might be insanely wrong:
diff -Naur linux-4.9.9-vanilla/kernel/kexec_core.c linux-4.9.9/kernel/kexec_core.c
--- linux-4.9.9-vanilla/kernel/kexec_core.c	2017-02-09 07:08:40.000000000 +0000
+++ linux-4.9.9/kernel/kexec_core.c	2017-02-17 12:54:49.000000000 +0000
@@ -897,6 +897,10 @@
 			machine_crash_shutdown(&fixed_regs);
 			machine_kexec(kexec_crash_image);
 		}
+		if (kexec_image) {
+			machine_shutdown();
+			machine_kexec(kexec_image);
+		}
 		mutex_unlock(&kexec_mutex);
 	}
 }
Then
  kexec -l /mnt/flash/kernel --append="intel_idle.max_cstate=0 processor.max_cstate=1"
and
  echo c >/proc/sysrq-trigger
worked even on a busy network router, but I'm not sure it will behave
the same way after a real networking stack crash.
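
A minimal pre-flight sketch for the test above, not part of the patch
itself: it reports which image the panic path would pick, going by the
order of the branches in the diff. It assumes the kernel exposes
/sys/kernel/kexec_loaded and /sys/kernel/kexec_crash_loaded; the function
name kexec_state and its optional root-directory argument are hypothetical
and exist only so the check can be dry-run against a fake tree.

```shell
#!/bin/sh
# kexec_state [ROOT]: print "crash", "regular" or "none".
# ROOT defaults to /sys; passing another directory is only for dry runs.
kexec_state() {
    root="${1:-/sys}"
    # A missing file is treated as "not loaded".
    regular=$(cat "$root/kernel/kexec_loaded" 2>/dev/null || echo 0)
    crash=$(cat "$root/kernel/kexec_crash_loaded" 2>/dev/null || echo 0)
    if [ "$crash" = "1" ]; then
        # The crash image branch comes first in the patched code,
        # so the added kexec_image branch would not be reached.
        echo "crash"
    elif [ "$regular" = "1" ]; then
        # Only a regular image is staged: the added branch would use it.
        echo "regular"
    else
        # Nothing staged: neither branch fires and panic proceeds as usual.
        echo "none"
    fi
}
```

One way to use it is to run kexec_state right after "kexec -l" and only
trigger the sysrq crash when it prints "regular".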