print kernel version on the console during panic/oops

From: Ani Sinha
Date: Mon Oct 07 2013 - 16:22:28 EST


Hi guys,

Sometimes during an oops or panic, the kernel version does not show up in
the console output. For example:

[Fri Oct 4 03:49:48 2013][ 204.396041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[Fri Oct 4 03:49:48 2013][ 204.400010] IP: [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:48 2013][ 204.400010] Oops: 0000 [#1] PREEMPT SMP
[Fri Oct 4 03:49:48 2013][ 204.400010] last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/net/ma1/operstate
[Fri Oct 4 03:49:48 2013][ 204.400010] Stack:
[Fri Oct 4 03:49:48 2013][ 204.400010] Call Trace:
[Fri Oct 4 03:49:49 2013][ 204.400010] Code: 06 48 8b 17 83 e2 03 48 09 c2 48 89 17 c9 c3 55 48 89 e5 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 53 e9 a4 00 00 00 49 83 e4 fc <49> 8b 44 24 10 48 39 c3 75 44 49 8b 44 24 08 48 85 c0 74 08 48
[Fri Oct 4 03:49:49 2013][ 204.400010] RIP [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:49 2013][ 204.400010] CR2: 0000000000000010
[Fri Oct 4 03:49:51 2013]1RU scd /sys/bus/pci/devices/0000:01:06.0/resource0 found, stopping phys (0xc3e803f2)

This is because the console loglevel is set too low to let through
messages at the default log level, which is what the printk in
dump_stack() in arch/x86/kernel/dumpstack.c uses. It would be great if
the kernel version were printed on the console along with all the other
information. When the change number is part of the kernel version, this
would help us debug/reproduce the crash with a kernel built from the same
code. Please also note that it is sometimes not possible to get the whole
crash log from /var/log/console. On our systems we do have the crash
kernel mechanism set up, but for a reason we are still investigating it
is not reliably triggered in some cases. The console log is all we have.
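
To make the filtering described above concrete, here is a small userspace
sketch (not kernel code) of the rule the console applies: a message is shown
only if its level is numerically lower than the console loglevel. The helper
name reaches_console() and the LVL_* constants are made up for illustration;
the numeric values mirror the KERN_* levels, and the default message level is
assumed to be 4 (KERN_WARNING).

```c
#include <stdbool.h>

/* Numeric values mirror the kernel's KERN_* levels (0 is most severe).
 * The LVL_* names are hypothetical, used only in this sketch. */
enum {
	LVL_EMERG   = 0,
	LVL_ALERT   = 1,
	LVL_CRIT    = 2,
	LVL_ERR     = 3,
	LVL_WARNING = 4,	/* assumed default level for a plain printk() */
	LVL_NOTICE  = 5,
	LVL_INFO    = 6,
	LVL_DEBUG   = 7
};

/* Hypothetical helper: a message is shown on the console only when its
 * level is numerically lower than the console loglevel. */
static bool reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With a console loglevel of 4, reaches_console(LVL_WARNING, 4) is false, so a
printk without an explicit level is dropped from the console, while
reaches_console(LVL_EMERG, 4) is true. That is why the patch tags the
version line with KERN_EMERG.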

Also note that not all code paths even call dump_stack(). For example,
see no_context() in arch/x86/mm/fault.c, which is what gets called on a
page fault. When a page fault happens in the kernel, it eventually calls
__die(). __die() calls show_registers(), which neither calls dump_stack()
nor prints the kernel version.

Bottom line, there should be a reliable way to get the kernel version from
the console when a crash happens (in any code path). I am proposing the
attached patch that should help in these cases. I have compiled and tested
the patch on our HW.
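
For reference, the line the patch adds to __die() is built from the taint
string, the release, and only the first space-delimited token of the version
string (via strcspn() and "%.*s"). Below is a userspace sketch of that
formatting. The function name format_kernel_version() is made up for
illustration, and the strings passed in stand for what print_tainted() and
init_utsname() would supply in the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the printk added to __die().
 * Writes "Kernel version : <tainted> <release> <first token of version>"
 * into buf and returns the snprintf() result. */
static int format_kernel_version(char *buf, size_t len, const char *tainted,
				 const char *release, const char *version)
{
	/* strcspn() gives the length of the prefix before the first space,
	 * so "%.*s" prints only the build number, e.g. "#1". */
	return snprintf(buf, len, "Kernel version : %s %s %.*s",
			tainted, release,
			(int)strcspn(version, " "), version);
}
```

Called with the sample values "Not tainted", "2.6.38" and
"#1 SMP PREEMPT Fri Oct 4 03:00:00 UTC 2013" (all illustrative), this
produces "Kernel version : Not tainted 2.6.38 #1".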

Any feedback will be greatly appreciated.

Cheers,
Ani

Print kernel version on console during panic and oops.

Sometimes during an oops or panic, the kernel version does not show up in
the crash log. For example:

[Fri Oct 4 03:49:48 2013][ 204.396041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[Fri Oct 4 03:49:48 2013][ 204.400010] IP: [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:48 2013][ 204.400010] Oops: 0000 [#1] PREEMPT SMP
[Fri Oct 4 03:49:48 2013][ 204.400010] last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/net/ma1/operstate
[Fri Oct 4 03:49:48 2013][ 204.400010] Stack:
[Fri Oct 4 03:49:48 2013][ 204.400010] Call Trace:
[Fri Oct 4 03:49:49 2013][ 204.400010] Code: 06 48 8b 17 83 e2 03 48 09 c2 48 89 17 c9 c3 55 48 89 e5 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 53 e9 a4 00 00 00 49 83 e4 fc <49> 8b 44 24 10 48 39 c3 75 44 49 8b 44 24 08 48 85 c0 74 08 48
[Fri Oct 4 03:49:49 2013][ 204.400010] RIP [<ffffffff811889a0>] rb_insert_color+0x1c/0xe5
[Fri Oct 4 03:49:49 2013][ 204.400010] CR2: 0000000000000010
[Fri Oct 4 03:49:51 2013]1RU scd /sys/bus/pci/devices/0000:01:06.0/resource0 found, stopping phys (0xc3e803f2)

This is because the console loglevel is set too low to let through the
default log level used by the printk that prints the version. It would be
great if the kernel version were printed along with all the other
information. This would help us debug/reproduce the crash with a kernel
built from the same change number. Please also note that it is sometimes
not possible to get the whole crash log from /var/log/console. The
console log is all we have.

Signed-off-by: Ani Sinha <ani@xxxxxxxxxxxxxxxxxx>

Index: linux-2.6.38/arch/x86/kernel/dumpstack.c
===================================================================
--- linux-2.6.38.orig/arch/x86/kernel/dumpstack.c
+++ linux-2.6.38/arch/x86/kernel/dumpstack.c
@@ -15,6 +15,7 @@
#include <linux/bug.h>
#include <linux/nmi.h>
#include <linux/sysfs.h>
+#include <linux/utsname.h>

#include <asm/stacktrace.h>

@@ -199,10 +200,10 @@ void dump_stack(void)
{
unsigned long stack;

- printk("Pid: %d, comm: %.20s %s %s %.*s\n",
+ printk(KERN_EMERG "Pid: %d, comm: %.20s %s %s %.*s\n",
current->pid, current->comm, print_tainted(),
init_utsname()->release,
- (int)strcspn(init_utsname()->version, " "),
+ (int)strcspn(init_utsname()->version, " "),
init_utsname()->version);
show_trace(NULL, NULL, &stack);
}
@@ -304,6 +305,10 @@ int __kprobes __die(const char *str, str
printk_address(regs->ip, 1);
printk(" RSP <%016lx>\n", regs->sp);
#endif
+ printk(KERN_EMERG "Kernel version : %s %s %.*s\n", print_tainted(),
+ init_utsname()->release,
+ (int)strcspn(init_utsname()->version, " "),
+ init_utsname()->version);
return 0;
}