Re: [PATCH] [revised -- version 2] Info dump on Oops or panic()
From: Andrew Morton
Date: Thu Jun 28 2007 - 19:49:19 EST
On Thu, 28 Jun 2007 16:12:23 -0700 (PDT)
Joshua Wise <jwise@xxxxxxxxxx> wrote:
> Version: 2
>
> Background:
> When managing a large number of servers, as Google does, it's sometimes
> useful to get an "at-a-glance" view of a machine when it crashes. When no
> other post-mortem is possible, it's often useful to know how long the
> machine has been powered on, which kernel it was running, and possibly
> other information that a driver may have logged. Up until now, there was no
> real way to do this other than to keep statistics outside to the machine.
>
> Description:
> This patch adds a call chain to be invoked when the system oopses or
> panics. Traps have been added to i386 and x86_64 kernels, but it should be
> trivial to add support for more architectures. Two sample notifiers have
> been added -- an uptime printer, and a utsname printer for showing various
> statistics about the running system.
>
> Remaining issues:
> * Is do_posix_clock_monotonic_gettime really the right API?
> * Should this be on by default, or a disablable Kconfig entry?
>
> Testing:
> I accidentally built my testing kernel without the IDE drivers. It panic()ed
> immediately, and informed me that my uptime was 0.38 seconds.
>
> Credits:
> This code was mainly written by Mike Waychison <mikew@xxxxxxxxxx> and
> Masoud Sharbiani <masouds@xxxxxxxxxx>.
>
> Changelog:
> v2 -- fixed some checkpatch issues. Moved dumping to before Oopses, as per
> the suggestion of Kyle McMartin <kyle@xxxxxxxxxxx>.
>
> diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
> index 06dfa65..f969c04 100644
> --- a/arch/i386/kernel/process.c
> +++ b/arch/i386/kernel/process.c
> @@ -301,6 +301,8 @@ void show_regs(struct pt_regs * regs)
> {
Your email client is doing space-stuffing. It's easy enough to fix at this
end, but even easier if you fix it ;)
> unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
>
> + atomic_notifier_call_chain(&info_dumper_list, 0, NULL);
> +
I worry a bit about doing _anything_ extra at oops-time. It just decreases
the chances of the kernel generating the info which we want.
So... Please consider abandoning the notifier-chain and just go for a
simple function call.
If people want to add extra goodies later on, well, they need to patch the
kernel anyway so they can patch your newly-added function rather than
hooking into the notifier chain.
> EXPORT_SYMBOL(cad_pid);
>
> /*
> + * Notifier list for kernel code which wants to printk hardware specific
> + * information at Oops or panic time.
> + */
> +
> +ATOMIC_NOTIFIER_HEAD(info_dumper_list);
> +EXPORT_SYMBOL(info_dumper_list);
That export isn't needed.
> +/*
> + * Dump out UTS info on oops / panic.
> + */
> +
> +static int dump_utsname(struct notifier_block *self, unsigned long v, void *p)
> +{
> + printk (KERN_EMERG "%s %s %s %s %s %s\n",
> + utsname()->sysname,
> + utsname()->nodename,
> + utsname()->release,
> + utsname()->version,
> + utsname()->machine,
> + utsname()->domainname);
> + return 0;
> +}
Again, a deref of current->nsproxy->uts_ns->name at oops-time has risks.
This string could be precalculated, no?
> +static struct notifier_block utsname_notifier;
> +
> +static int __init register_utsname_dump(void) {
<wishes that all nooglers got a copy of K&R>
> + utsname_notifier.notifier_call = dump_utsname;
> + atomic_notifier_chain_register(&info_dumper_list, &utsname_notifier);
> + return 0;
> +}
> +__initcall(register_utsname_dump);
module_init() is a bit more modern, but equivalent.
> +/*
> + * Dump out uptime info on oops / panic.
> + *
> + * FIXME: Should I be just using get_jiffies64()? what if the lock there
> + * is damaged beyond any recognition?
> + */
> +
> +static int dump_uptime(struct notifier_block *self, unsigned long v, void *p)
> +{
> + struct timespec uptime;
> + /* The logic below is very much like how kernel
> + * prepares /proc/uptime.
> + */
> + do_posix_clock_monotonic_gettime(&uptime);
> + printk(KERN_EMERG "Uptime (seconds): %lu.%02lu\n",
> + (unsigned long) uptime.tv_sec,
> + (uptime.tv_nsec / (NSEC_PER_SEC / 100)));
> +
> + return 0;
> +}
do_posix_clock_monotonic_gettime() takes xtime_lock. So if the kernel
oopses while holding xtime_lock we don't get to see the oops. This is bad.
I don't know what to do here. It will be hard to find a read-the-time
function which is a) lockless and b) available on all architectures and
configs.
If you can find a way to use plain old jiffies, that'd be good.
> +static struct notifier_block uptime_notifier;
> +
> +static int __init register_uptime_dump(void) {
thwap
> + uptime_notifier.notifier_call = dump_uptime;
> + atomic_notifier_chain_register(&info_dumper_list, &uptime_notifier);
> + return 0;
> +}
> +__initcall(register_uptime_dump);
module_init()
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/