[RFC] removal of sync in panic

From: Christian Borntraeger
Date: Wed Jul 14 2004 - 10:49:58 EST


I have a question regarding the sys_sync() call within the panic function.
--------snip---------------
printk(KERN_EMERG "Kernel panic: %s\n",buf);
if (in_interrupt())
printk(KERN_EMERG "In interrupt handler - not syncing\n");
else if (!current->pid)
printk(KERN_EMERG "In idle task - not syncing\n");
else
sys_sync(); <--------------------
bust_spinlocks(0);

#ifdef CONFIG_SMP
smp_send_stop();
#endif
--------------------------


I have seen panic failing two times lately on an SMP system. The box
panic'ed but was running happily on the other cpus. The culprit of this
failure is the fact, that these panics have been caused by a block device
or a filesystem (e.g. using errors=panic). In these cases the likelihood
of a failure/hang of sys_sync() is high. This is exactly what happened in
both cases I have seen. Meanwhile the other cpus are happily continuing
destroying data as the kernel has a severe problem but its not aware of
that as smp_send_stop happens after sys_sync.

I can imagine several changes but I am not sure if this is a problem which
must be fixed and which fix is the best.
Here are my alternatives:

1. remove sys_sync completely: syslogd and klogd use fsync. No need to help
them. Furthermore we have a severe problem which is worth a panic, so we
better dont do any I/O.
2. move smp_send_stop before sys_sync. This at least prevents other cpus of
doing harm if sys_sync hangs. Here I am not sure if this is really working.
3. Add an
if (doing_io())
printk(KERN_EMERG "In I/O routine - not syncing\n");
check like in_interrupt check. Unfortunately I have no clue how this can be
achieved and it looks quite ugly.

Thanks for any ideas and clarifications
cheers

Christian




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/