server crashing

From: Bob Hutchinson
Date: Sat Dec 06 2003 - 12:36:30 EST


Hi all,
hope this is the right list for this question...

We have a server running RH7.3 (patched with RHN to date apart from Kernel
which is 2.4.20-13.7). Apart from the software RAID being broken & the
machine running in degraded mode for several months) it has been reliable
until recently.

It has started crashing with a console screen full of kernel messages - for
the last 5 days - always at 4am, though we have been unable to see which of
the 4am cron jobs might be tipping it over (we HAVE looked and thought it
might be logwatch [5.0.1] which we are now turning off).

Messages are too long & hexy to copy all ... here are the gist:
<snip>
bread <kernel>
ext3_mark_iloc_dirty <kernel>
ext3_truncate .....
.rodata .....
__jbd_kmalloc ....
start_this_handle ....
journal_start ...
start_transaction ...
ext3_delete_inode ...
iput ...
notify_change ...
d_delete ...
vfs_unlink ...
cached_lookup <kernel>
lookup_hash ...
sys_unlink ...
sys_close ...
system_call ...

Code: "strings of hex numbers all on one line ..."
</snip>

And that is it.

A hard RESET results in a reboot that hangs & drops into a shell - but
ANOTHER (second) hard RESET results in a successful & totally clean boot
(apart from the degraded raid messages and they appear on every boot).

We are baffled & do not know what is causing this or what we need to do.
Any help would be much appreciated as this is a busy server with over 1000
users on it.

Any help or pointers would be appreciated


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/