Re: very strange issue with sata,<4G Ram, and ext3

From: Rick Warner
Date: Thu Apr 28 2005 - 12:34:20 EST


I forgot to mention the kernels that have been tried- 2.6.8.1, 2.6.11.7,
2.6.12-rc3, and a redhat 2.6.9.


On Thursday 28 April 2005 12:16 pm, Rick Warner wrote:
> Hello,
> We are having a very strange issue on some 64bit systems. We have a 32
> node cluster of EM64T's (supermicro boards). We are using our node restore
> software to propagate a linux install onto them. We do a pxe boot to a
> kernel and initrd image. The initrd has some config info, a basic root
> filesystem, and a restore script. The kernel is passed init=/restore (the
> restore script itself). The script runs dhcp, gets an ip, then nfs mounts
> the master node of the cluster. The backup image is stored on the master
> node's nfs mount. The script then applies a backed up partition table and
> then mkfs's the partitions, mounts them, untars a backup tar to the drive,
> and then makes it bootable with grub.
>
> On these systems, we are getting ext2 errors from the initrd during the
> untarring. Soon after, we start getting seg faults on random things (looks
> like stuff caused by the still running dhcp client), and then a continuous
> stream of segfaults on the restore script itself (restore[1]).
>
> The systems being restored are dual em64t's with 2G of ram and 200G sata
> drives. If we up the memory to 4G, the restores complete without error. If
> we reduce down to 512M, the segfaults start at the mkfs stage instead of
> the untar stage. We've tried different sata drives and controllers without
> change. Switching to ide drives works. Switching to reiserfs instead of
> ext3 for the destination drives works too. We've tried enabling the scsi
> debug stuff as well as the jbd debug stuff for ext3 without getting any
> more info. We also enabled the kernel debug options too. We've also tried
> using the deprecated ide based sata drivers instead of the scsi based ones
> without success. We have tried restoring to Intel's Jarell EM64T systems
> as well as an Arima HDAMA opteron with the same errors. We've also tried
> adding swap space ASAP in the inird image.
>
> This problem is really baffling us and we're not quite sure what to check
> into next. Any ideas?

--
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/