Lockup/reboots on multiple dual core Opteron systems

From: Michael Madore
Date: Fri Apr 14 2006 - 12:48:27 EST


Hi,

I have several dozen systems based on the AMD 8111/8131 chipset which
are experiencing lockups or reboots while under heavy stress. All
systems are configured identically:

(2) dual core Opteron 285 processors
8 GB RAM
1 IDE hard disk

When the lockup occurs, there is no information printed to the screen,
and magic sysrq does not work. The systems will often lockup within
a matter of hours, but sometimes they can run for several days before
locking up.

The testing involves:

1) Allocate several GB of memory from each node and read/write to it.
2) Perform lots of disk I/O
3) Keep all cores busy

I see the lockup no matter which kernel I try:

2.6.5 (SLES 9)
2.6.9 (RHEL 4)
2.6.16 (kernel.org)

The systems will run perfectly stable under the following conditions:

1) Use a 32-bit kernel
or
2) Boot 64-bit kernel with mem=3000M
or
3) Don't perform any disk I/O during the test

I have also tested the dual core 8GB combination on both NForce and
Serverworks based motherboards with the same results. In this case
both systems were using SATA hard disks instead of IDE.

Any ideas what the culprit could be?

Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/