On Fri, May 2, 2008 at 8:19 AM, Jesper Krogh <jesper@xxxxxxxx> wrote:
Jesper Krogh wrote:
I'd suspect that after 1e8 loops your CPU got too hot and started to
misbehave.
Hardware is a Sun Fire X4600 (8x dual-core AMD64 processors). The
problem seems to be tied to this filesystem. (I haven't been able
to reproduce it on the /-mounted disk of the same system.) So if it were a
CPU problem, then it shouldn't be tied to a specific filesystem?
I've tried to explore this suggestion (the best I could).
This is the only activity on the system, so a load of 1 / 16 CPUs.
There are 2 ext3 filesystems locally mounted: / and this one. Running 16
parallel runs of this program on a file on the /-mounted filesystem does not
reproduce the problem. If it were linked to hot hardware, I believe I should
be able to reproduce it this way. The servers are in a 17-degree
server room.
It varies a lot when it actually happens. The "earliest ones" have been
from 200000 cycles.
Run 16 in parallel on /, and another 16 simultaneously on the trouble
filesystem? If you continue to get errors only on the 'trouble'
filesystem, and no errors start occurring on / coincident, then it
sounds pretty localized.
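
In case it helps, here is a rough sketch of that side-by-side run. It is
not your test program (which I haven't seen): stress() is a hypothetical
stand-in that rewrites a small file and reads it back, and the function
names, block size and defaults are all my assumptions. It uses threads
rather than separate processes for simplicity, and the read-back will
usually be served from the page cache, so treat it as a template for the
"same load on both filesystems at once" idea, not as a reproducer.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def stress(directory, run_id, cycles):
    """Hypothetical stand-in for the real test program.

    Rewrites one scratch file `cycles` times and reads it back each time.
    Returns (directory, run_id, failure), where failure is None on success
    or a short description of the first problem seen.
    """
    payload = os.urandom(4096)  # assumed block size, purely illustrative
    fd, path = tempfile.mkstemp(prefix=f"fstest-{run_id}-", dir=directory)
    os.close(fd)
    failure = None
    try:
        for cycle in range(cycles):
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # push the write down to the disk
            with open(path, "rb") as f:
                if f.read() != payload:
                    failure = f"data mismatch at cycle {cycle}"
                    break
    except OSError as exc:
        failure = f"OSError: {exc}"
    finally:
        try:
            os.unlink(path)
        except OSError:
            pass  # scratch file already gone or not removable
    return (directory, run_id, failure)


def run_parallel(directories, runs_per_dir=16, cycles=200_000):
    """Run `runs_per_dir` concurrent stress loops in every directory at once."""
    jobs = [(d, n, cycles) for d in directories for n in range(runs_per_dir)]
    if not jobs:
        return []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(lambda job: stress(*job), jobs))
```

Calling run_parallel() with one writable directory on / and one on the
trouble filesystem gives 16 runs on each at the same time. If every
failure reported carries the trouble filesystem's directory, that points
away from the CPU.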
BTW, I may have missed this earlier, but does it happen *anywhere* on
the troublesome filesystem (i.e., in a newly created subdirectory)?