Re: Many open/close on same files yeilds "No such file or directory".

From: Jesper Krogh
Date: Wed May 07 2008 - 16:52:43 EST


Ray Lee wrote:
On Mon, May 5, 2008 at 11:29 AM, Jesper Krogh <jesper@xxxxxxxx> wrote:
I'd been meaning to ask what the topology was. External, eh? Are you
sure the enclosure, cabling, and card/connectors are all good? Have
you tried swapping out cables?

It is new SCSI-controller, new cable and new terminator put onto it. But
(just enlighten me), if I had problems at this level I'd expect the
serverlog to be full of SCSI/FS-related errors and not just a single
syscall, that doesn't even touch the array due to caching, to be
failing.

Borderline hardware does not always create logged errors.

Ok. I think this _really_ point to a kernel problem.
(or just some broken hardware from Sun in multiple copies)

If I understood you correctly earlier, identical hardware on another
system does not show the error. That, quite honestly, rules out the
software.

Now I've moved the data to fresh ext3 filesystems on a storage-array
based on iscsi. Mounted the filesystems to another, similar server and
I can still reproduce the problem.

Both servers are 16 cores. The problem wasn't there on a different server with only 2 cores. (or I didn't run into it).

The 3 setups above has both been tested with a 2.6.22-14-server and 2.6.24-17-server (towards the iscsi volume).

Doing more testing show that I have 3 machines (all X4600, 16 cores/32GB ram that I can reproduce it on against different filesystem)

The more processes running on the system (accessing the FS volume), the
easier it seems to get into the problem.

What's left, however unlikely, has to be the issue. And what's left is
your scsi controller, the cable, and the external disk array.

Now I've removed all of them.. and still got the problem.

--
Jesper
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/