Re: Many open/close on same files yields "No such file or directory".

From: Ray Lee
Date: Fri May 02 2008 - 12:45:53 EST


On Fri, May 2, 2008 at 8:55 AM, Jesper Krogh <jesper@xxxxxxxx> wrote:
>
> Ray Lee wrote:
>
> > On Fri, May 2, 2008 at 8:19 AM, Jesper Krogh <jesper@xxxxxxxx> wrote:
> >
> > > Jesper Krogh wrote:
> > >
> > >
> > > >
> > > > > I'd suspect that after 1e8 loops your CPU got too hot and started to
> > > > > misbehave.
> > > > >
> > > > >
> > > > Hardware is a Sun Fire X4600 (8 x dual-core AMD64 processors). The
> > > > problem seems to be tied to this filesystem. (I haven't been able
> > > > to reproduce it on the /-mounted disk of the same system.) So if it
> > > > were a CPU problem, then it shouldn't be tied to a specific filesystem?
> > > >
> > > > This is the only activity on the system, so a load of 1 on 16 CPUs.
> > > >
> > > >
> > > I've tried to explore this suggestion (the best I could).
> > >
> > > There are 2 ext3 filesystems locally mounted: / and this one. Running 16
> > > parallel runs of this program on a file on the /-mounted filesystem
> > > cannot reproduce the problem. If it was linked to hot hardware, I believe
> > > I should be able to reproduce it this way. The servers are in a 17-degree
> > > server room.
> > >
> > > When it actually happens varies a lot. The "earliest ones" have been
> > > from 200000 cycles.
> > >
> >
> > Run 16 in parallel on /, and another 16 simultaneously on the trouble
> > filesystem? If you continue to get errors only on the 'trouble'
> > filesystem, and no errors start occurring on / coincident, then it
> > sounds pretty localized.
> >
>
> That test has been done. I can only reproduce it on this filesystem. But I
> cannot really conclude that it is only present there, since sometimes my
> test program just goes on and dies past 1 billion cycles. But I have never
> gotten errors from the / filesystem on the same installation.

Sorry for belaboring the point, but reading up-thread I see you ran
both / and /troublefs one after the other, but you're saying you also
ran them at the same time? If so, that would seem to conclusively rule
out hardware overheating.
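
For reference, a minimal sketch of the "at the same time" test, in case it
helps pin down what was actually run. This is not the original test program
(which isn't shown in this thread); the function names and the example paths
are hypothetical, and it uses threads for brevity where separate processes
would be closer to 16 independent runs.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def open_close_loop(path, cycles):
    """Open and close `path` repeatedly.

    Returns the index of the first cycle that failed with ENOENT
    ("No such file or directory"), or None if every cycle succeeded.
    """
    for i in range(cycles):
        try:
            fd = os.open(path, os.O_RDONLY)
        except FileNotFoundError:
            return i
        os.close(fd)
    return None


def run_simultaneously(paths, workers_per_path, cycles):
    """Run `workers_per_path` loops against every path concurrently.

    Returns a dict mapping each path to the list of per-worker results.
    """
    jobs = [p for p in paths for _ in range(workers_per_path)]
    results = {p: [] for p in paths}
    # One thread per job so that all loops are in flight together.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [(p, pool.submit(open_close_loop, p, cycles)) for p in jobs]
        for p, fut in futures:
            results[p].append(fut.result())
    return results


# Hypothetical usage: 16 loops on each filesystem, all running together.
# run_simultaneously(["/testfile", "/troublefs/testfile"], 16, 10**8)
```

If only the /troublefs entries ever come back non-None while the / entries
stay None in the same run, that points at the filesystem rather than the
hardware.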