> A heavily multithreaded application with lots of disk i/o causes a kernel
> infinite loop in the ext2 code.
>
> The application may run anywhere from an hour to a few days
> before the problem occurs.
>
>
> Base release: RedHat 5.1
> Kernel version: 2.0.34, 2.0.35
> libc version: 2.0.7
>
> A thread will unexpectedly start to use all available CPU cycles.
>
> Gdb is sometimes unable to attach to this thread.
>
[SNIP]
>
> I have also been able to catch it at lines 331 and 343 in truncate.c.
>
[SNIP]
>
> If you have any ideas on what else I can do to track down this
> problem, let me know.

I have some info which may be related to this problem.

We also have a multithreaded application which does a lot of I/O. At
the moment I have restricted the application to only two worker
threads. Using more threads is guaranteed to crash the server.

The interesting thing is that when it does only reads and no writes, I
am unable to crash it. Admittedly this is not hard evidence, but the
read-only server has been running for more than half a year now without
the problems I have seen when writing is involved.

Because we have an additional problem with file locking in the 2.0.x
kernels, I thought everything was related to that.

A typical session is:
- mutex lock on file (between threads)
- open file (rw)
- lock file (read)
- read file
- upgrade lock on file (write)
- open file.bak (w)
- lock file.bak (write)
- write file.bak
- sync file.bak
- close file.bak
- rename file.bak file
- close file
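
A minimal C sketch of that session, for anyone who wants to try to
reproduce it. This is not our actual code. It assumes the file locks
are fcntl() byte-range locks over the whole file and that one global
pthread mutex serializes the threads. The names rewrite_file and
lock_whole_file, the "extra" payload, and the single-buffer read are
made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: one mutex serializing all file updates between threads. */
static pthread_mutex_t file_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take (or convert to) a blocking fcntl() lock of the given type on the
 * whole file. Using fcntl() here is an assumption. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;        /* F_RDLCK or F_WRLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Read `path`, write its contents plus `extra` to `path`.bak, then rename
 * the .bak file over the original. Returns 0 on success, -1 on failure.
 * Simplification: only the first sizeof(buf) bytes are copied. */
int rewrite_file(const char *path, const char *extra, size_t extra_len)
{
    char bak[4096], buf[4096];
    ssize_t n;
    int fd = -1, bfd = -1, rc = -1;

    assert(path != NULL && extra != NULL);
    if ((size_t)snprintf(bak, sizeof bak, "%s.bak", path) >= sizeof bak)
        return -1;

    pthread_mutex_lock(&file_mutex);                      /* mutex lock on file */
    fd = open(path, O_RDWR);                              /* open file (rw) */
    if (fd < 0) goto out;
    if (lock_whole_file(fd, F_RDLCK) < 0) goto out;       /* lock file (read) */
    n = read(fd, buf, sizeof buf);                        /* read file */
    if (n < 0) goto out;
    if (lock_whole_file(fd, F_WRLCK) < 0) goto out;       /* upgrade lock (write) */

    bfd = open(bak, O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* open file.bak (w) */
    if (bfd < 0) goto out;
    if (lock_whole_file(bfd, F_WRLCK) < 0) goto out;      /* lock file.bak (write) */
    if (write(bfd, buf, (size_t)n) != n) goto out;        /* write file.bak */
    if (write(bfd, extra, extra_len) != (ssize_t)extra_len) goto out;
    if (fsync(bfd) < 0) goto out;                         /* sync file.bak */
    if (close(bfd) < 0) { bfd = -1; goto out; }           /* close file.bak */
    bfd = -1;

    if (rename(bak, path) < 0) goto out;                  /* rename file.bak file */
    rc = 0;
out:
    if (bfd >= 0) close(bfd);
    if (fd >= 0) close(fd);                               /* close file */
    pthread_mutex_unlock(&file_mutex);
    return rc;
}
```

Calling rewrite_file() on the same small set of files from many
threads in a loop would approximate our write load. Note that O_TRUNC
on the .bak file and the rename over the original both go through the
truncate/remove paths in the filesystem.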

It is very likely that truncating or removing the file is involved in
the problem too.

I have not investigated this problem much because I had no real need
for more than 2 threads for writing.

At the moment I am not able to do a lot of testing because the machine
is now used as a sort of semi-production test environment. Testing the
application itself is more important to our company :-(

Please note that this application does work correctly on Solaris 2.6
with over 100 worker threads.

Regards,
Berend.

--
Berend Reitsma
Asset Control International | Phone: +31 (0)513 469100
P.O. Box 10                 | Fax:   +31 (0)513 461588
8408 ZH Lippenhuizen        | Email: berend@asset-control.com
The Netherlands             | Web:   www.asset-control.com