Re: Another Linux performance anomaly with 1+ GB files

Scott Laird (laird@pacificrim.net)
Wed, 12 Nov 1997 10:49:00 -0800


(Replying to my own message...)

In message <19971112092935.16761.qmail@speedbump.sigkill.org>, Scott Laird writes:
>
>We have a P6/200 system with 128 MB of RAM (with only 64 in use at
>present, I'll re-fix lilo as soon as I can reboot) that we're using as
>a Solid (http://www.solidtech.com/) database server. We've had
>problems with weird system stalls in the past, all of which were
>caused by fsync(2) taking over 100 seconds to run on large (> 1 GB)
>files.
>
>Now we seem to have a new problem, and I'm not sure if it's hardware
>or software. It appears that read(2) system calls occasionally take a
>*long* time (10 minutes to an hour) to return. No SCSI errors (or any
>other kernel errors) have been logged. I've seen this happen three
>times in the last 90 minutes.

More information:

The pause times appear to grow longer -- the first pause I noticed
lasted under 10 minutes, the next was 50 minutes, and the next lasted
several hours. I'm not sure if this is repeatable; I can't really
afford to take our database server down again and find out.

The pause also occurs when simply copying the same file from an ext2
filesystem with 4k blocks to one with 1k blocks. I didn't try copying
in the other direction.

Oddly enough, neither the cp nor the solid process was accumulating
any CPU time during any of the pauses.

Linux 2.1.62 doesn't *appear* to have the problem, but I can't say for
certain. Just copying the 1 GB file from filesystem to filesystem was
enough to trip the bug twice (once at ~340 MB, and once around 700 MB)
under 2.0.31pre10, but not 2.1.62. Unfortunately, it's not a perfect
test -- 2.1.62's improved memory detection means that the system was
running with 128 MB of RAM available, while 2.0.x was only using 64
MB.
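
The copy test above used plain cp. A more targeted check would read
the file sequentially and log every read(2) that stalls, along with
the offset where it happened. The following is only a minimal sketch
of that idea: the function name scan_for_slow_reads, the 64 KB buffer
and the threshold parameter are illustrative assumptions, and it
assumes a POSIX system that provides clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* allow files larger than 2 GB on 32-bit systems */

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Current time in seconds on the monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: read `path` front to back and report each
 * read(2) call that takes longer than `threshold_sec` seconds.
 * Returns the number of slow reads, or -1 on error. If `total` is
 * non-NULL, the number of bytes read is stored there.
 */
long scan_for_slow_reads(const char *path, double threshold_sec,
                         long long *total)
{
    static char buf[64 * 1024];
    long slow = 0;
    long long offset = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    for (;;) {
        double t0 = now_sec();
        ssize_t n = read(fd, buf, sizeof buf);
        double dt = now_sec() - t0;

        if (n < 0) {
            close(fd);
            return -1;
        }
        if (dt > threshold_sec) {
            fprintf(stderr, "read at offset %lld took %.3f s\n",
                    offset, dt);
            slow++;
        }
        if (n == 0)     /* end of file */
            break;
        offset += n;
    }

    close(fd);
    if (total)
        *total = offset;
    return slow;
}
```

Calling scan_for_slow_reads("/path/to/bigfile", 1.0, NULL) would
report any read that blocks for more than a second. Comparing the
reported offsets across runs and kernels might show whether the
stalls are tied to particular positions in the file.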

For now, I'm sticking with 2.1.62. The database seems to be running
fine with two 4 GB 1k ext2 filesystems and one 4 GB 4k ext2
filesystem.

Unfortunately, I still had to patch 2.1.62's fsync(). Out of the box,
2.1.62's fsync is abysmally slow on large files. My patch is just a
quick hack -- call file_fsync instead of ext2_sync_file in
ext2_file_operations.
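
To compare fsync() behaviour between kernels or patches from user
space, the call can simply be timed. The sketch below is an
illustration rather than anything taken from the kernel patch; the
name timed_fsync is made up, and it assumes a POSIX system with
clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Current time in seconds on the monotonic clock. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: call fsync(2) on `fd` and return how long the
 * call took, in seconds. Returns -1.0 if fsync() fails.
 */
double timed_fsync(int fd)
{
    double t0 = monotonic_seconds();

    if (fsync(fd) != 0) {
        perror("fsync");
        return -1.0;
    }
    return monotonic_seconds() - t0;
}
```

Called right after a batch of writes to a large file, the returned
time gives a rough number to compare before and after a change such
as the one described above.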

Is *anyone* else actually working with 1 GB+ files on Linux? I can't
believe that we're the only ones with these sorts of problems.

Scott Laird
Pacific Rim Network, Inc.