fsync occasionally very slow

From: Matthew Kirk
Date: Fri Jan 26 2007 - 05:37:58 EST


Hi

I am working on an application that fsync(2) files and directories as
needed. I’m seeing intermittent stalls on fsync that can last many
seconds. The file system is mounted ext3, however I have reproduced the
problem with ext2. The as scheduler is in use. I have also tried
deadline. I have tried 2.6.12.3 (what I’m currently deploying on – yes, I
know it’s old) and 2.6.15 (highest kernel I could easily get to run). I
have reproduced the problem on both machines with high performance raid
controllers and without. I haven’t been able to find any reported issues
that describe this problem in a relatively recent kernel (though I did find
some comments about 2.0).

In the original application, data is received from another machine via many
threads. These threads all write occasionally, usually in chunks of about
1K or 100K (I have also tried using 10M chunks, though some percentage are
still very small). Half of all files are in the 10-15MB range. The other
half are very small. There are also log files that get appended to
regularly. Except for logs, we fsync right before each file is closed. We
also fsync directories. Everything is on a single large stripe. In the
regular application, fsync can happen from any of a number of threads.

In the application, if one call to fsync blocks, other threads that
subsequently call fsync while the first one is blocked appear to also block.
Once the first thread's fsync releases, the others appear to release too.

I’m not 100% sure it’s fsync vs. some other file system issue. When I
instrument my code I find that about 90% of the stalls are on fsync, with
the remainder being on other file system operations (e.g. write). If I
disable fsync in the application I still see stalls, though a little less
frequently.

When running I see what appear to be normal numbers of pdflush jobs (range
2-8, usually 4-5). When I get application stalls the system is still
otherwise responsive. I can still start processes, write files, etc.
Typically there are 4-5 pdflushes during stalls too. CPU utilization (user)
is about 10% and I/O wait is 60-70%. According to iostat, although blocks
are sometimes getting written during these stalls, they frequently are not.

I have reproduced what I think is the same problem in the stand-alone,
single-threaded source code attached, which opens files, writes to them, and
then only when done writing all files, fsyncs them and closes them in the
order in which they were written. In the standalone application, the first
files that get written take virtually no time to fsync (already flushed),
then once files start to be flushed time will increase, however at least one
file will take a couple seconds to flush, where the others will be in the
range of a few thousandths of a second.

I’m not sure where to go with this now and would appreciate any insights you
have.

Thanks!
Matt Kirk


#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <string>
#include <iostream>
#include <ostream>
#include <sstream>
const unsigned int bigsize = 10248576;
const unsigned int smallsize = 10248576;

char *buff;
int main (int argc, const char *argv[]) {
int q;
int innerloop = 100;
int outerloop = 1000;
int dofsync = 1;
if (argc > 1) {
dofsync=(atoi(argv[1]));
}
if (argc > 2) {
innerloop=(atoi(argv[2]));
}
if (argc == 4) {
outerloop=(atoi(argv[3]));
}
buff = new char[bigsize];
memset(buff, 1, bigsize);
int fd[innerloop];
for (int r = 0; r < outerloop; r++) {
struct timeval starttimer,endtimer;
gettimeofday(&starttimer, NULL);
int count;
for (int q = 0; q < innerloop; q++) {
std::stringstream s;
s << "file_" << q+(r*outerloop);
fd[q] = open(s.str().c_str(), O_RDWR+O_CREAT+O_APPEND);
if (fd[q] == -1) { std::cout << "Failed to open " << s.str() << "
due to " << errno << std::endl; return(0); }
if (q % 4 != 1) {
count = write(fd[q],(void*) buff, bigsize);
if (count != bigsize) { std::cout << "write returned " << count
<< " with errno " << errno << std::endl; return (0); }
} else {
count = write(fd[q],(void*) buff, smallsize);
if (count != smallsize) {
std::cout << "write returned " << count << " with errno " <<
errno << std::endl;
return (0);
}
}
}

for (int q = 0; q < innerloop; q++) {
struct timeval mytimer, mytimer2;
gettimeofday(&mytimer, NULL);
if (dofsync) {
count = fsync(fd[q]);
} else {
count = fdatasync(fd[q]);
}
gettimeofday(&mytimer2, NULL);
if (mytimer.tv_usec > mytimer2.tv_usec) {
mytimer2.tv_sec -= (mytimer.tv_sec+1);
mytimer2.tv_usec += (1000000-mytimer.tv_usec);
} else {
mytimer2.tv_sec -= mytimer.tv_sec;
mytimer2.tv_usec -= mytimer.tv_usec;
}

if (count != 0) { std::cout << "fsync returned " << count << " with
errno " << errno << std::endl; } else {
std::cout << " elapsed time for sync of descriptor" << fd[q] <<
" was " << mytimer2.tv_sec << "s " << mytimer2.tv_usec
<< "usec" << std::endl;
count = close(fd[q]);
}
}
gettimeofday(&endtimer, NULL);
if (starttimer.tv_usec > endtimer.tv_usec) {
endtimer.tv_sec -= (starttimer.tv_sec+1);
endtimer.tv_usec += (1000000-starttimer.tv_usec);
} else {
endtimer.tv_sec -= starttimer.tv_sec;
endtimer.tv_usec -= starttimer.tv_usec;
}
std::cout << " elapsed time for pass " << r << " was " <<
endtimer.tv_sec << "s " << endtimer.tv_usec << " usec " <<
std::endl;
}
return 0;
}






--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.410 / Virus Database: 268.17.11/652 - Release Date: 1/25/2007

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/