Exactly! Just imagine what the current ext2_sync_file() implementation
does to you with a 1 GB file and a few fsyncdata() per second.
The only viable workaround so far is the patch from Scott Laird
which reverts to fsync_dev() depending on file size.
This patch looks like the right approach.
> For some of them 4 blocks
> could be not enough though, perhaps a more generic version of your
> patch that handles more than 4 blocks could be found - like keeping
> the dirty blocks per inode on a special list.
Depending on database activity and transaction size this could
easily use much more than 4 blocks.
Theodore Y. Ts'o <tytso@MIT.EDU> wrote:
> If you look at my patch, you'll see that number of blocks saved is
> configurable by adjusting the NUM_EXT2_FFSYNC_BLKS #define. While it
> might make sense to increase that number, note that past a certain
> point, you might as well simply go through all of the indirect blocks.
>From my experience, going through the triple indirect blocks is
always calling for trouble. So if it takes too much overhead to keep
a bigger list of dirtied blocks, we might as well fall back to
fsync_dev() for bigger files.
> Adding blocks to the ffsync list is an order N-squared operation, due to
> the need to check to see if the block is already on the list.
We could use a more scalable algorithm, so having more entries would
not make such a big difference.
There is a tradeoff depending on usage: If you are dealing with
rather big files and use fdatasync() it makes a difference if you can
avoid the ext2_sync_file(). If not, you may not like the write()
performance beeing affected.
> We could increase NUM_EXT2_FFSYNC_BLKS, but it would be useful to get
> some actual performance results about how many blocks a typical
> transaction oriented database actually tends to write out before calling
> fsync() or fdatasync().
Define "typical". This depends on application software and
transaction
strategy and could vary for customers and db implementation.
Even with moderate database sizes (1 few 100 MB) having a larger list
of blocks would almost always be a big win.
> Anyone interesting in doing some data gathering?
I plan to run some performance tests during this week. I will add
some code to check for number of blocks dirtied between fsync().
I won't consider that "typical" though. It's just another datapoint.
It's probably also interesting to compare your patch against the
fsync_dev() workaround.
Michael
-- Michael Marxmeier Marxmeier Software GmbH E-Mail: mike@msede.com Besenbruchstrasse 9 Voice : +49 202 2431440 42285 Wuppertal, Germany Fax : +49 202 2431420 http://www.msede.com/- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/