Re: Queuing of disk writes

From: Ted Ts'o
Date: Mon Apr 04 2011 - 09:54:01 EST


On Fri, Apr 01, 2011 at 12:59:53PM -0700, Charles Samuels wrote:
>
> I have an application that is writing large amounts of very
> fragmented data to harddrives. That is, I could write megabytes of
> data in blocks of a few bytes scattered around a multi-gigabyte
> file.

Doctor, doctor, it hurts when I do this.... Is there any way you can
avoid doing this? What is your application doing at a high level?

> Obviously, doing this causes the harddrive to seek a lot and takes a
> while. From what I understand, if I allow linux to cache the
> writes, it will fill up the kernel's write cache, and then
> consequently the disk drive's DMA queue. As a result of that, the
> harddrive can pick the correct order to do these writes,
> significantly reducing seek times.

This is one way to avoid some of the seeks, yes.

> However, there's a major cost in allowing the write cache to fill:
> fsync takes *ages*. What's worse is that while fsync is proceeding,
> it seems *all* disk operations in the OS are blocked. This is really
> terrible for performance of my application: my application might
> want to do some reads (i.e. from another thread) from the disk
> preempting the fsync temporarily. It's also really terrible for me,
> because then my workstation becomes unresponsive for several
> minutes.

Who or what is calling fsync()? Is it being called by your
application because you want to initiate writeout? Or is it being
called by some completely unrelated process?

If it is being called by the application, one thing you can do is to
use the Linux-specific system call sync_file_range(). You can use
this to do asynchronous data flushes of the file, and control which
range of bytes is written out, which can also help avoid flooding the
disk with too many write requests.
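
A minimal sketch of what that might look like, assuming glibc's
sync_file_range() wrapper. The helper names start_range_writeback()
and wait_range_writeback() are made up for illustration, and how large
a range to flush at a time is something you would have to tune for
your workload:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc only declares sync_file_range() with this */
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to start writing out the dirty
 * pages in [offset, offset + nbytes) and return without waiting for
 * the I/O to complete.
 */
int start_range_writeback(int fd, off64_t offset, off64_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/*
 * Hypothetical helper: write out the same range and block until the
 * data pages in that range have been written.  File metadata is NOT
 * written by sync_file_range().
 */
int wait_range_writeback(int fd, off64_t offset, off64_t nbytes)
{
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}
```

The idea would be to call start_range_writeback() on a window of the
file once you are done dirtying it, and only call
wait_range_writeback() on a window when you actually need to know it
has been written. Since sync_file_range() does not write out
metadata, it is not a replacement for fsync() or fdatasync() if you
need a durability guarantee.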

If the fsync() is being called by some other process, then yeah, you
have a problem. What I'd suggest is using a separate partition and
file system for this application of yours which needs to write
megabytes and megabytes of data at random locations in this
multi-gigabyte file of yours....

- Ted