Re: Queuing of disk writes

From: Ted Ts'o
Date: Tue Apr 05 2011 - 15:37:57 EST


On Mon, Apr 04, 2011 at 10:50:12AM -0700, Charles Samuels wrote:
>
> > Who or what is calling fsync()? Is it being called by your
> > application because you want to initiate writeout? Or is it being
> > called by some completely unrelated process?
>
> It's being called by my own process. When fsync finishes, I update
> another file with some offset counters, fsync that, and with some
> luck, my writes are transactional.
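A minimal C sketch of the commit pattern described above, under stated assumptions. The helper names (write_all_at, commit_record), the append-only data file, and the counter file layout (a single 64-bit end offset stored at offset 0) are hypothetical illustrations, not Charles's actual code. The data file's fsync() must complete before the counter is rewritten, and the second fsync() is the commit point: a reader trusts only data up to the stored offset. It also assumes the 8-byte in-place counter update is not torn by a crash, which is common in practice but not guaranteed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf at offset off, retrying short writes. */
static int write_all_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical commit step: append rec to the data file, make it durable,
 * then record the new end offset in the counter file and make that durable.
 * *end holds the current committed end of the data file.
 */
static int commit_record(int data_fd, int counter_fd, off_t *end,
                         const void *rec, size_t len)
{
    if (write_all_at(data_fd, rec, len, *end) < 0)
        return -1;
    if (fsync(data_fd) < 0)            /* step 1: data reaches stable storage */
        return -1;

    uint64_t new_end = (uint64_t)*end + len;
    if (write_all_at(counter_fd, &new_end, sizeof new_end, 0) < 0)
        return -1;
    if (fsync(counter_fd) < 0)         /* step 2: the commit point */
        return -1;

    *end = (off_t)new_end;
    return 0;
}
```

Note that every transaction costs two fsync() round trips, which is why the per-second transaction rate matters so much below.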

OK, how often are you calling fsync()? Is this something where you
are trying to get transactional guarantees by calling fsync() between
each transaction? And if so, how big are your transactions? If you
are trying to call fsync() 10+ times/second, then your only hope
really is going to be a battery-backed RAID controller card, as David
Lang has already suggested.

> What would be good use of sync_file_range? It looks pretty useful,
> but I don't know how to make good use of it. For example,
> SYNC_FILE_RANGE_WRITE, wouldn't linux start this pretty much
> immediately?

No, not necessarily. Generally Linux will pause for a bit to
hopefully allow writes to coalesce.

The reason why I suggested sync_file_range() is because you mentioned
that you tried waiting until there was a large amount of data in the
page cache, and then you called fsync() and that was taking forever.
I assumed from that that you didn't necessarily have ACID or transactional
requirements.

The advantage of using sync_file_range() is that instead of forcing a
blocking write for *all* of the data pages, you can do it for only part
of your data pages. This would keep the writeback from interfering
with subsequent reads that were taking place against your database.
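For illustration, a sketch of the partial-writeback idea in C, assuming a plain streaming write to a regular file on Linux. The helper names and the 1 MiB chunk size are hypothetical. Each chunk gets SYNC_FILE_RANGE_WRITE to start writeback without blocking, and the previous chunk is then waited on with SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER, so only a bounded amount of dirty data is in flight instead of one huge burst at fsync() time.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; tune for the workload. */
#define CHUNK ((size_t)1 << 20)

/* Hypothetical helper: write len bytes from buf at off, retrying short writes. */
static int write_all_at(int fd, const char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Stream buf to fd in CHUNK-sized pieces. After writing each piece, start
 * asynchronous writeback for it, then wait for the previous piece to finish.
 * NOTE: this does not flush file metadata or the drive's write cache, so it
 * gives no durability guarantee; it only smooths out writeback.
 */
static int stream_write(int fd, const char *buf, size_t len)
{
    off_t off = 0;
    off_t prev = -1;          /* start of the previous chunk, if any */
    size_t prev_len = 0;

    while ((size_t)off < len) {
        size_t left = len - (size_t)off;
        size_t n = left < CHUNK ? left : CHUNK;

        if (write_all_at(fd, buf + off, n, off) < 0)
            return -1;
        /* Kick off writeback of this chunk without waiting for it. */
        if (sync_file_range(fd, off, (off_t)n, SYNC_FILE_RANGE_WRITE) < 0)
            return -1;
        /* Block until the previous chunk's writeback has completed. */
        if (prev >= 0 &&
            sync_file_range(fd, prev, (off_t)prev_len,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
            return -1;

        prev = off;
        prev_len = n;
        off += (off_t)n;
    }
    return 0;
}
```

The one-chunk lag is a design choice: the disk always has the current chunk queued while the caller waits on the previous one, so writes stay pipelined without ever letting the dirty page count grow unbounded.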

All of this goes by the board if you need data integrity guarantees,
of course; in that case you need to call fsync() after each atomic
transaction update...

- Ted