Re: [PATCH] barrier patch set

From: Stephen C. Tweedie
Date: Tue Mar 30 2004 - 11:08:43 EST


On Sat, 2004-03-20 at 17:05, Bartlomiej Zolnierkiewicz wrote:

> Jens, can you explain how this translates to the block layer?
> If "log blocks" is a separate request from "commit block",
> we can just do: log blocks, flush, commit block, flush cycle.

It's not so simple. There are two different operations here as far as
the fs is concerned --- ordering ("barrier"), and waiting ("flush").
For most journaling purposes, a filesystem really doesn't have to know
when data has hit disk --- it only needs to be assured that certain
ordering constraints will be obeyed. So in SCSI terms we can submit the
log blocks, then set an ORDERED queue tag on the commit block, and leave
it at that. We don't have to wait until that ORDERED tag actually
becomes persistent on disk --- we don't care, in most cases. This is
the "barrier", and in journaling, we basically need a barrier before and
after the commit block.

The tricky bit is that _sometimes_ we find, later on, that we need to
know when data has hit disk. If an application requests synchronised IO
(such as fsync, O_SYNC or O_DIRECT) then we cannot legally return until
the disk has told us that the data is persistent for certain. _That_ is
a full flush, and it's distinct from the simple ordering constraint that
we normally have for journal commit blocks.

The distinction becomes important if flushing and barriers have
significantly different performance characteristics. In the SCSI tagged
command case, we can insert an ORDERED queue tag into the pipeline
without having to wait for anything. But if we implement the barrier
and the flush by the same mechanism, then the fs ends up waiting; we get

write log blocks
wait for IO queue to empty
write commit block
wait for IO queue to empty

instead of

write log blocks
write commit block with ORDERED tag set

so the commits are miles slower.

Jens' patch happens to implement barriers via flushes on IDE, but at the
block layer blkdev_issue_flush() and set_buffer_ordered() are quite
distinct APIs.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at