Re: [PATCH] barrier patch set

From: Stephen C. Tweedie
Date: Tue Mar 30 2004 - 16:55:18 EST


Hi,

On Tue, 2004-03-30 at 20:19, Chris Mason wrote:


> I think we're mixing a few concepts together. submit_bh(WRITE_BARRIER,
> bh) gives us an ordered write in whatever form the lower layers can
> provide. It also ensures that if you happen to call wait_on_buffer()
> for the barrier buffer, the wait won't return until the data is on
> media.

Right, but that's just how it works right now --- one doesn't _have_ to
imply the other. You could easily imagine an implementation that
implements barriers and flushing separately, and which does not do
automatic flushing on completion of WRITE_BARRIER IOs. SCSI with
writeback caching enabled might be one example of that. NBD/DRBD would
be another likely candidate --- if you've got network latencies in the
way, then a flushing sync may be far more expensive than a barrier
propagation.
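
To make the distinction concrete, here is a toy user-space model, not kernel code. The CachedDisk class and its write/barrier/flush methods are hypothetical names invented for this sketch; it assumes a drive with a volatile write-back cache that may destage writes in any order within a barrier-delimited epoch, but never across a barrier.

```python
import random


class CachedDisk:
    """Toy model of a drive with a volatile write-back cache (hypothetical)."""

    def __init__(self, seed=0):
        self.epochs = [[]]  # cached writes, grouped into barrier-delimited epochs
        self.media = []     # writes that have reached stable storage, in order
        self.rng = random.Random(seed)

    def write(self, tag):
        # The write "completes" once it is cached; it is not yet durable.
        self.epochs[-1].append(tag)

    def barrier(self):
        # Ordering only: start a new epoch. Nothing is pushed to media here.
        self.epochs.append([])

    def destage_one(self):
        # The drive may pick any write, but only from the oldest non-empty epoch.
        for epoch in self.epochs:
            if epoch:
                tag = epoch.pop(self.rng.randrange(len(epoch)))
                self.media.append(tag)
                return tag
        return None

    def flush(self):
        # Durability: drain the whole cache before returning.
        while self.destage_one() is not None:
            pass
        self.epochs = [[]]


disk = CachedDisk()
disk.write("data1")
disk.write("data2")
disk.barrier()
disk.write("commit")
pending_after_barrier = list(disk.media)  # still empty: the barrier ordered, it did not flush
disk.flush()                              # now everything is on media, commit last
```

In this model a barrier costs nothing beyond bookkeeping, while a flush has to wait for every cached write, which is the cost difference that matters once network latency is involved.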

Unfortunately, a lot of the cases we care about really have to do the
barrier via flushing, so the benefit of keeping them separate is
limited. For LVM/raid0, for example, we've got no way of preserving
ordering between IOs on different drives, so a flush is necessary there
unless we start journaling the low-level IOs to preserve order.
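
A rough sketch of why striping forces the flush, again as a toy model rather than real LVM or md code. Member, Stripe and barrier_write are hypothetical names; the model assumes each member drive drains its own queue independently, so nothing orders writes that land on different drives.

```python
class Member:
    """One drive in the stripe set; queued writes become durable when drained."""

    def __init__(self, durable_log):
        self.queue = []
        self.durable_log = durable_log  # shared record of the order writes hit media

    def submit(self, tag):
        self.queue.append(tag)

    def flush(self):
        self.durable_log.extend(self.queue)
        self.queue.clear()


class Stripe:
    """Toy raid0: block N goes to member N modulo the number of members."""

    def __init__(self, n):
        self.durable = []
        self.members = [Member(self.durable) for _ in range(n)]

    def write(self, block, tag):
        self.members[block % len(self.members)].submit(tag)

    def flush_all(self):
        for member in self.members:
            member.flush()

    def barrier_write(self, block, tag):
        # No cross-drive ordering exists, so drain every member first,
        # submit the barrier write, then drain again so that nothing
        # submitted later can reach media ahead of it.
        self.flush_all()
        self.write(block, tag)
        self.flush_all()


# Without a barrier, the commit block can reach media before the data block.
plain = Stripe(2)
plain.write(0, "data")
plain.write(1, "commit")
plain.members[1].flush()  # drive 1 happens to drain first
plain.members[0].flush()

# With the flush-based barrier, the order is preserved.
ordered = Stripe(2)
ordered.write(0, "data")
ordered.barrier_write(1, "commit")
ordered.write(0, "later")
ordered.flush_all()
```

The alternative mentioned above, journaling the low-level IOs, would replace the two flush_all() calls with a log that can be replayed in order; that is not modelled here.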

> The fact that IDE implements the barriers via a flush and that no other
> layer has barriers right now is secondary ;) I do want to revive some
> kind of ordering for scsi as well, but since IDE is a safety issue right
> now I wanted to concentrate there first.

Right.

> Yes, they are completely different. blkdev_issue_flush is the kernel
> recognizing that wait_on_buffer (or page or whatever) isn't always
> enough to make sure writes are really done.

Yep. It scares me to think what performance characteristics we'll start
seeing once that gets used everywhere it's needed, though. If every raw
or O_DIRECT write needs a flush after it, databases are going to become
very sensitive to flush performance. I guess disabling the flushing and
using disks which tell the truth about data hitting the platter is the
sane answer there.
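
For a feel of the cost from user space, a small sketch that times one fsync() per write against a single fsync() at the end. This uses ordinary buffered writes rather than raw or O_DIRECT IO, and whether fsync() actually flushes the drive's write cache depends on the kernel, filesystem and drive, so treat the numbers as indicative only. write_records and compare are hypothetical helper names.

```python
import os
import tempfile
import time


def write_records(path, records, flush_each):
    """Write records sequentially and return the elapsed time in seconds.

    flush_each=True asks for durability after every write (one fsync each);
    flush_each=False issues a single fsync after all writes are submitted.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for rec in records:
            os.write(fd, rec)  # sketch: assumes the full record is written
            if flush_each:
                os.fsync(fd)
        if not flush_each:
            os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


def compare(n=200, size=4096):
    """Return (per_write_seconds, batched_seconds) for n records of size bytes."""
    records = [bytes([i % 256]) * size for i in range(n)]
    with tempfile.TemporaryDirectory() as tmp:
        per_write = write_records(os.path.join(tmp, "a"), records, True)
        batched = write_records(os.path.join(tmp, "b"), records, False)
    return per_write, batched
```

Calling compare() on a disk that really flushes its cache tends to show the per-write variant dominated by flush latency, which is the sensitivity a database would inherit.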

--Stephen
