On Mon, Mar 30 2009, Linus Torvalds wrote:
On Mon, 30 Mar 2009, Jens Axboe wrote:The problem is that we may not know upfront, so it sort-of has to beWell, absolutely. Except I don't think you shoul use ENOTSUPP, you should just set a bit in the "struct request_queue", and then return 0.
this trial approach where the first barrier issued will notice and fail
with -EOPNOTSUPP.
IOW, something like this
--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
if (!q)
return -ENXIO;
+ if (is_queue_noflush(q))
+ return 0;
+
bio = bio_alloc(GFP_KERNEL, 0);
if (!bio)
return -ENOMEM;
@@ -339,7 +342,7 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
ret = 0;
if (bio_flagged(bio, BIO_EOPNOTSUPP))
- ret = -EOPNOTSUPP;
+ set_queue_noflush(q);
else if (!bio_flagged(bio, BIO_UPTODATE))
ret = -EIO;
which just returns 0 if we don't support flushing on that queue.
(Obviously incomplete patch, which is why I also intentionally whitespace-broke it).
Sure, we could cache this value, but it's prettyWell no, it won't. Or rather, it will have to have such a stupid per-filesystem flag, for no good reason.
pointless since the filesystem will stop sending barriers in this case.
Sorry, I just don't see much point to doing it this way instead. So now
the fs will have to check a queue bit after it has issued the flush, how
is that any better than having the 'error' returned directly?
For blkdev_issue_flush() it may not be very interesting, since there'sSo?
not much we can do about that. Just seems like very bad style to NOT
return an error in such a case. You can assume that ordering is fine,
but it definitely wont be in all case (eg devices that have write back
caching on by default and don't support flush).
The thing is, you can't _do_ anything about it. So what's the point in returning an error? The caller cannot possibly care - because there is nothing the caller can really do.
Not for blkdev_issue_flush(), all they can do is report about the
device. And even that would be a vague "Your data may or may not be
safe, we don't know".
Sure, the device may or may not re-order things, but since the caller can't know, and can't really do a thing about it _anyway_, you're just better off not even confusing anybody.
I'd call that a pretty reckless approach to data integrity, honestly.
You HAVE to issue an error in this case. Then the user/admin can at least
check up on the device stack in question, and determine whether this is
an issue or not. That goes for both blkdev_issue_flush() and the actual
barrier write. And perhaps the cached value is then of some use, since
you then know when to warn (bit not already set) and you can keep the
warning in blkdev_issue_flush() instead of putting it in every call
site.