Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

From: Tejun Heo
Date: Thu May 31 2007 - 23:26:47 EST


Stefan Bader wrote:
> 2007/5/30, Phillip Susi <psusi@xxxxxxxxxx>:
>> Stefan Bader wrote:
>> >
>> > Since drive a supports barrier requests, we don't get -EOPNOTSUPP,
>> > but the request with block y might get written before block x since
>> > the disks are independent. I guess the chances of this are quite low
>> > since at some point a barrier request will also hit drive b, but for
>> > the time being it might be better to return -EOPNOTSUPP right from
>> > device-mapper.
>>
>> The device mapper needs to ensure that ALL underlying devices get a
>> barrier request when one comes down from above, even if it has to
>> construct zero length barriers to send to most of them.
>>
>
> And somehow also make sure all of the barriers have been processed
> before returning the barrier that came in. Plus it would have to queue
> all mapping requests until the barrier is done (if strictly acting
> according to barrier.txt).
>
> But I am wondering a bit whether the requirements for barriers are
> really as tight as described in Tejun's document (a barrier request is
> only started once everything before it is safe, the barrier itself
> isn't returned until it is safe, too, and all requests after the
> barrier aren't started before the barrier is done). Is it really
> necessary to defer any further requests until the barrier has been
> written to safe storage? Or would it be sufficient to guarantee that,
> if a barrier request returns, everything up to and including the
> barrier is on safe storage?
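
A minimal sketch of the fan-out Phillip describes above, assuming a toy
two-way striped mapping. Device, StripedMapper and their methods are
hypothetical names for illustration only, not device-mapper code, and
completion is modelled synchronously for brevity:

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one underlying device of a dm target."""
    name: str
    log: list = field(default_factory=list)

    def submit(self, op, block=None):
        # Record the request and report completion; a real device
        # would complete asynchronously.
        self.log.append((op, block))
        return True


class StripedMapper:
    """Toy mapping: block N lives on device N % number_of_devices."""

    def __init__(self, devices):
        self.devices = devices

    def write(self, block):
        dev = self.devices[block % len(self.devices)]
        return dev.submit("write", block)

    def barrier_write(self, block):
        target = self.devices[block % len(self.devices)]
        results = []
        for dev in self.devices:
            if dev is target:
                # The device that owns the block gets the real barrier.
                results.append(dev.submit("barrier", block))
            else:
                # Every other device gets a zero-length barrier.
                results.append(dev.submit("barrier", None))
        # Complete the incoming barrier only once all of them are done.
        return all(results)
```

With two devices, a barrier write to block 2 lands on the first device
while the second one still sees an empty barrier, so neither can reorder
writes across it.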

Well, what's described in barrier.txt is the currently implemented
semantics and what filesystems expect, so we can't change it underneath
them. We can definitely introduce new, more relaxed variants, but one
thing we should bear in mind is that hard disks don't have humongous
caches or very smart controllers or instruction sets. No matter how
relaxed an interface the block layer provides, in the end it has to
issue a wholesale FLUSH CACHE on the device to guarantee data ordering
on the media.
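
To illustrate the point, here is a rough model of a disk with a volatile
write cache, where the only ordering tool is a wholesale flush. The Disk
class and barrier_write() are hypothetical names for this sketch, not
block layer code, and the flush-write-flush sequence is just one way to
get the ordering on such a device:

```python
class Disk:
    """Hypothetical disk with a volatile write cache."""

    def __init__(self):
        self.cache = []  # written but not yet on the media
        self.media = []  # blocks known to be on stable storage

    def write(self, block):
        self.cache.append(block)

    def flush_cache(self):
        # Wholesale flush: everything cached reaches the media. The
        # drive may pick any order inside one flush; sorting here just
        # stands in for "not the submission order".
        self.media.extend(sorted(self.cache))
        self.cache.clear()


def barrier_write(disk, block):
    disk.flush_cache()  # everything before the barrier is safe
    disk.write(block)   # the barrier request itself
    disk.flush_cache()  # the barrier is safe before anything later
```

Whatever was queued before the barrier ends up on the media ahead of
it, and anything written afterwards stays in the cache until the next
flush.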

IMHO, we can do better by paying more attention to how we do things in
the request queue, which can be deeper and more intelligent than the
device queue.

Thanks.

--
tejun