Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
From: Alex Bligh
Date: Thu Sep 15 2016 - 08:11:35 EST
Christoph,
> It's not a write barrier - a write barrier was a command that ensured that
>
> a) all previous writes were completed to the host/client
> b) all previous writes were on non-volatile storage
>
> and
>
> c) the actual write with the barrier bit was on non-volatile storage
Ah! The bit you are complaining about is not the bit I pointed you to, but:
> NBD_CMD_FLUSH (3)
>
> A flush request; a write barrier.
I can see that's potentially confusing, as it isn't meant to mean 'an old-style
Linux kernel block device write barrier'. I think in general terms it
probably is some form of barrier, but I see no problem in deleting the
words "a write barrier" from the spec text if only to make it
clearer. However, I think the description of the command itself:
> The server MUST NOT send a successful reply header for this request before all write requests for which a reply has already been sent to the client have reached permanent storage (using fsync() or similar).
and the ordering section I pointed you to before, were both correct, yes?
>> The point still remains that "X was sent before Y" is difficult to
>> determine on the client side if X was sent over a different TCP channel
>> than Y, because a packet might be dropped (requiring a retransmit) for
>> X, and similar things. If blk-mq can deal with that, we're good and
>> nothing further needs to be done. If not, this should be evaluated by
>> someone more familiar with the internals of the kernel block subsystem
>> than me.
>
> The important bit in all the existing protocols, and which Linux relies
> on is that any write the Linux block layer got a completion for earlier
> needs to be flushed out to non-volatile storage when a FLUSH command is
> set. Anything that still is in flight does not matter. Which for
> NBD means anything that you already replied to needs to be flushed.
... that's what it says (I hope).
> Or to say it more practically - in the nbd server you simply need to
> call fdatasync on the backing device or file whenever you get a FLUSH
> request, and it will do the right thing.
Actually, fdatasync() technically does more than is necessary, as it
will also flush writes that have been processed, but for which no
reply has yet been sent - that's no bad thing.
--
Alex Bligh