Re: Linux does not care for data integrity

From: Bill Davidsen
Date: Wed Jun 01 2005 - 19:25:12 EST


Matthias Andree wrote:

On Wed, 01 Jun 2005, Bill Davidsen wrote:



It's a matter of enforcing write order. In how far such ordering
constraints are propagated by file systems, VFS layer, down to the
hardware, is the grand question.


The problem is that in many options required to make that happen in the o/s, hardware, and application are going to kill performance. And even if you can control order of write, unless you can get write to final non-volatile media control you can get a sane database but still lose transactions.

If there was a way for the o/s to know when a physical write was done other than using flushes to force completion, then overall performance could be higher, but individual transaction might have greater latency. And the app could use fsync to force order of write as needed. In many cases groups of writes can be done in any order as long as they are all done before the next logical step takes place.



I have a déjà-vu, and I do believe that this discussion has taken place
in this list before, perhaps with a slightly different alignment, and
likely in the context of mail transfer agents and perhaps synchronous
directory (data) updates (file creation and such). Exposing a bit of the
queueing to the user space through new syscalls may be an interesting
experiment, although I do not have the resources to provide code.
Something like fsync() that doesn't flush the whole file system (which
appears to be the most common implementation) but tracks what is needed,
and that returns when data for a given file is on disk.



What I had in mind was not a "push" to flush anything anywhere, but rather a watch. As a hypothetical, I open a file and every time a write() is done a counter is incremented in the fd. That's the easy part. Then every time a physical write is completed the count is reduced. To allow for write combining the count could be in bytes rather than syscalls and physical operations. That's the hard part, I don't think the hardware is telling. In addition obviously writes may be combined between i/o related to several fds. But if that could be done, then fsync becomes "wait until my buffered byte count drops to zero," which could be an ioctl. Just having such a checkpoint would address some of the data coherency issues.

AFAIK this isn't possible with common ATA devices, and it clearly doesn't address every desirable feature. In spite of that, if someone better qualified to assess the problems and benefits cares to comment, fine. If not, at least I think I explained what I was thinking more clearly.



This would change the meaning of fsync from "force out the data" to "wait for the data to be written" in some implementations.



Naming suggestion: flazysync()





--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/