On Tue, Jan 30, 2007 at 10:36:03AM -0500, Phillip Susi wrote:
Did you intentionally drop this reply off list?
No.
No, it doesn't... or at least can't report WHERE the error is.
O_SYNC doesn't report where the error is either; try a write(fd, buf,
10*1024*1024).
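
To make the point about error location concrete, here is a minimal
sketch in C. The helper name write_chunked and its chunk parameter are
made up for this example. With a single 10MB write() the caller gets
back only a byte count or -1 plus errno, so the failing sector is
unknown; splitting the write narrows the failure down to one chunk, at
the cost of waiting for the disk once per chunk when fd was opened with
O_SYNC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): write len bytes to fd in
 * pieces of at most `chunk` bytes. Returns how many bytes write()
 * accepted before the first failure, which equals len on success. On
 * failure errno is left as set by write(), and the failing region is
 * known only to within one chunk. */
size_t write_chunked(int fd, const char *buf, size_t len, size_t chunk)
{
    size_t done = 0;

    assert(chunk > 0);
    while (done < len) {
        size_t want = (len - done < chunk) ? len - done : chunk;
        ssize_t n = write(fd, buf + done, want);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data: retry */
            break;          /* real error: stop and report progress */
        }
        if (n == 0)
            break;          /* no progress: avoid spinning forever */
        done += (size_t)n;
    }
    return done;
}
```

With chunk set to the full length this degenerates to the single big
write above; with chunk set to 512 it is the sector-at-a-time pattern
being argued about.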
Typically you only want one sector of data to be written before you continue. In the cases where you don't, this might be nice, but as I said above, you can't handle errors properly.
Sorry, but you're dreaming if you think anything in real life writes
512 bytes at a time with O_SYNC. Try that with any modern hard disk.
Just grep for fsync in the db code of your choice (try postgresql) and
then explain to me why they ever call fsync in their code, if you know
how to do better with O_SYNC ;).
Doesn't sound like a very good idea to me.
Why is it not a good idea to check any real-life app?
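
For readers who have not looked at such code, the general pattern is
roughly the following. This is only a sketch under my own assumptions,
not taken from postgresql or any other database; the function name
append_batch_durably and its parameters are invented for illustration.
Several ordinary buffered writes are issued, and a single fsync() at the
end makes the whole batch durable, instead of paying for a disk round
trip on every write as O_SYNC would.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not from any real database: append nrecs
 * NUL-terminated records to path with ordinary buffered writes, then
 * call fsync() once for the whole batch. Returns 0 if the kernel
 * reported success for every step, -1 otherwise. */
int append_batch_durably(const char *path, const char *const *recs,
                         size_t nrecs)
{
    assert(path != NULL && (recs != NULL || nrecs == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    for (size_t i = 0; i < nrecs; i++) {
        const char *p = recs[i];
        size_t left = strlen(p);

        while (left > 0) {
            ssize_t n = write(fd, p, left);
            if (n < 0) {
                if (errno == EINTR)
                    continue;       /* retry an interrupted write */
                close(fd);
                return -1;
            }
            p += n;                 /* handle short writes */
            left -= (size_t)n;
        }
    }

    int rc = fsync(fd);             /* one flush for the whole batch */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

As with O_SYNC, a failing fsync() says only that something in the batch
did not reach the disk, not which record.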
The stalling is caused by cache pollution. Since you did not specify a block size, dd uses the base block size of the output disk. When combined with sync, only one block is written at a time, and no more until the first block has been flushed. Only then does dd send down another block to write. Without dd the kernel is likely allowing many MB to be queued in the buffer cache. Limiting output to one block at a time is not good for throughput, but allowing half of RAM to be used by dirty pages is not good either.
Throughput is perfect. I forgot to mention that I combine it with
ibs=4k obs=16M. It would be just as perfect with O_DIRECT, for the same
reason. Stalling the I/O pipeline once every 16M isn't measurable in
The semantics of the two are very much the same; they only differ in the internal implementation. As far as the caller is concerned, in both cases, he is sure that writes are safe on the disk when they return, and reads semantically are no different with either flag. The internal implementations lead to different performance characteristics, and the other post was simply commenting that the performance characteristics of O_SYNC + madvise() are almost the same as O_DIRECT, or even better in some cases (since the data read may already be in cache).
The semantics mandate the implementation because the semantics set the
performance expectations. For the same reason you shouldn't write
512 bytes at a time with O_SYNC, you also shouldn't use O_SYNC if
your device risks creating a bottleneck in the CPU and memory.
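
For reference, the ibs=4k obs=16M arrangement mentioned earlier amounts
to something like the sketch below: read in small pieces, but hand data
to the output descriptor only once a large block has accumulated. This
is not dd's actual code; the name reblock_copy and its parameters are
assumptions made for this example. If out_fd was opened with O_SYNC,
the pipeline stalls once per obs bytes rather than once per small
block.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not dd's real implementation): copy in_fd to
 * out_fd, reading at most ibs bytes per read() and writing only when
 * obs bytes have accumulated or the input has ended. Returns the total
 * number of bytes copied, or -1 on error. */
long long reblock_copy(int in_fd, int out_fd, size_t ibs, size_t obs)
{
    assert(ibs > 0 && obs >= ibs);

    char *buf = malloc(obs);
    if (buf == NULL)
        return -1;

    long long total = 0;
    size_t fill = 0;

    for (;;) {
        size_t room = obs - fill;
        ssize_t n = read(in_fd, buf + fill, room < ibs ? room : ibs);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        fill += (size_t)n;

        /* Flush on a full output block, or on a partial one at EOF. */
        if (fill == obs || (n == 0 && fill > 0)) {
            size_t off = 0;
            while (off < fill) {
                ssize_t w = write(out_fd, buf + off, fill - off);
                if (w < 0 && errno == EINTR)
                    continue;
                if (w <= 0)
                    goto fail;
                off += (size_t)w;
            }
            total += (long long)fill;
            fill = 0;
        }
        if (n == 0)
            break;          /* end of input */
    }
    free(buf);
    return total;

fail:
    free(buf);
    return -1;
}
```

Calling reblock_copy(in_fd, out_fd, 4096, 16 * 1024 * 1024) would
correspond to the block sizes quoted above.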