Re: O_DIRECT question

From: Denis Vlasenko
Date: Mon Jan 29 2007 - 19:08:15 EST


On Monday 29 January 2007 18:00, Andrea Arcangeli wrote:
> On Sun, Jan 28, 2007 at 06:03:08PM +0100, Denis Vlasenko wrote:
> > I still don't see much difference between O_SYNC and O_DIRECT write
> > semantic.
>
> O_DIRECT is about avoiding the copy_user between cache and userland,
> when working with devices that run faster than ram (think >=100M/sec,
> quite standard hardware unless you've only a desktop or you cannot
> afford raid).

Yes, I know that, but O_DIRECT is also "overloaded" with
O_SYNC-like semantics ("write doesn't return until data hits
physical media"). Having two orthogonal things "mixed together"
in one flag feels "not Unixy" to me, so I am trying to formulate
saner semantics. So far I think that this looks good:

O_SYNC   - usual meaning
O_STREAM - do not try hard to cache me. This includes "if you can
           (buffer is sufficiently aligned, yadda, yadda), do not
           copy_user into pagecache but just DMA from userspace
           pages" - exactly because user told us that he is not
           interested in caching!

Then O_DIRECT is approximately = O_SYNC + O_STREAM, and I think
maybe Linus will not hate this "new" O_DIRECT - it doesn't
bypass pagecache.
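
To make the split concrete, here is a rough userspace sketch of the
"O_SYNC + do not try hard to cache me" combination using only
interfaces that exist today. O_STREAM itself is hypothetical, so the
"don't cache" half is approximated with posix_fadvise(POSIX_FADV_DONTNEED),
which is only an advisory hint and does not do the DMA-from-userspace
part. The helper name write_sync_nocache is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): write len bytes to path
 * with O_SYNC semantics, then hint that the file's cached pages are
 * not worth keeping. Returns 0 on success or an errno value.
 */
int write_sync_nocache(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* O_SYNC: each write() returns only after the data is on storage. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return errno;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            int err = errno;
            close(fd);
            return err;
        }
        p += n;                     /* handle short writes */
        left -= (size_t)n;
    }

    /*
     * Stand-in for the hypothetical O_STREAM: purely advisory, the
     * kernel may ignore it, so its result is deliberately not checked.
     */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    if (close(fd) < 0)
        return errno;
    return 0;
}
```

A caller would use it as write_sync_nocache("out.bin", data, size) and
treat any nonzero return as a write error reported at write time.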

> O_SYNC is about working around buggy or underperforming VM growing the
> dirty levels beyond optimal levels, or to open logfiles that you want
> to save to disk ASAP (most other journaling usages are better done
> with fsync instead).

I've got a feeling that db people use O_DIRECT (its O_SYNCy behaviour)
as a poor man's write barrier when they must be sure that their redo
logs have hit storage before they start to modify datafiles.
Another reason why they want sync writes is write error detection.
They cannot afford delaying it.
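
The ordering I have in mind can be sketched roughly as below. This is
my guess at the pattern, not code from any real database: the function
and file names are invented, and it uses O_SYNC on the log rather than
O_DIRECT so that no buffer alignment is needed. The point is only that
the datafile is left untouched unless the synchronous log write
reported success.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on EINTR and short writes; 0 or errno. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: append rec to the redo log synchronously, and
 * only if that succeeded write it into the datafile at data_off.
 * Returns 0 on success or an errno value.
 */
int logged_update(const char *log_path, const char *data_path,
                  const void *rec, size_t len, off_t data_off)
{
    assert(log_path != NULL && data_path != NULL && rec != NULL);

    /* Step 1: the log write must reach storage before we go on. */
    int log_fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (log_fd < 0)
        return errno;
    int err = write_all(log_fd, rec, len);
    if (close(log_fd) < 0 && err == 0)
        err = errno;
    if (err != 0)
        return err;     /* error seen immediately; datafile not touched */

    /* Step 2: the datafile write may be cached and flushed later. */
    int data_fd = open(data_path, O_WRONLY | O_CREAT, 0644);
    if (data_fd < 0)
        return errno;
    if (lseek(data_fd, data_off, SEEK_SET) < 0)
        err = errno;
    else
        err = write_all(data_fd, rec, len);
    if (close(data_fd) < 0 && err == 0)
        err = errno;
    return err;
}
```

With delayed (non-sync) log writes, the error check after step 1 would
tell you nothing, which is the second reason mentioned above.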
--
vda