On Thu, Jan 25, 2007 at 12:43:23AM +1100, Nick Piggin wrote:
And why not just leave it in the pagecache and be done with it?
because what is in cache is then not coherent with what is on disk,
and a direct read is supposed to read the data that is present
in the file at the time it is issued.
All you need is to do a writeout before a direct IO read, which is
what generic dio code does.
No, that's not good enough - after writeout but before the
direct I/O read is issued a process can fault the page and dirty
it. If you do a direct read, followed by a buffered read you should
get the same data. The only way to guarantee this is to chuck out
any cached pages across the range of the direct I/O so they are
fetched again from disk on the next buffered I/O. i.e. coherent
at the time the direct I/O is issued.
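
To make the race above concrete, here is a minimal toy model, not kernel code. The names `ToyFile`, `mmap_store`, `purge` and `run` are hypothetical and exist only for this sketch. It assumes a purge drops cached pages in the range unconditionally, which is a simplification for illustration and not a statement about what any real kernel path does with dirty pages.

```python
class ToyFile:
    """Toy model of one file: 'disk' and 'cache' map page index -> bytes."""

    def __init__(self, disk):
        self.disk = dict(disk)
        self.cache = {}      # page index -> cached data
        self.dirty = set()   # cached pages not yet written to disk

    def mmap_store(self, page, data):
        # Models a page fault followed by a store: page is cached and dirtied.
        self.cache[page] = data
        self.dirty.add(page)

    def writeout(self, pages):
        # Write dirty cached pages in the range to disk; they stay cached.
        for p in pages:
            if p in self.dirty:
                self.disk[p] = self.cache[p]
                self.dirty.discard(p)

    def purge(self, pages):
        # Toy assumption: drop cached pages in the range unconditionally.
        for p in pages:
            self.cache.pop(p, None)
            self.dirty.discard(p)

    def direct_read(self, page):
        # Direct I/O bypasses the cache and reads what is on disk.
        return self.disk[page]

    def buffered_read(self, page):
        # Buffered I/O fills the cache from disk on a miss, then reads it.
        if page not in self.cache:
            self.cache[page] = self.disk[page]
        return self.cache[page]


def run(purge_before_direct_read):
    f = ToyFile({0: b"old"})
    f.writeout([0])
    f.mmap_store(0, b"new")  # racing process dirties the page after writeout
    if purge_before_direct_read:
        f.purge([0])
    return f.direct_read(0), f.buffered_read(0)
```

In this model `run(False)` returns two different values for the direct and buffered read, while `run(True)` returns the same value for both, because the buffered read has to go back to disk.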
but in that case you'll either have to live with some raciness
(which is what the generic code does), or have a higher level
synchronisation to prevent buffered + direct IO writes I suppose?
The XFS inode iolock - direct I/O writes take it shared, buffered
writes take it exclusive - so you can't do both at once. Buffered
reads take it shared, which is another reason why we need to purge
the cache on direct I/O writes - they can operate concurrently
(and coherently) with buffered reads.
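
The iolock rules described above can be summarised as a small compatibility check. This is only a sketch of the modes named in this mail, not the XFS implementation. `IOLOCK_MODE` and `can_run_concurrently` are hypothetical names, and direct reads are left out because their lock mode is not stated here.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


# Lock mode each operation takes on the inode iolock, as stated in the mail.
IOLOCK_MODE = {
    "direct_write": Mode.SHARED,
    "buffered_write": Mode.EXCLUSIVE,
    "buffered_read": Mode.SHARED,
}


def can_run_concurrently(op_a, op_b):
    # Reader/writer lock semantics: two holders can coexist only if both
    # hold the lock in shared mode.
    return (IOLOCK_MODE[op_a] is Mode.SHARED
            and IOLOCK_MODE[op_b] is Mode.SHARED)
```

Under these rules a direct write and a buffered write exclude each other, while a direct write and a buffered read can overlap, which is the case where the cached pages have to be purged.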