Re: O_DIRECT foolish question

From: Bruno Diniz de Paula (diniz@cs.rutgers.edu)
Date: Thu Feb 13 2003 - 10:22:00 EST


Thanks, Andrew. So there is no chance of getting this to work correctly on
a 2.4 kernel for now (I mean, reading files with size != n*block_size), and
I'd better give up on this... Is that the case, or do you think there is
still something that can be done to get this working on ext2 with a 2.4 kernel?

Bruno.

On Thu, 2003-02-13 at 00:12, Andrew Morton wrote:
> Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote:
> >
> > On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote:
> > > If I had to guess, write should work more or less the same as reads
> > > (ie. I should be able to write aligned-but-smaller-than-page-sized
> > > blocks to the end of files).
> > >
> > > Testing this however shows this is *not* the case.
> >
> > This is not the case. I have also tested here, and the file written always
> > has size n*block_size. The problem with writing is that we can't signal to
> > the kernel that the actual data has ended and that from that point on it
> > should zero-fill the bytes. What is worse, the information about the
> > actual size is lost, since the write syscall will store what is passed
> > as the 3rd argument in the inode (field st_size of stat). This means
> > that after writing using O_DIRECT we can't read the data back correctly.
> > The exception is when we write the actual size along with the data and
> > then ignore the size reported by stat when processing the file, for
> > instance.
> >
> > Well, I am sure I am completely wrong because this doesn't make any
> > sense to me. Can someone who has already dealt with this shed some
> > light on the discussion?
> >
>
> For writes, I don't think it is reasonable for the kernel to have to
> handle byte-granular appends. O_DIRECT is different. For this case the
> application should ftruncate the file back to the desired size prior to
> closing it.
>
> For the short reads at EOF, the 2.4 kernel refuses to read anything, and
> returns zero. The 2.5 kernel will return -EINVAL, which is better behaviour
> (shouldn't make it just look like the file is shorter than it really is).
>
> The ideal behaviour is that which I mistakenly described previously: we
> should fill with zeroes and return the partial result. I'll look at
> converting 2.5 to do that. As long as the changes are small - the direct-io
> code does a ton of stuff, is complex, is not tested a lot and breakage tends
> to be subtle.
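
A minimal sketch of the write-then-ftruncate approach Andrew suggests above,
assuming Linux with glibc. The names write_direct_exact(), write_padded() and
file_size() are hypothetical helpers made up for illustration, and the block
size is an assumption supplied by the caller (it must be a power of two that
satisfies the device's alignment requirement). The retry without O_DIRECT is
only there so the sketch still runs on filesystems that reject O_DIRECT; it is
not part of the technique itself.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `padded` bytes from `buf`, then cut the file
 * back to `len` bytes so that st_size reflects the real data length. */
int write_padded(const char *path, int extra_flags,
                 const void *buf, size_t padded, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd < 0)
        return -1;

    int ok = 0;
    ssize_t n = write(fd, buf, padded);
    if (n == (ssize_t)padded)
        ok = (ftruncate(fd, (off_t)len) == 0);  /* drop the zero padding */
    else if (n >= 0)
        errno = EIO;    /* short write: treated as a failure in this sketch */

    int saved = errno;
    close(fd);
    errno = saved;
    return ok ? 0 : -1;
}

/* Hypothetical helper: store exactly `len` bytes of `data` in `path`,
 * writing whole blocks through O_DIRECT. Returns 0 on success, -1 on error. */
int write_direct_exact(const char *path, const void *data,
                       size_t len, size_t block)
{
    /* posix_memalign() needs a power-of-two alignment >= sizeof(void *). */
    assert(block >= sizeof(void *) && (block & (block - 1)) == 0);

    /* Round the transfer size up to a whole number of blocks. */
    size_t padded = (len + block - 1) / block * block;

    void *buf = NULL;
    if (posix_memalign(&buf, block, padded ? padded : block) != 0)
        return -1;
    memset(buf, 0, padded);             /* the tail past `len` is zero-filled */
    if (len > 0)
        memcpy(buf, data, len);

    int rc = write_padded(path, O_DIRECT, buf, padded, len);
    if (rc != 0 && errno == EINVAL)     /* assumption: O_DIRECT unsupported here */
        rc = write_padded(path, 0, buf, padded, len);

    free(buf);
    return rc;
}

/* Hypothetical helper: size reported by stat(), or -1 on error. */
off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

Calling write_direct_exact("out.bin", data, 5000, 4096) would transfer 8192
bytes and then truncate the file to 5000, so stat reports the real length
instead of n*block_size.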

-- 
Bruno Diniz de Paula <diniz@cs.rutgers.edu>
Rutgers University

