Re: Q about pagecache data never written to disk

From: Andrey Savochkin
Date: Sun Sep 05 2004 - 06:47:42 EST


Hi Andrew,

On Sun, Sep 05, 2004 at 03:52:33AM -0700, Andrew Morton wrote:
> Andrey Savochkin <saw@xxxxxxxxxxxxx> wrote:
> >
> > Let's suppose an mmap'ed (SHARED, RW) file has a hole.
> > AFAICS, we allow to dirty the file pages without allocating the space for the
> > hole - filemap_nopage just "reads" the page filling it with zeroes, and
> > nothing is done about the on-disk data until writepage.
> >
> > So, if the page can't be written to disk (no space), the dirty data just
> > stays in the pagecache. The data can be read or seen via mmap, but it isn't
> > and never be on disk. The pagecache stays unsynchronized with the on-disk
> > content forever.
>
> The kernel will make one attampt to write the data to disk. If that write
> hits ENOSPC, the page is not redirtied (ie: the data can be lost).
>
> When that write hits ENOSPC an error flag is set in the address_space and
> that will be returned from a subsequent msync(). The application will then
> need to do something about it.
>
> If your application doesn't msync() the memory then it doesn't care about
> its data anyway. If your application _does_ msync the pages then we
> reliably report errors.

This question came to my mind when I was thinking about journal_start in
ext3_prepare_write and copy_from_user issue...
Did you follow that discussion?

In the considered scenario not only the application is not
guaranteed anything till msync(), but all other programs doing regular read()
may also be fooled about the file content, and this idea surprised me.
On the other hand, after a write() other programs also see the new content
without a guarantee that this content corresponds with what is on the disk...

>
> > Is it the intended behavior?
> > Shouldn't we call the filesystem to fill the hole at the moment of the first
> > write access?
>
> That would be a retrograde step - it would be nice to move in the other
> direction: perform disk allocation at writeback time rather than at write()
> time, even for regular write() data. To do that we (probably) need space
> reservation APIs. And yes, we perhaps could reserve space in the
> filesystem when that page is first written to.
>
> But then what would we do if there's no space? SIGBUS? SIGSEGV?
> Inappropriate. SIGENOSPC?

Should the space be allocated on close()?
Who will get the signal if nobody accesses the file anymore?
I'm also thinking about various shell scripts with redirects to files...

Andrey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/