Re: Q about pagecache data never written to disk

From: Andrew Morton
Date: Sun Sep 05 2004 - 16:04:08 EST


Andrey Savochkin <saw@xxxxxxxxxxxxx> wrote:
>
> Hi Andrew,
>
> On Sun, Sep 05, 2004 at 03:52:33AM -0700, Andrew Morton wrote:
> > Andrey Savochkin <saw@xxxxxxxxxxxxx> wrote:
> > >
> > > Let's suppose an mmap'ed (SHARED, RW) file has a hole.
> > > AFAICS, we allow the file pages to be dirtied without allocating space for
> > > the hole - filemap_nopage just "reads" the page, filling it with zeroes, and
> > > nothing is done about the on-disk data until writepage.
> > >
> > > So, if the page can't be written to disk (no space), the dirty data just
> > > stays in the pagecache. The data can be read or seen via mmap, but it isn't
> > > and never will be on disk. The pagecache stays unsynchronized with the on-disk
> > > content forever.
> >
> > The kernel will make one attempt to write the data to disk. If that write
> > hits ENOSPC, the page is not redirtied (ie: the data can be lost).
> >
> > When that write hits ENOSPC an error flag is set in the address_space and
> > that will be returned from a subsequent msync(). The application will then
> > need to do something about it.
> >
> > If your application doesn't msync() the memory then it doesn't care about
> > its data anyway. If your application _does_ msync the pages then we
> > reliably report errors.
>
> This question came to my mind when I was thinking about journal_start in
> ext3_prepare_write and copy_from_user issue...
> Did you follow that discussion?

Yup. Chris and I have been admiring the problem for a few months now.

> In the considered scenario, not only is the application not
> guaranteed anything till msync(), but all other programs doing regular read()
> may also be fooled about the file content, and this idea surprised me.
> On the other hand, after a write() other programs also see the new content
> without a guarantee that this content corresponds with what is on the disk...

No, read() will see the modified pagecache data immediately, apart from CPU
cache coherency effects.
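
For illustration only (this is not code from the thread): a minimal userspace
sketch of the point above, assuming a Linux system with a writable /tmp. It
stores into a hole through a MAP_SHARED mapping and then checks that a plain
pread() on the same file sees the store without any msync(). The function name
and the temporary file path are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if read() sees the store made through the mapping,
 * 0 if it does not, and -1 if the setup failed.
 * The name and the /tmp path are hypothetical, chosen for this sketch.
 */
int mapped_store_visible_to_read(void)
{
    char path[] = "/tmp/holedemo-XXXXXX";
    const size_t len = 4096;
    const char msg[] = "dirty";
    char buf[sizeof msg] = {0};
    int result = -1;

    assert(sizeof msg <= len);      /* the store must fit in the mapping */

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* the file lives on through fd only */

    /* Extending an empty file with ftruncate() leaves a hole. */
    if (ftruncate(fd, (off_t)len) == 0) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, msg, sizeof msg);   /* dirties the pagecache page */

            /* No msync() here: read through the normal file path. */
            if (pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf)
                result = (memcmp(buf, msg, sizeof msg) == 0);

            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```

Note that this only shows what the pagecache holds. It says nothing about
whether the data has reached, or ever can reach, the disk.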

> >
> > > Is it the intended behavior?
> > > Shouldn't we call the filesystem to fill the hole at the moment of the first
> > > write access?
> >
> > That would be a retrograde step - it would be nice to move in the other
> > direction: perform disk allocation at writeback time rather than at write()
> > time, even for regular write() data. To do that we (probably) need space
> > reservation APIs. And yes, we perhaps could reserve space in the
> > filesystem when that page is first written to.
> >
> > But then what would we do if there's no space? SIGBUS? SIGSEGV?
> > Inappropriate. SIGENOSPC?
>
> Should the space be allocated on close()?

What effect are you trying to achieve?

> Who will get the signal if nobody accesses the file anymore?

Nobody. That's the point. Plus there _is_ no signal defined for this.
Neither in Linux nor in POSIX.

> I'm also thinking about various shell scripts with redirects to files...

? I doubt that they're writing files via MAP_SHARED.
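
For illustration only (not code from the thread): a hedged sketch of the
msync() pattern described earlier in this message, where an application that
cares about its MAP_SHARED data syncs it and checks the result. Both function
names and the /tmp path are hypothetical. Whether a writeback failure such as
ENOSPC actually surfaces here depends on the kernel version and filesystem, so
treat this as the shape of the check rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temp file that is one big hole. */
int make_sparse_tmpfile(size_t len)
{
    char path[] = "/tmp/msyncdemo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: store n bytes at the start of the file through a
 * MAP_SHARED mapping, then force writeback with msync(MS_SYNC).
 * Returns 0 on success, or the errno value (for example ENOSPC) reported
 * by mmap() or msync().
 */
int dirty_hole_and_sync(int fd, size_t len, const char *data, size_t n)
{
    assert(data != NULL);
    if (n > len)
        return EINVAL;

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return errno;

    memcpy(map, data, n);           /* the page is now dirty in pagecache */

    int err = 0;
    if (msync(map, len, MS_SYNC) != 0)
        err = errno;                /* this is where the app must react */

    munmap(map, len);
    return err;
}
```

A caller would do something like

    int fd = make_sparse_tmpfile(4096);
    int err = dirty_hole_and_sync(fd, 4096, "x", 1);

and treat a nonzero err as "the data did not make it to disk".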