Re: writepage return value check in vmscan.c

From: chrisl@vmware.com
Date: Thu Oct 24 2002 - 14:15:32 EST


On Thu, Oct 24, 2002 at 08:33:27PM +0200, Andrea Arcangeli wrote:
> On Thu, Oct 24, 2002 at 10:57:18AM -0700, chrisl@vmware.com wrote:
> > if ((gfp_mask & __GFP_FS) && writepage) {
> > + unsigned long flags = page->flags;
> >
> > ClearPageDirty(page);
> > SetPageLaunder(page);
> > page_cache_get(page);
> > spin_unlock(&pagemap_lru_lock);
> >
> > - writepage(page);
> > + if (writepage(page))
> > + page->flags = flags;
> >
> > page_cache_release(page);
> >
> > spin_lock(&pagemap_lru_lock);
> > continue;
> > }
>
> side note, you should use atomic bitflag operations here or you risk
> losing a bit set by another cpu between the read and the write. you

Thanks. I was just shooting in the dark.

> basically meant SetPageDirty() if writepage fails. That is supposed to
> happen in the low-level layer (like in fail_writepage), but the problem
> here is that this isn't ramfs, and block_write_full_page could leave
> lots of pages locked in ram if it disallowed these pages to be
> discarded from the vm.

Exactly.
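
For the record, here is a minimal sketch of what I understand Andrea to
mean, assuming SetPageDirty() is the usual atomic set_bit() on
page->flags in 2.4, rather than restoring a stale snapshot of the whole
flags word:

	ClearPageDirty(page);
	SetPageLaunder(page);
	page_cache_get(page);
	spin_unlock(&pagemap_lru_lock);

	/*
	 * On failure, re-dirty the page with an atomic bit operation
	 * instead of writing back an old copy of page->flags, so a bit
	 * set by another cpu in the meantime is not lost.
	 */
	if (writepage(page))
		SetPageDirty(page);

	page_cache_release(page);

(Whether re-dirtying is the right policy at all is the open question;
as Andrea says below, the vm must still be allowed to discard these
pages or they stay pinned in ram.)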

>
> > > A few fixes have been discussed. One way would be to allocate
> > > the space for the page when it is first faulted into reality and
> > > deliver SIGBUS if backing store for it could not be allocated.
> >
> > I am not sure how the user program would handle that signal...
> >
> > >
> > > Ayup. MAP_SHARED is a crock. If you want to write to a file, use write().
> > >
> > > View MAP_SHARED as a tool by which separate processes can attach
> > > to some shared memory which is identified by the filesystem namespace.
> > > It's not a very good way of performing I/O.
> >
> > That is exactly the case for vmware's ram file. VMware only uses it to
> > share memory. Those pages are the virtual machine's memory. We don't
> > want to write them back to disk, and we don't care what is left on the
> > file system, because when vmware exits we throw the guest ram data away,
> > just like a real machine loses its ram at power off. We are not talking
> > about machines using flash ram :-).
> >
> > It is kswapd that tries to flush the data, and it should take
> > responsibility for handling the error. If the write back fails, one
> > thing it should do is keep the page dirty. At least it should not
> > corrupt memory like that.
> >
> > If we can deliver the error to the user program, that would be a plus.
> > But this needs to be fixed first.
>
> as said this cannot be fixed easily in kernel, or it would be trivial to
> lock up a machine by filling the fs, changing the i_size of a file and
> marking all the ram in the machine dirty in the hole; the vm must be allowed

Yes, but even nowadays it is possible to lock up the machine by doing that.

Try the test bigmm program attached to this mail. It simulates vmware's
memory mapping. It can easily lock up the machine even though there is
enough disk space.

See the comments in the source for the parameters. Basically, if you want
3 virtual machines, each with 2 processes, using 1 GB of ram each, you can do:

bigmm -i 3 -t 2 -c 1024

I ran it on two smp machines, one with 4 GB and one with 8 GB of ram.
Both can deadlock if I mmap enough memory.

I haven't tried it on the latest kernel yet, but the last time I tried
it, it worked every time and I had to reset the machine. That is with
the ram file created on a normal file system.

But if I create it on /dev/shm, the kernel can correctly kill some of
the processes and free the memory.

Be prepared to reset the machine if you try that; you have been warned :-)
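
For anyone who does not want to dig out the attachment, the core of
bigmm is roughly the following (a stripped-down, illustrative sketch,
not the actual source; the file name, the 1 GB size and the 4096 byte
page size are made-up values): create a large sparse file, mmap it
MAP_SHARED, and keep dirtying every page.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t size = 1024UL * 1024 * 1024;	/* 1 GB per process */
	size_t off;
	char *mem;
	int fd;

	/* Sparse file: i_size grows but no blocks are allocated yet. */
	fd = open("/tmp/bigmm.ram", O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0 || ftruncate(fd, size) < 0) {
		perror("setup");
		exit(1);
	}

	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	/* Dirty every page in the hole, over and over; the real program
	 * does this from several processes per "virtual machine". */
	for (;;)
		for (off = 0; off < size; off += 4096)
			mem[off] = 1;
}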

> to discard those pages and invalidate those posted writes. At least
> until a true solution is available you should change vmware to
> preallocate the file; then it will work fine because you will catch the
> ENOSPC error during the preallocation. If you work on shmfs that will be
> very quick indeed.

Yes, shmfs seems to be the only choice so far.
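
For completeness, the preallocation Andrea suggests would look roughly
like this (an illustrative sketch, not what vmware does today; the
helper name and the 64 KB chunk size are made up): write the backing
file out to full size before mmap'ing it, so a full filesystem shows up
as ENOSPC at startup instead of as a failed writepage() later.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Preallocate 'size' bytes of backing store before the file is
 * mmap'ed, so a full filesystem is reported here instead of showing
 * up as a failed writepage() under memory pressure.
 */
static int preallocate(int fd, size_t size)
{
	char buf[65536];
	size_t done = 0;

	memset(buf, 0, sizeof(buf));
	while (done < size) {
		size_t chunk = size - done;
		ssize_t n;

		if (chunk > sizeof(buf))
			chunk = sizeof(buf);
		n = write(fd, buf, chunk);
		if (n <= 0) {
			if (n < 0 && errno == ENOSPC)
				fprintf(stderr, "no space for backing file\n");
			return -1;
		}
		done += n;
	}
	return 0;
}

Called once before the mmap(), this turns the out-of-space case into a
startup error the program can report cleanly.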

Chris


