LT> Note that what I really wanted to use PG_dirty for was not for normal
LT> page-outs, but for shared mappings of files.
LT> For normal page-out activity, the PG_dirty thing isn't a win, simply
LT> because (a) it doesn't actually buy us anything (we might as well do it
LT> from the page tables directly)
If we combine it with early scanning of the page tables, it should allow us
to perform I/O before we need the memory. While not bogging down processes.
LT> and (b) as you noticed, it increases fragmentation.
This is only because he didn't implement any kind of request queue.
A fifo queue of pages to write would have keep performance up at current levels.
LT> The reason PG_dirty should be a win for shared mappings is: (a) it gets
LT> rid of the file write semaphore problem in a very clean way and
Nope. Because we can still have some try to write to file X.
That write needs memory, and we try to swapout a mapping of file X.
Unless you believe it implies the write outs then must use a seperate process.
LT> (b) it
LT> reduces the number of IO requests for mappings that are writable for
LT> multiple contexts (right now we will actually do multiple page-outs, one
LT> for each shared mapping that has dirtied the page).
With a reasonable proximity in time even that isn't necessary because of the buffer
cache. This is only a real issue for filesystems that don't implement good
write cachine on their own. In which case the primary thing to fix is
caching for those filesystems. We can use PG_dirty for that case too, but
the emphasis is different.
LT> I looked at the problem, and PG_dirty for shared mappings should be
LT> reasonably simple. However, I don't think I can do it for 2.2 simply
LT> because it involves some VFS interface changes (it requires that you can
LT> use the pame_map[] information and nothing else to page out: we have the
LT> inode and the offset which actually is enough data to do it, but we don't
LT> have a good enough "inode->i_op->writepage()" setup yet).
At least in part because for NFS you need who is doing the write, which
we currently store with a struct file. To get it correct we really
need a two level setup. Level 1 per pte to enter in the write request.
Level 2 per page to acutally write it out.
If there are conflicts, (not the same user, etc) Level 1 can handle them.
If I have any luck I should have a draft of a complete set of changes to do
all of this for 2.3 in about a month. I have been working on this in the
backgroud for quite a while.
Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/