Re: Buffer and page cache

V. Ganesh (ganesh@veritas.com)
Tue, 2 Nov 1999 11:24:13 -0800 (PST)


> The file system has no knowledge of disk blocks, and solely uses the
> page cache.
>
> I'd like these pages to age a little before handing them over to the
> "inode disk", because the "write_one_page" function called by
> generic_file_write would incur significant latency if the inode disk is
> "real", ie. not simulated in the same system.
>
> So we have a page cache for the inodes in the file system where the
> pages become dirty - but no buffers are attached. It reminds me of a
> shared mapping, but there is no vma for the pages.

so far, what you have described looks like a network filesystem
which tries to use the page cache to do writeback caching. unfortunately,
writeback is handled solely by bdflush, which writes dirty blocks to disk,
so it doesn't support this approach.

> What appears to be needed is the following - probably it's mostly
> lacking in my understanding, but I'd appreciate to be advised how to
> attack the following points:

you're right, and that's why AFAIK nfs doesn't use the page cache but uses
its own internal mechanisms for writeback caching.

> - a bit to keep shrink_mmap away from the page. When the file system

this can be done, although it's really ugly. simply allocate _one_
dummy buffer_head dbh, and mark it busy somehow. now whenever you dirty
a page, set page->buffers = dbh. this will keep shrink_mmap away from the
page even though its refcount is 1.
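
a rough userspace sketch of the dummy buffer_head trick, for illustration
only. this is not kernel code: the structs are cut-down stand-ins, the
helper names (pin_dirty_page, unpin_clean_page, can_reclaim) are made up,
and can_reclaim is only an assumed simplification of the check shrink_mmap
makes.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures; only the fields used here. */
struct buffer_head {
    int busy;                     /* nonzero: buffer cannot be freed */
};

struct page {
    int count;                    /* 1 means only the page cache holds it */
    struct buffer_head *buffers;  /* NULL when no buffers are attached */
};

/* The single shared dummy buffer head, permanently marked busy. */
static struct buffer_head dummy_bh = { 1 };

/* Hypothetical helper: call whenever the filesystem dirties a page. */
void pin_dirty_page(struct page *p)
{
    p->buffers = &dummy_bh;
}

/* Hypothetical helper: call once the page has been written back. */
void unpin_clean_page(struct page *p)
{
    assert(p->buffers == &dummy_bh);  /* only undo our own pin */
    p->buffers = NULL;
}

/* Toy reclaim check (an assumption, not the real shrink_mmap logic):
 * a page with busy buffers is skipped, otherwise it is freeable when
 * nothing but the page cache references it. */
int can_reclaim(const struct page *p)
{
    if (p->buffers != NULL && p->buffers->busy)
        return 0;
    return p->count == 1;
}
```

in this model a page with count 1 is reclaimable until pin_dirty_page is
called on it, and becomes reclaimable again after unpin_clean_page.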

> - a bit for a struct page that indicates the page needs to be written.

actually both of the above could be solved by keeping a dirty bit in
page->flags. shrink_mmap could just check whether the dirty bit is set and,
if so, schedule an async write using page->inode->writepage (except that
we just lost page->inode). kswapd could check whether the pte dirty bit is
set and update the page's dirty bit here.
it would also help in the (admittedly uncommon) case of n processes with
shared writable mappings that have all dirtied the same page. with the
current code, the page would be written out n times.
in fact, there's a NOTE NOTE NOTE in try_to_swap_out which advocates this
approach.
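
a minimal userspace sketch of the per-page dirty bit, to show why n
dirtying mappings would end up with a single write. everything here is
an assumption made for illustration: PG_DIRTY_BIT is a hypothetical flag,
the writepage pointer sits directly in the toy struct page instead of
being reached through page->inode, and the write is done synchronously
rather than scheduled.

```c
#include <assert.h>
#include <stddef.h>

#define PG_DIRTY_BIT 0x1u         /* hypothetical bit in page->flags */

struct page;
typedef int (*writepage_fn)(struct page *);

struct page {
    unsigned flags;
    writepage_fn writepage;       /* stands in for page->inode->writepage */
    int writes;                   /* number of write-outs, for illustration */
};

/* Each mapping that finds its pte dirty just sets the page-level bit. */
void mark_page_dirty(struct page *p)
{
    p->flags |= PG_DIRTY_BIT;
}

/* Example writepage: only records that one write-out happened. */
int count_writepage(struct page *p)
{
    p->writes++;
    return 0;
}

/* Toy reclaim step: returns 1 if the page may be freed now, or 0 if it
 * was dirty and a write was issued instead. */
int try_reclaim(struct page *p)
{
    if (p->flags & PG_DIRTY_BIT) {
        assert(p->writepage != NULL);
        /* clear first, so a page dirtied again during the write is
         * picked up on the next pass */
        p->flags &= ~PG_DIRTY_BIT;
        p->writepage(p);
        return 0;
    }
    return 1;
}
```

if three mappings call mark_page_dirty on the same page, the first
try_reclaim issues exactly one write and the next one finds the page clean.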

> - some indication of aging: we would like a pgflush daemon to walk the

yes, this would be required, since shrink_mmap only does any work when
there is memory pressure.
it would be simple enough to implement a pgflushd, since it could use the
same lru list used by shrink_mmap and write out dirty pages gradually,
like bdflush. however, it would use page->inode->writepage rather than
assuming that it's a disk-based filesystem.
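
one pass of such a pgflushd could look roughly like this in a userspace
model. this is a sketch under assumptions, not a proposal for the actual
code: the singly linked lru_next list, the dirty field, the per-pass
limit max_writes and the names pgflush_pass and noop_writepage are all
invented here.

```c
#include <assert.h>
#include <stddef.h>

struct page {
    int dirty;                          /* nonzero: needs writing */
    struct page *lru_next;              /* toy LRU list, oldest first */
    int (*writepage)(struct page *);    /* stands in for page->inode->writepage */
};

/* One flusher pass: write out at most max_writes dirty pages, oldest
 * first, through each page's own writepage. Returns the number of
 * pages written. */
int pgflush_pass(struct page *lru_head, int max_writes)
{
    struct page *p;
    int written = 0;

    for (p = lru_head; p != NULL && written < max_writes; p = p->lru_next) {
        if (!p->dirty)
            continue;
        assert(p->writepage != NULL);
        if (p->writepage(p) == 0) {     /* 0 means the write succeeded */
            p->dirty = 0;
            written++;
        }
    }
    return written;
}

/* Example writepage that always succeeds. */
int noop_writepage(struct page *p)
{
    (void)p;
    return 0;
}
```

capping the number of writes per pass is what makes the flushing gradual;
a daemon would call pgflush_pass periodically instead of waiting for
memory pressure.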

ganesh
