Re: [PATCH] VFS: Pagecache usage optimization on pagesize!=blocksize environment

From: Andrew Morton
Date: Thu May 22 2008 - 04:04:34 EST


On Thu, 22 May 2008 16:31:15 +0900 Hisashi Hifumi <hifumi.hisashi@xxxxxxxxxxxxx> wrote:

> I modified my patch based on your comment.

Looks pretty close to me.

> I added is_partially_uptodate method to address_space_operations, and
> I registered block_is_partially_uptodate in fs/buffer.c to ext2,ext3,ext4's aops.
> Thanks.
>
> Signed-off-by :Hisashi Hifumi <hifumi.hisashi@xxxxxxxxxxxxx>
>
> diff -Nrup linux-2.6.26-rc3.org/fs/buffer.c linux-2.6.26-rc3/fs/buffer.c
> --- linux-2.6.26-rc3.org/fs/buffer.c 2008-05-19 11:35:10.000000000 +0900
> +++ linux-2.6.26-rc3/fs/buffer.c 2008-05-22 13:23:29.000000000 +0900
> @@ -2084,6 +2084,51 @@ int generic_write_end(struct file *file,
> EXPORT_SYMBOL(generic_write_end);
>
> /*
> + * block_is_partially_uptodate checks whether buffers within a page are
> + * uptodate or not.
> + *
> + * Returns true if all buffers which correspond to a file portion
> + * we want to read are uptodate.
> + */
> +int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
> + unsigned long from)
> +{
> + struct inode *inode = page->mapping->host;
> + unsigned long block_start, block_end, blocksize;
> + unsigned long to;

These functions can get quite confusing, and the appropriate use of
types can help.

For offsets within a page I'd suggest `unsigned'. I expect that the
32-bit quantities are more efficient on at least some 64-bit machines.

For page indices use pgoff_t.

For byte-offset-within-a-file use loff_t.

For byte-range-within-a-file use size_t.

Be careful about overflows and truncation when shifting, comparing,
assigning, etc. Be sure that the code is still correct outside the
4Gbyte mark on 32-bit and outside the 4Gpage mark on 64-bit.

It doesn't hurt at all to put each variable definition on its own line
with a little comment off to the right. It helps to mention what the
variable's units are too - bytes/pages/sectors/etc.

> + struct buffer_head *bh, *head;
> + int ret = 1;
> +
> + if (!page_has_buffers(page))
> + return 0;
> +
> + blocksize = 1 << inode->i_blkbits;
> + to = from + desc->count;
> + if (to > PAGE_CACHE_SIZE)
> + to = PAGE_CACHE_SIZE;
> + if (from < blocksize && to > PAGE_CACHE_SIZE - blocksize)
> + return 0;
> +
> + head = page_buffers(page);
> +
> + for (bh = head, block_start = 0; bh != head || !block_start;
> + block_start = block_end, bh = bh->b_this_page) {
> + block_end = block_start + blocksize;
> + if (block_end <= from || block_start >= to)
> + continue;

Oh dear, you've copied one of my least favourite parts of the VFS :(
Just look at it!

Do you think it can be simplified or clarified?

> + else {
> + if (!buffer_uptodate(bh)) {
> + ret = 0;
> + break;
> + }
> + if (block_end >= to)
> + break;
> + }
> + }
> + return ret;
> +}

We'll need the EXPORT_SYMBOL() here.

> --- linux-2.6.26-rc3.org/fs/ext2/inode.c 2008-05-19 11:35:10.000000000 +0900
> +++ linux-2.6.26-rc3/fs/ext2/inode.c 2008-05-22 12:39:46.000000000 +0900
> @@ -791,6 +791,7 @@ const struct address_space_operations ex
> .direct_IO = ext2_direct_IO,
> .writepages = ext2_writepages,
> .migratepage = buffer_migrate_page,
> + .is_partially_uptodate = block_is_partially_uptodate,

Sometime we should update Documentation/Locking to reflect the new
address_space_operation.


One other thing we should think about here is the `nobh' mode which the
extX filesystems support (although I have a feeling that Nick might
have broken this ;)) We also have data=ordered, data=writeback and
data=journal to think about. This optimisation might not be
appropriate at all to data=journal mode, but I haven't looked into
that.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/