Subpages (was: Page Colouring)

From: Daniel Phillips
Date: Mon Dec 29 2003 - 15:08:07 EST


On Sunday 28 December 2003 23:09, Linus Torvalds wrote:
> On Sun, 28 Dec 2003, William Lee Irwin III wrote:
> > Doubtful. I suspect he may be referring to pgcl (sometimes called
> > "subpages"), though Dan Phillips hasn't been involved in it. I guess
> > we'll have to wait for Linus to respond to know for sure.
>
> I didn't see the patch itself, but I spent some time talking to Daniel
> after your talk at the kernel summit. At least I _think_ it was him I was
> talking to - my memory for names and faces is basically zero.
>
> Daniel claimed to have it working back then, and that it actually shrank
> the kernel source code. The basic approach is to just make PAGE_SIZE
> larger, and handle temporary needs for smaller subpages by just
> dynamically allocating "struct page" entries for them. The size reduction
> came from getting rid of the "struct buffer_head", because it ends up
> being just another "small page".
>
> Daniel, details?

Hi Linus,

Your description is accurate. Another reason for code size shrinkage is
getting rid of the loops across buffers in the block IO library, e.g.,
block_read_full_page.

Subpages only make sense for file-backed memory, which conveniently lets the
page cache keep track of subpages. Each address_space has pages of all the
same size, which may be smaller, larger or the same as PAGE_CACHE_SIZE. The
first case, "subpages", is the interesting one.

An address_space with subpages has base pages of PAGE_CACHE_SIZE for its
"even" entries and up to N-1 dynamically allocated struct pages for the "odd"
entries where N is PAGE_CACHE_SIZE divided by the subpage size. Base pages
are normal members of mem_map. Subpages are not referenced by mem_map, but
only by the page cache. They are created by operations such as
find_or_create_page, which first creates a base page if necessary. A counter
field in the page flags of the base page keeps track of how many subpages
share a base page's physical memory; when this field goes to zero the base
page may be removed from the page cache.

Subpages always have a ->virtual field regardless of whether mem_map pages do.
This is used for virt_to_phys and to locate the base page when a subpage is
freed.

Page fault handling doesn't change much if at all, since the faulting address
is rounded down to a physical page, which will be a base page.

Most of the changes for subpages are in the buffer.c page cache operations and
are largely transparent to the VMM, though PAGE_CACHE_SHIFT becomes
mapping->page_shift, which touches a lot of files. As you noted, buffer_head
functionality can be taken over by struct page and buffers become expendible.
However it is not necessary to cross that bridge immediately; page buffer
lists continue to work though the buffer list is never longer than one.

With a little more work, subpages can be used to shrink mem_map: implement a
larger PAGE_CACHE_SIZE then use subpages to handle ABI problems. In this
case faults on subpages are possible and the fault path probably needs to
know something about it. With a larger-than-physical PAGE_CACHE_SIZE we can
finally have large buffers, though the kernel would have to be compiled for
it. Some more work to allowing mapping->page_shift to be larger than
PAGE_CACHE_SIZE would complete the process of generalizing the page size. My
impression is, this isn't too messy, most of the impact is on faulting. Bill
and others are already familiar with this I think. The work should dovetail.

I took a stab at implementing subpages some time ago in 2.4 and got it mostly
working but not quite bootable. I did find out roughly how invasive the
patch is, which is: not very, unless I've overlooked something major. I'll
get busy on a 2.6 prototype, and of course I'll listen attentively for
reasons why this plan won't work.

Regards,

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/