Re: [Suspend2][] [Suspend2] Dynamically allocated pageflags

From: Nigel Cunningham
Date: Wed Jun 28 2006 - 02:13:46 EST


Hi.

On Wednesday 28 June 2006 08:15, Rafael J. Wysocki wrote:
> > > Well, having read the entire patch I think I'd do this in a different
> > > way. Namely, we can always use pfn_to_page() to get the struct page
> > > corresponding to given PFN, so we can enumerate pages from 0 to max_pfn
> > > and create a simple bitmap with one or more bits per PFN. The only
> > > difficulty I see wrt this is to make sure 0-order allocations are used
> > > for the bitmaps.
> >
> > I used to do that, but it arm pfns don't start at zero and it would leave
> > gaps/wastage in the discontig mem case.
>
> Would that be a big problem? That is, how many bits we would waste eg. on
> ARM?

It depends on the board (PHYS_PFN_OFFSET), but can be as much as 0xf0000000 (based on a quick search of include/asm-arm.

> > > > +
> > > > +#define BITNUMBER(page) (page_to_pfn(page))
> > >
> > > I think it would be cleaner to use page_to_pfn(page) verbatim instead
> > > of this.
> >
> > I was going for readability.
>
> Well, if I see BITNUMBER(page), I tend to check what it means, so I have to
> browse the code just to find out it's page_to_pfn(page). Ouch.

Heh. It turned out to be another missed cleanup (was unused), so it's gone now.

> > > > +
> > > > +/*
> > > > + * pages_for_zone(struct zone *zone)
> > > > + *
> > > > + * How many pages do we need for a bitmap for this zone?
> > > > + *
> > > > + */
> > > > +
> > > > +static int pages_for_zone(struct zone *zone)
> > > > +{
> > > > + return (zone->spanned_pages + (PAGE_SIZE << 3) - 1) >>
> > > > + (PAGE_SHIFT + 3);
> > > > +}
> > >
> > > Could you please explain the maths above?
> >
> > Sure. Page_size << 3 is the number of bits in a page. We are calculating
> > the number of pages needed to store spanned_pages bits, which is
> > (spanned_pages + bits_per_page - 1) / (bits_per_page - 1). (Rounds up).
>
> Any chance to use DIV_ROUND_UP here?

Where did you get that name from? I did find -name '*.[ch]' | xargs grep DIV_ROUND_UP and couldn't find it anywhere. Is it an -mm thing? If so, I'll keep an eye out for it getting mainlined.

> > > > +
> > > > +/*
> > > > + * page_zone_number(struct page *page)
> > > > + *
> > > > + * Which zone index does the page match?
> > > > + *
> > > > + */
> > > > +
> > > > +static int page_zone_number(struct page *page)
> > > > +{
> > > > + struct zone *zone, *zone_sought = page_zone(page);
> > > > + int zone_num = 0;
> > > > +
> > > > + for_each_zone(zone)
> > > > + if (zone == zone_sought)
> > > > + return zone_num;
> > > > + else
> > > > + zone_num++;
> > > > +
> > > > + printk("Was looking for a zone for page %p.\n", page);
> > > > + BUG_ON(1);
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > Why can't we use page_zonenum() instead of this?
> >
> > They will only be the same thing in the x86 case. If you have multiple
> > zones and some don't have any dma pages or highmem pages, this will give
> > a more compact numbering.
>
> But this is _really_ inefficient, because you call it for every page and
> for how many times? I think it should be fixed.

It's called once each for set/test/clearing bits, and once for each get_next_bit_on. So far as I recall, every time I use these flags, I iterate through the whole lot. I can see that it would be more efficient to remember the number and only update it when we know we've gone to a new zone (as get_next_bit_on does when searching), but that would require passing the zone number back and forward to the caller, and this is probably not as bad as it looks anyway because its a small routine and the data will be reasonably likely (depending on what else the caller is doing) to still be in the cpu caches.

> > > > +
> > > > +/*
> > > > + * allocate_dyn_pageflags(dyn_pageflags_t *pagemap)
> > > > + *
> > > > + * Allocate a bitmap for dynamic page flags.
> > > > + *
> > > > + */
> > > > +int allocate_dyn_pageflags(dyn_pageflags_t *pagemap)
> > > > +{
> > > > + int i, zone_num = 0;
> > > > + struct zone *zone;
> > > > +
> > > > + BUG_ON(*pagemap);
> > > > +
> > > > + *pagemap = kmalloc(sizeof(void *) * num_zones(), GFP_ATOMIC);
> > >
> > > By using kmalloc(), you use slab. I'd rather allocate entire pages.
> >
> > You might only be using 20 bytes out of the page. At resume time, I do
> > use entire pages if the kmalloc'd value needs relocating, but only if
> > that's needed.
>
> Which is not exactly straightforward. IMHO that should be simplified
> somehow.

Do you have a concrete suggestion. I can't see how it can get simpler - everything I consider would be to make it more complicated.

> Moreover, it would be nice to avoid calling page_to_zone_offset(page) for
> the second time in test/set/clear_dynpageflag below.

Fixed. I inlined dyn_pageflags_ul_ptr in each of the following functions and tidied.

> > > Please let me disagree with this comment. I think it would be more
> > > efficient if you searched the bitmap for the first zero bit and then
> > > find the page, pfn, whatever corresponding to this bit.
> >
> > Yes, it would be more efficient. I don't mind modifying it. This is a
> > different moment to the one when I wrote that :)

Optimisation added.

> > Once again,
> >
> > Thanks for your feedback. It's good to get picked up on cleanups I've
> > missed and be made to remember again and provide the justification for
> > decisions I've made.
>
> Well, unfortunately the patch doesn't look very good to me. The overall
> idea is quite reasonable, but IMHO the implementation needs fixing.

Here's the new dyn_pageflags.c...

Regards,

Nigel

/*
* lib/dyn_pageflags.c
*
* Copyright (C) 2004-2006 Nigel Cunningham <nigel@xxxxxxxxxxxx>
*
* This file is released under the GPLv2.
*
* Routines for dynamically allocating and releasing bitmaps
* used as pseudo-pageflags.
*/

#include <linux/module.h>
#include <linux/dyn_pageflags.h>
#include <linux/bootmem.h>

#define page_to_zone_offset(pg) (page_to_pfn(pg) - page_zone(pg)->zone_start_pfn)

/*
* pages_for_zone(struct zone *zone)
*
* How many pages do we need for a bitmap for this zone?
*
*/

static int pages_for_zone(struct zone *zone)
{
return (zone->spanned_pages + (PAGE_SIZE << 3) - 1) >>
(PAGE_SHIFT + 3);
}

/*
* page_zone_number(struct page *page)
*
* Which zone index does the page match?
*
*/

static int page_zone_number(struct page *page)
{
struct zone *zone, *zone_sought = page_zone(page);
int zone_num = 0;

for_each_zone(zone)
if (zone == zone_sought)
return zone_num;
else
zone_num++;

printk("Was looking for a zone for page %p.\n", page);
BUG_ON(1);

return 0;
}

/*
* dyn_pageflags_pages_per_bitmap
*
* Number of pages needed for a bitmap covering all zones.
*
*/
int dyn_pageflags_pages_per_bitmap(void)
{
int total = 0;
struct zone *zone;

for_each_zone(zone)
total += pages_for_zone(zone);

return total;
}

/*
* clear_dyn_pageflags(dyn_pageflags_t pagemap)
*
* Clear an array used to store local page flags.
*
*/

void clear_dyn_pageflags(dyn_pageflags_t pagemap)
{
int i = 0, zone_num = 0;
struct zone *zone;

BUG_ON(!pagemap);

for_each_zone(zone) {
for (i = 0; i < pages_for_zone(zone); i++)
memset((pagemap[zone_num][i]), 0, PAGE_SIZE);
zone_num++;
}
}

/*
* allocate_dyn_pageflags(dyn_pageflags_t *pagemap)
*
* Allocate a bitmap for dynamic page flags.
*
*/
int allocate_dyn_pageflags(dyn_pageflags_t *pagemap)
{
int i, zone_num = 0;
struct zone *zone;

BUG_ON(*pagemap);

*pagemap = kmalloc(sizeof(void *) * zones_in_use, GFP_ATOMIC);

if (!*pagemap)
return -ENOMEM;

for_each_zone(zone) {
int zone_pages = pages_for_zone(zone);
(*pagemap)[zone_num] = kmalloc(sizeof(void *) * zone_pages,
GFP_ATOMIC);

if (!(*pagemap)[zone_num]) {
free_dyn_pageflags(pagemap);
return -ENOMEM;
}

for (i = 0; i < zone_pages; i++) {
unsigned long address = get_zeroed_page(GFP_ATOMIC);
(*pagemap)[zone_num][i] = (unsigned long *) address;
if (!(*pagemap)[zone_num][i]) {
printk("Error. Unable to allocate memory for "
"dynamic pageflags.");
free_dyn_pageflags(pagemap);
return -ENOMEM;
}
}
zone_num++;
}

return 0;
}

/*
* free_dyn_pageflags(dyn_pageflags_t *pagemap)
*
* Free a dynamically allocated pageflags bitmap. For Suspend2 usage, we
* support data being relocated from slab to pages that don't conflict
* with the image that will be copied back. This is the reason for the
* PageSlab tests below.
*
*/
void free_dyn_pageflags(dyn_pageflags_t *pagemap)
{
int i = 0, zone_num = 0;
struct zone *zone;

if (!*pagemap)
return;

for_each_zone(zone) {
int zone_pages = pages_for_zone(zone);

if (!((*pagemap)[zone_num]))
continue;
for (i = 0; i < zone_pages; i++)
if ((*pagemap)[zone_num][i])
free_page((unsigned long) (*pagemap)[zone_num][i]);

if (PageSlab(virt_to_page((*pagemap)[zone_num])))
kfree((*pagemap)[zone_num]);
else
free_page((unsigned long) (*pagemap)[zone_num]);

zone_num++;
}

if (PageSlab(virt_to_page((*pagemap))))
kfree(*pagemap);
else
free_page((unsigned long) (*pagemap));

*pagemap = NULL;
return;
}

/*
* test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
*
* Is the page flagged in the given bitmap?
*
*/

int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
{
int zone_pfn = page_to_zone_offset(page);
int zone_num = page_zone_number(page);
int pagenum = PAGENUMBER(zone_pfn);
int page_offset = PAGEINDEX(zone_pfn);
unsigned long *ul = ((*bitmap)[zone_num][pagenum]) + page_offset;
int bit = PAGEBIT(zone_pfn);

return test_bit(bit, ul);
}

/*
* set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
*
* Set the flag for the page in the given bitmap.
*
*/

void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
{
int zone_pfn = page_to_zone_offset(page);
int zone_num = page_zone_number(page);
int pagenum = PAGENUMBER(zone_pfn);
int page_offset = PAGEINDEX(zone_pfn);
unsigned long *ul = ((*bitmap)[zone_num][pagenum]) + page_offset;
int bit = PAGEBIT(zone_pfn);

set_bit(bit, ul);
}

/*
* clear_dynpageflags(dyn_pageflags_t *bitmap, struct page *page)
*
* Clear the flag for the page in the given bitmap.
*
*/

void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
{
int zone_pfn = page_to_zone_offset(page);
int zone_num = page_zone_number(page);
int pagenum = PAGENUMBER(zone_pfn);
int page_offset = PAGEINDEX(zone_pfn);
unsigned long *ul = ((*bitmap)[zone_num][pagenum]) + page_offset;
int bit = PAGEBIT(zone_pfn);

clear_bit(bit, ul);
}

/*
* get_next_bit_on(dyn_pageflags_t bitmap, int counter)
*
* Given a pfn (possibly -1), find the next pfn in the bitmap that
* is set. If there are no more flags set, return -1.
*
*/

int get_next_bit_on(dyn_pageflags_t bitmap, int counter)
{
struct page *page;
struct zone *zone;
unsigned long *ul = NULL;
int zone_offset, pagebit, zone_num, first;

first = (counter == -1);

if (first)
counter = first_online_pgdat()->node_zones->zone_start_pfn;

page = pfn_to_page(counter);
zone = page_zone(page);
zone_num = page_zone_number(page);
zone_offset = counter - zone->zone_start_pfn;

if (first) {
counter--;
zone_offset--;
}

do {
counter++;
zone_offset++;

if (zone_offset >= zone->spanned_pages) {
do {
zone = next_zone(zone);
if (!zone)
return -1;
zone_num++;
} while(!zone->spanned_pages);

counter = zone->zone_start_pfn;
zone_offset = 0;
}

pagebit = PAGEBIT(zone_offset);

if (!pagebit || !ul)
ul = (bitmap[zone_num][PAGENUMBER(zone_offset)]) + PAGEINDEX(zone_offset);

if (!(*ul & ~((1 << pagebit) - 1))) {
int increment = BITS_PER_LONG - pagebit - 1;
counter += increment;
zone_offset += increment;
continue;
}

} while(!test_bit(pagebit, ul));

return counter;
}


--
See http://www.suspend2.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.

Attachment: pgp00000.pgp
Description: PGP signature