Re: [PATCH -mm] swsusp: support creating bigger images
From: Andrew Morton
Date: Tue May 09 2006 - 03:36:19 EST
"Rafael J. Wysocki" <rjw@xxxxxxx> wrote:
>
> Currently swsusp is only capable of creating suspend images that are not
> bigger than 1/2 of the normal zone, because it needs to create a copy of every
> page which should be included in the image. This may hurt the system
> responsiveness after resume, especially on systems with less that 1 GB of RAM.
>
> To allow swsusp to create bigger images we can use the observation that
> if some pages don't change after the snapshot image of the system has been
> created and before the image is entirely saved, they can be included in the
> image without copying. Now if the mapped pages that are not mapped by the
> current task are considered, it turns out that they would change only if they
> were reclaimed by try_to_free_pages(). Thus if we take them out of reach
> of try_to_free_pages(), for example by (temporarily) moving them out of their
> respective LRU lists after creating the image, we will be able to include them
> in the image without copying.
>
> If these pages are included in the image without copying, the amount of free
> memory needed by swsusp is usually much less than otherwise, so the size
> of the image may be bigger. This also makes swsusp use less memory during
> suspend and saves us quite a lot of memory allocations and copyings.
>
I have a host of minor problems with this patch.
> --- linux-2.6.17-rc3-mm1.orig/mm/rmap.c
> +++ linux-2.6.17-rc3-mm1/mm/rmap.c
>
> +#ifdef CONFIG_SOFTWARE_SUSPEND
> +static int page_mapped_by_task(struct page *page, struct task_struct *task)
> +{
> + struct vm_area_struct *vma;
> + int ret = 0;
> +
> + spin_lock(&task->mm->page_table_lock);
> +
> + for (vma = task->mm->mmap; vma; vma = vma->vm_next)
> + if (page_address_in_vma(page, vma) != -EFAULT) {
> + ret = 1;
> + break;
> + }
> +
> + spin_unlock(&task->mm->page_table_lock);
> +
> + return ret;
> +}
task_struct.mm can sometimes be NULL. This function assumes that it will
never be NULL. That makes it a somewhat risky interface. Are we sure it
can never be NULL?
> +/**
> + * page_to_copy - determine if a page can be included in the
> + * suspend image without copying (returns false if that's the case).
> + *
> + * It is safe to include the page in the suspend image without
> + * copying if (a) it's on the LRU and (b) it's mapped by a frozen task
> + * (all tasks except for the current task should be frozen when it's
> + * called). Otherwise the page should be copied for this purpose
> + * (compound pages are avoided for simplicity).
> + */
> +
> +int page_to_copy(struct page *page)
> +{
> + if (!PageLRU(page) || PageCompound(page))
> + return 1;
> + if (page_mapped(page))
> + return page_mapped_by_task(page, current);
> +
> + return 1;
> +}
a) This is a poorly-chosen name for a globally-scoped function. There's
no indication in its name that it is a swsusp thing.
b) It hurts my brain. "determine X and return false if it's true (!)",
where "X" means "can it be included" whereas the name "page_to_copy"
implies something seemingly unrelated - some semantic caller-defined
consequence of X.
Shudder. Please try again.
> +#endif /* CONFIG_SOFTWARE_SUSPEND */
> Index: linux-2.6.17-rc3-mm1/include/linux/rmap.h
> ===================================================================
> --- linux-2.6.17-rc3-mm1.orig/include/linux/rmap.h
> +++ linux-2.6.17-rc3-mm1/include/linux/rmap.h
> @@ -104,6 +104,14 @@ pte_t *page_check_address(struct page *,
> */
> unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
>
> +#ifdef CONFIG_SOFTWARE_SUSPEND
> +/*
> + * Used to determine if the page can be included in the suspend image without
> + * copying
> + */
> +int page_to_copy(struct page *);
> +#endif
Please remove the ifdef here.
> +++ linux-2.6.17-rc3-mm1/kernel/power/snapshot.c
> --- linux-2.6.17-rc3-mm1.orig/kernel/power/snapshot.c
> +
> +unsigned int count_data_pages(unsigned int *total)
> {
> struct zone *zone;
> unsigned long zone_pfn;
> - unsigned int n = 0;
> + unsigned int n, m;
>
> + n = m = 0;
Please avoid the multiple assignment. Kernel coding style is super-simple
- no tricksy stuff.
unsigned int n = 0;
unsigned int m = 0;
See, if we put each declaration on its own line we get a little bit of room
for a comment explaining what it does. Which is pretty much compulsory if
one insists on using identifiers like `m' and `n'.
> for_each_zone (zone) {
Might as well remove the extranous space here.
> if (is_highmem(zone))
> continue;
> mark_free_pages(zone);
> - for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
> - n += saveable(zone, &zone_pfn);
> + for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
> + unsigned long pfn = zone_pfn + zone->zone_start_pfn;
> +
> + if (saveable(pfn)) {
> + n++;
> + if (!page_to_copy(pfn_to_page(pfn)))
> + m++;
> + }
> + }
> }
> - return n;
> + if (total)
> + *total = n;
> +
> + return n - m;
> }
See, I think `n' should be renamed to `pages_to_copy'.
I don't know what `m' should be renamed to, because that would require me
to understand `page_to_copy()', and I'm not smart enough for that.
> /**
> + * protect_data_pages - move data pages that need to be protected from
> + * being reclaimed (ie. those included in the suspend image without
> + * copying) out of their respective LRU lists. This is done after the
> + * image has been created so the LRU lists only have to be restored if
> + * the suspend fails.
> + *
> + * This function is only called from the critical section, ie. when
> + * processes are frozen, there's only one CPU online and local IRQs
> + * are disabled on it.
> + */
> +
> +static void protect_data_pages(struct pbe *pblist)
> +{
> + struct pbe *p;
> +
> + for_each_pbe (p, pblist)
extraneous space.
> -static int enough_free_mem(unsigned int nr_pages)
> +static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
> {
> struct zone *zone;
> - unsigned int n = 0;
> + long n = 0;
Suggest you rename `n' to something useful while you're there.
> + pr_debug("swsusp: pages needed: %u + %lu + %lu\n",
> + copy_pages,
> + (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
> + EXTRA_PAGES);
Use DIV_ROUND_UP() here.
> for_each_zone (zone)
> - if (!is_highmem(zone))
> + if (!is_highmem(zone) && populated_zone(zone)) {
> n += zone->free_pages;
> - pr_debug("swsusp: available memory: %u pages\n", n);
> - return n > (nr_pages + PAGES_FOR_IO +
> - (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
> -}
> -
> -static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
> -{
> - struct pbe *p;
> + /*
> + * We're going to use atomic allocations, so we
> + * shouldn't count the lowmem reserves in the lower
> + * zones as available to us
> + */
> + n -= zone->lowmem_reserve[ZONE_NORMAL];
> + }
>
> - for_each_pbe (p, pblist) {
> - p->address = (unsigned long)alloc_image_page(gfp_mask, safe_needed);
> - if (!p->address)
> - return -ENOMEM;
> - }
> - return 0;
> + pr_debug("swsusp: available memory: %ld pages\n", n);
> + return n > (long)(copy_pages + EXTRA_PAGES +
> + (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
DIV_ROUND_UP().
> asmlinkage int swsusp_save(void)
> {
> - unsigned int nr_pages;
> + unsigned int nr_pages, copy_pages;
unsigned int nr_pages;
unsigned int pages_to_copy;
> +#if (defined(CONFIG_BLK_DEV_INITRD) && defined(CONFIG_BLK_DEV_RAM))
> +#define EXTRA_PAGES (PAGES_FOR_IO + \
> + (2 * CONFIG_BLK_DEV_RAM_SIZE * 1024) / PAGE_SIZE)
> +#else
> +#define EXTRA_PAGES PAGES_FOR_IO
> +#endif
What is the significance of the ramdisk to the swsusp code? (Need commenting).
EXTRA_PAGES is not a well-chosen identifier. Please choose something
within the swsusp "namespace", if there's such a thing.
> ===================================================================
> --- linux-2.6.17-rc3-mm1.orig/kernel/power/swsusp.c
> +++ linux-2.6.17-rc3-mm1/kernel/power/swsusp.c
> @@ -177,28 +177,29 @@ int swsusp_shrink_memory(void)
> long size, tmp;
> struct zone *zone;
> unsigned long pages = 0;
> - unsigned int i = 0;
> + unsigned int to_save, i = 0;
unsigned int pages_to_save;
unsigned int i = 0; /* Maybe a comment */
> ===================================================================
> --- linux-2.6.17-rc3-mm1.orig/mm/vmscan.c
> +++ linux-2.6.17-rc3-mm1/mm/vmscan.c
> @@ -1437,7 +1437,68 @@ out:
>
> return ret;
> }
> -#endif
> +
> +static LIST_HEAD(saved_active_pages);
> +static LIST_HEAD(saved_inactive_pages);
> +
> +/**
> + * move_out_of_lru - software suspend includes some pages in the
> + * suspend snapshot image without copying and these pages should be
> + * procected from being reclaimed, which can be done by (temporarily)
> + * moving them out of their respective LRU lists
> + *
> + * It is to be called with local IRQs disabled.
> + */
> +
> +void move_out_of_lru(struct page *page)
> +{
> + struct zone *zone = page_zone(page);
> +
> + spin_lock(&zone->lru_lock);
Normally we'd check that PageLRU() is still true after acquiring the lock.
> + if (PageActive(page)) {
> + del_page_from_active_list(zone, page);
> + list_add(&page->lru, &saved_active_pages);
> + } else {
> + del_page_from_inactive_list(zone, page);
> + list_add(&page->lru, &saved_inactive_pages);
> + }
> + ClearPageLRU(page);
> + spin_unlock(&zone->lru_lock);
> +}
> +
> +
> +/**
> + * restore_active_inactive_lists - used by the software suspend to move
> + * the pages taken out of the LRU by take_page_out_of_lru() back to
> + * their respective active/inactive lists (if the suspend fails)
> + */
> +
> +void restore_active_inactive_lists(void)
> +{
> + struct page *page;
> + struct zone *zone;
> +
> + while(!list_empty(&saved_active_pages)) {
Add a space after `while'.
> + page = lru_to_page(&saved_active_pages);
> + zone = page_zone(page);
> + list_del(&page->lru);
> + spin_lock_irq(&zone->lru_lock);
> + SetPageLRU(page);
> + add_page_to_active_list(zone, page);
> + spin_unlock_irq(&zone->lru_lock);
> + }
> +
> + while(!list_empty(&saved_inactive_pages)) {
Ditto.
> + page = lru_to_page(&saved_inactive_pages);
> + zone = page_zone(page);
> + list_del(&page->lru);
> + spin_lock_irq(&zone->lru_lock);
> + SetPageLRU(page);
> + add_page_to_inactive_list(zone, page);
> + spin_unlock_irq(&zone->lru_lock);
> + }
> +}
> +#endif /* CONFIG_PM */
All the above code is inside CONFIG_PM, but afaik it's only used by
CONFIG_SOFTWARE_SUSPEND.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/