Re: [PATCH -mm] swsusp: support creating bigger images
From: Rafael J. Wysocki
Date: Thu May 11 2006 - 17:19:30 EST
On Thursday 11 May 2006 17:01, Hugh Dickins wrote:
> On Thu, 11 May 2006, Rafael J. Wysocki wrote:
> > On Tuesday 09 May 2006 14:30, Hugh Dickins wrote:
> >
> > --- linux-2.6.17-rc3-mm1.orig/mm/rmap.c 2006-05-01 14:11:50.000000000 +0200
> > +++ linux-2.6.17-rc3-mm1/mm/rmap.c 2006-05-10 23:10:58.000000000 +0200
> > @@ -857,3 +857,38 @@ int try_to_unmap(struct page *page, int
> > return ret;
> > }
> >
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +/**
> > + * suspend_safe_page - determine if a page can be included in the
>
> suspend_safe_page sounds like it's suspending a safe page,
> and safe is unclear: suspend_knows_page_is_frozen?
>
> > + * suspend image without copying (returns true if so).
> > + *
> > + * It is safe to include the page in the suspend image without
> > + * copying if (a) it's on the LRU and (b) it's mapped by a frozen task
>
> That's not quite what the code checks:
> (b) it's mapped but not by the the current task
>
> (Actually, page_address_in_vma doesn't really say whether the page
> is mapped by the task, but -EFAULT does tell you that it can't be.)
>
> I still find its checks very obscure: am I right to think that most
> pages are "frozen" at this point, that it's very hard to determine
> which are and which are not, but there happens to be this "little"
> category of pages which you can be damn sure are frozen - and on
> most active systems, this "little" category actually covers a
> large number of pages which it's well worth avoiding copying?
Yes. Still I'm struggling to find some reliable+fast method of determining
which pages belong to this category.
> A very loose heuristic which works well enough to make a big difference.
I think so. :-)
> > + * (all tasks except for the current task should be frozen when it's
> > + * called). Otherwise the page should be copied for this purpose
> > + * (compound pages are avoided for simplicity).
> > + */
> > +
> > +int suspend_safe_page(struct page *page)
> > +{
> > + struct vm_area_struct *vma;
> > + int ret = 0;
> > +
> > + if (!PageLRU(page) || PageCompound(page))
> > + return 0;
> > +
> > + if (page_mapped(page)) {
> > + ret = 1;
> > + down_read(¤t->mm->mmap_sem);
> > + /* Return 0 if the page is mapped by the current task */
> > + for (vma = current->mm->mmap; vma; vma = vma->vm_next)
> > + if (page_address_in_vma(page, vma) != -EFAULT) {
> > + ret = 0;
> > + break;
> > + }
> > + up_read(¤t->mm->mmap_sem);
> > + }
> > +
> > + return ret;
> > +}
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> >
> > I've left the locking, because the functions is called when we're freeing
> > memory and I'm not 100% sure it's safe to drop it.
>
> If the locking is necessary, then that down_read is liable to schedule
> and wait for mmap_sem to be released. But you have interrupts disabled?
It also needs to be called with interrupts enabled, unfortunately.
> And everything which might release mmap_sem already frozen? My guess is
> that it's a lot safer _not_ to lock here than to lock; but just how safe
> that is I'm not at all sure.
Well, me too. :-)
> > > But if it is sufficiently frozen, I'm puzzled as to why pages mapped
> > > into the current process are (potentially) unsafe, while those mapped
> > > into others are safe. If the current process can get back to messing
> > > with its mapped pages, what if it maps a page you earlier judged safe?
> >
> > The current task is forbidden to map anything at this point.
>
> Too bald a statement for me to judge (and by forbidden to map,
> do you mean forbidden to mmap, or forbidden to fault?).
It's forbiddent to refer to any filesystems at all or else it could corrupt
them.
> Perhaps I'd understand better if you explain why pages mapped into the
> current task have to be excluded? It just seems to me that if it can
> interfere with those pages, then it is liable to interfere with other
> pages too, including some mapped into other processes which you've
> already judged to be safe/frozen.
The current task may be a userland process that saves the image, ie.
calls write() to save the data. It reads the image data from the kernel,
it can compress and/or encrypt them, compute checksums etc. and
then it saves the (possibly processed) data using write() directly to a
disk partition (ie. without touching any filesystems). However it only
is allowed to (directly) modify its own memory.
This is a very special userland process which must adhere to some strict
rules (please have a look at Documentation/power/userland-swsusp.txt).
> Sorry if I'm wasting your time, forcing you to spell things out to
> someone who hasn't a clue what you're up to; but it all smells a
> little fishy to me.
You're absolutely right to do so, and it's a tricky stuff. :-)
Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/