Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy

From: Nigel Cunningham
Date: Thu Apr 19 2007 - 05:05:55 EST


Hi Ingo.

On Thu, 2007-04-19 at 09:04 +0200, Ingo Molnar wrote:
> * Nigel Cunningham <nigel@xxxxxxxxxxxxxxxxxx> wrote:
>
> > From subsequent emails, I think you already got your answer, but just
> > in case...
> >
> > Yes, if you enabled "Replace swsusp by default" and you already had it
> > set up for getting swsusp to resume. If not, and you're using an
> > initrd/ramfs, you'll need to modify it to echo
> > > /sys/power/suspend2/do_resume after /sys and /proc are mounted but
> > prior to mounting / and so on.
>
> yeah, went with the default suggested by your patch:
>
> CONFIG_SUSPEND2_REPLACE_SWSUSP=y
>
> and it was pretty easy to set things up. I used "echo disk >
> /sys/power/state" to trigger it.
>
> In hindsight it was all pretty straightforward and suspend2 worked
> beautifully on an UP and on an SMP system i tried. So in exchange for
> suspend2 folks debugging a bug in CFS here's some suspend2 review
> feedback ;) Any plans about moving suspend2 to the upstream kernel? It
> should be pretty easy for it to co-exist with the current swsuspend
> code.

I really would like to get it into Linus' tree but Pavel doesn't want it
(obviously!) and I haven't got together enough of a case yet to convince
Andrew. I yet another here's-why-I-think-it-should-be-merged email in
the works (poor Andrew!) but there are too many other things on my plate
at the mo.

> The patch has quite some size:
>
> 89 files changed, 16452 insertions(+), 69 deletions(-)
>
> that should obviously be split up into more than a dozen sub-patches,
> and fed to lkml with the small ones first. (unless it already is split
> up?)

Right. A good portion (~2000 lines) of that is documentation.

> i cannot comment on the kernel/power/ bits (they are way too large
> anyway), other than that they look pretty clean visually, but the
> lowlevel arch and generic kernel bits look sane in detail too, sans a
> few mostly trivial cleanliness issues:
>
> +int suspend2_faulted = 0;
> +EXPORT_SYMBOL(suspend2_faulted);
>
> should be done via the pagefault notifier chain mechanism. Also, all the
> exports you added should be EXPORT_SYMBOL_GPL().

I'll look at that, but I'm not sure if it's a good idea - this is for
during the atomic copy & restore, when DEBUG_PAGEALLOC is enabled on
x86. Other things might touch memory in ways we don't want. It's only
needed for slab pages that get unmapped but not freed.

As far as the module exports go, I'm not expecting them to get merged. I
like building Suspend2 as modules (it helps speed the development
cycle), and see it as potentially useful for embedded but IMO there are
too many export symbols to make merging that code a possibility. This is
why they're all in one file rather than sprinkled through the files that
define the symbols.

> this:
>
> - ClearPageReserved(virt_to_page(addr));
> - init_page_count(virt_to_page(addr));
> + //ClearPageReserved(virt_to_page(addr));
> + //init_page_count(virt_to_page(addr));
>
> looks like there's a buglet in there still somewhere?

Yeah. When I was recently debugging, I found that cpu hotplugging is
using something marked __init which is causing the machine to
spontaneously reboot when cpus are replugged if DEBUG_PAGEALLOC is
enabled. Haven't had the time to get back to it, and also need some help
with the approach (what makes the machine reboot in this case instead of
oopsing, and how do I stop it?).

> + if(PageHighMem(page))
> + return 0;
>
> coding style.

Oh. The space missing after the if? Ok.

> + BUG_ON( test_suspend_state(SUSPEND_RUNNING) && /* Suspend2, that is */
>
> make this a WARN_ON() or a WARN_ON_ONCE() - that way you have a chance
> to even get feedback from users, instead of a 'uhm, X froze' report.
>
> +#define FREEZER_OFF 0
> +#define FREEZER_USERSPACE_FROZEN 1
> +#define FREEZER_FULLY_ON 2
>
> should be:
>
> +#define FREEZER_OFF 0
> +#define FREEZER_USERSPACE_FROZEN 1
> +#define FREEZER_FULLY_ON 2
>
> (you want your reviewers have an pleasant time reading your code :)

Ok.

> +#define NETLINK_SUSPEND2_USERUI 20 /* For suspend2's userui */
>
> IIRC userui was at the center of suspend2 merge flames, right? So you
> might want to layer it ontop a less flashy suspend2-core and thus get
> 90% of your patch upstream?

Ok. I've just separated that into it's own file/module, so that will be
straightforward to do.

> +++ linux/mm/vmscan.c
>
> the MM impact looks quite nontrivial. But i suspect this is unavoidable,
> because you zap portions of the pagecache on the way to disk, so when it
> comes back it results in a different pagecache (new lru lists, etc.),
> right?

The modifications do three things.

First, we're seeking to keep the LRU static once while we're suspending.
I originally sought to achieve that by avoiding entering the vmscan.c
logic (not as drastic as it sounds - Suspend2 is the only thing
running!). I think it was Nick who said he'd rather see it the pages
unlinked and kept safe that way, so now I do that. Oh, as part of this,
I separated out the code from shrink_inactive_list that returns isolated
pages, since relink_lru_lists uses it too.

The other part is (prior to the above) seeking to get in a situation
where we have enough memory available to do the cycle. I used to use
shrink_all_zones, but some users with lots of Highmem were finding
issues that let me to take a more per-zone based approach, hence
shrink_one_zone.

The last thing is a patch Rafael recently posted to improve kswapd
freezing.

> +++ linux/lib/dyn_pageflags.c
>
> shouldnt this be in mm/dyn_pageflags.c? Plus it would be nice to use
> some other core kernel user for this infrastructure. (but it's not a
> necessity i guess)

Yeah, I guess mm makes more sense.

> but ... again, the patch looks sane all around.

Thanks for the feedback. Although I'd like to see Suspend2 merged, I've
been feeling a bit like it's a lost cause. It's nice to get some
encouragement.

Nigel

Attachment: signature.asc
Description: This is a digitally signed message part