Re: [PATCH 1/2] mm: Move page struct poisoning from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS

From: Michal Hocko
Date: Thu Sep 06 2018 - 01:38:14 EST


On Wed 05-09-18 08:32:05, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 11:10 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> > > From: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
> > >
> > > On systems with a large amount of memory it can take a significant amount
> > > of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> > > value. I have seen it take over 2 minutes to initialize a system with
> > > over 12GB of RAM.
> > >
> > > In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> > > the boot time returned to something much more reasonable as the
> > > arch_add_memory call completed in milliseconds versus seconds. However in
> > > doing that I had to disable all of the other VM debugging on the system.
> >
> > I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
> > this check is to catch uninitialized struct pages after the early mem
> > init rework so the intention was to make it enabled on as many systems
> > with debugging enabled as possible. DEBUG_VM is not free already so it
> > sounded like a good idea to sneak it there.
> >
> > > I did a bit of research and it seems like the only function that checks
> > > for this poison value is the PagePoisoned function, and it is only called
> > > in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> > > CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> > > __dump_page function which is using the check to prevent a recursive
> > > failure in the event of discovering a poisoned page.
> >
> > Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
> > reviewing the patch. My debugging kernel config doesn't have it enabled
> > for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled
> > but I cannot find their config right now to double check for
> > CONFIG_DEBUG_VM_PGFLAGS.
> >
> > I am not really sure this dependency was intentional but I strongly
> > suspect Pavel really wanted to have it DEBUG_VM scoped.
>
> So I think the idea as per the earlier discussion with Pavel is that
> by preloading it with all 1's anything that is expecting all 0's will
> blow up one way or another. We just aren't explicitly checking for the
> value, but it is still possibly going to be discovered via something
> like a GPF when we try to access an invalid pointer or counter.
>
> What I think I can do to address some of the concern is make this
> something that depends on CONFIG_DEBUG_VM and defaults to Y. That way
> for systems that are defaulting their config they should maintain the
> same behavior, however for those systems that are running a large
> amount of memory they can optionally turn off
> CONFIG_DEBUG_VM_PAGE_INIT_POISON instead of having to switch off all
> the virtual memory debugging via CONFIG_DEBUG_VM. I guess it would
> become more of a peer to CONFIG_DEBUG_VM_PGFLAGS as the poison check
> wouldn't really apply after init anyway.
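
For reference, the gating proposed in the quoted text could look roughly like the userspace sketch below. This is only an illustration, not the actual patch: the two #define lines stand in for Kconfig symbols, page_init_poison_sketch() is a hypothetical helper name, and 0xff is assumed as the all-ones fill byte.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumption: these macros stand in for Kconfig symbols. In a kernel build
 * they would come from the generated configuration, not from this file.
 */
#define CONFIG_DEBUG_VM 1
#define CONFIG_DEBUG_VM_PAGE_INIT_POISON 1

#if defined(CONFIG_DEBUG_VM) && defined(CONFIG_DEBUG_VM_PAGE_INIT_POISON)
/* Hypothetical helper: fill the page struct area with all ones. */
static void page_init_poison_sketch(void *pages, size_t size)
{
	memset(pages, 0xff, size);
}
#else
/* With the option switched off the helper compiles down to nothing. */
static void page_init_poison_sketch(void *pages, size_t size)
{
	(void)pages;
	(void)size;
}
#endif
```

With both symbols defined the buffer is filled with ones; dropping only the second define removes the poisoning while leaving the rest of the DEBUG_VM checks alone.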

So the most obvious question is, why don't you simply disable DEBUG_VM?
It is not aimed at production workloads because it adds asserts in many
places and it is quite likely to come with a performance penalty
already.

Besides that, initializing memory to all ones is not much different from
initializing it to all zeroes, which we had been doing until recently
when Pavel removed that. So why do we need to add yet another
debugging config option? We have way too many config options already.
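
To illustrate the comparison, here is a minimal userspace sketch. It assumes a simplified stand-in for struct page (the real layout differs) and an all-ones fill byte; all names in it are hypothetical. Both variants are a single memset over the same range, and only the poisoned one leaves a value that an explicit check can recognize later.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; the real structure is different. */
struct fake_page {
	unsigned long flags;
	void *mapping;
	int refcount;
};

#define POISON_BYTE 0xff /* assumed all-ones fill byte */

/* Fill freshly allocated descriptors with the poison pattern. */
static void poison_pages(struct fake_page *pages, size_t n)
{
	memset(pages, POISON_BYTE, n * sizeof(*pages));
}

/* The older behaviour mentioned above: zero the descriptors instead. */
static void zero_pages(struct fake_page *pages, size_t n)
{
	memset(pages, 0, n * sizeof(*pages));
}

/* A flags word that is still all ones means nobody initialized the entry. */
static bool page_is_poisoned(const struct fake_page *page)
{
	return page->flags == ~0UL;
}
```
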
--
Michal Hocko
SUSE Labs