On Tue, 6 Apr 2010, Borislav Petkov wrote:
> > So again, it's actually anon_vma->head.next that is NULL, not any of the
> > entries on the list itself.
> >
> > Now, I can see several cases for this:
> >
> > - the obvious one: anon_vma just wasn't correctly initialized, and is
> > missing an INIT_LIST_HEAD(&anon_vma->head). That's either a slab bug (we
> > don't have a whole lot of coverage of constructors), or somebody
> > allocated an anon_vma without using the anon_vma_cachep.
> I've added code to verify this and am suspend/resuming now... Wait a
> minute, Linus, you're good! :) :
> [ 873.083074] PM: Preallocating image memory...
> [ 873.254359] NULL anon_vma->head.next, page 2182681

Yeah, I was pretty sure of that thing.

I still don't see _how_ it happens, though. That 'struct anon_vma' is very
simple, and contains literally just the lock and that list_head.

Now, 'head.next' is kind of magical, because it contains that magic
low-bit "have I been locked" thing (see "vm_lock_anon_vma()" in
mm/mmap.c). But I'm not seeing anything else touching it.
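
For reference, the low-bit trick amounts to something like the userspace
sketch below. This is not the mm/mmap.c code: tag_head_next() and
untag_head_next() are made-up names, and the struct is only a stand-in for
the kernel's list_head. It assumes list heads are at least 2-byte aligned,
so bit 0 of a valid pointer is free to use as a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Set bit 0 of head->next. Returns true if the bit was clear before. */
static bool tag_head_next(struct list_head *head)
{
	uintptr_t v = (uintptr_t)head->next;

	if (v & 1)
		return false;	/* already tagged */
	head->next = (struct list_head *)(v | 1);
	return true;
}

/* Clear bit 0 of head->next. Returns true if the bit was set before. */
static bool untag_head_next(struct list_head *head)
{
	uintptr_t v = (uintptr_t)head->next;

	if (!(v & 1))
		return false;	/* was not tagged */
	head->next = (struct list_head *)(v & ~(uintptr_t)1);
	return true;
}
```

Setting and clearing the bit on a valid, aligned pointer gives back the
same pointer, so by itself this should never produce a NULL head.next.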

And if you allocate an anon_vma the proper way, the SLUB constructor should
have made sure that the head is initialized. And no normal list operation
ever sets any list pointer to zero, although a "list_del()" on the first
list entry could do it if that first list entry had a NULL next pointer.
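
To make that last case concrete, here is a small userspace sketch. It is
not the kernel's list.h: list_del_unchecked() is a hypothetical helper, and
the NULL guard in it exists only so the demo can run in userspace.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points back at itself; it is never NULL. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'new' right after 'head'. */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/*
 * Unlink 'entry' by pointing its neighbours at each other.
 * The NULL guard is a userspace-only addition for this demo.
 */
static void list_del_unchecked(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (next)
		next->prev = prev;
	prev->next = next;	/* copies a NULL next into the head */
}
```

With a sane entry, deleting the only element leaves head.next pointing
back at the head. If the entry's next pointer was already NULL, the same
delete copies that NULL into head.next.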

> Now, how do we track back to the place which is missing anon_vma->head
> init? Can we use the struct page *page arg to page_referenced_anon()
> somehow?

You might enable SLUB debugging (both SLUB_DEBUG _and_ SLUB_DEBUG_ON), and
then make the "object_err()" function in mm/slub.c be non-static. You
could call it when you see the problem, perhaps.

Or you could just add tests to both alloc_anon_vma() and free_anon_vma()
to check that 'list_empty(&anon_vma->head)' is true. I dunno.
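
Something along these lines, as a userspace sketch only. The struct and
the alloc_sketch()/free_sketch()/check_head() names are made up for
illustration, malloc() stands in for the slab cache, and the explicit
init stands in for the slab constructor.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins: a lock placeholder plus the list head. */
struct list_head {
	struct list_head *next, *prev;
};

struct anon_vma_sketch {
	int lock;
	struct list_head head;
};

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Complain (and return false) if the head is not an empty list. */
static bool check_head(const struct anon_vma_sketch *av, const char *where)
{
	if (list_empty(&av->head))
		return true;
	fprintf(stderr, "%s: bad head in %p (next=%p prev=%p)\n", where,
		(const void *)av, (void *)av->head.next,
		(void *)av->head.prev);
	return false;
}

static struct anon_vma_sketch *alloc_sketch(void)
{
	struct anon_vma_sketch *av = malloc(sizeof(*av));

	if (!av)
		return NULL;
	/* Stands in for what the slab constructor would have done. */
	av->lock = 0;
	av->head.next = &av->head;
	av->head.prev = &av->head;
	check_head(av, "alloc");
	return av;
}

/* Returns false if the object was handed back with a non-empty head. */
static bool free_sketch(struct anon_vma_sketch *av)
{
	bool ok = check_head(av, "free");

	free(av);
	return ok;
}
```

The check on the free side is the more interesting one: an object that
goes back with a bad head is what the next allocation would get.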

> > I haven't looked at the kernel config files: do they perhaps share the
> > same (odd?) SLUB/SLAB/SLOB config?
> what is an odd SL[AOU]B config?

Probably anything but the default SLUB these days. But Steinar already
said he had SLUB, so it's unlikely to be something odd.

> > - anon_vma isn't actually an anon_vma at all. 'page->mapping' was crud
> > with the low bit set. That sounds unlikely, but who knows. The ksm code
> > sets mapping to "stable_node + PAGE_MAPPING_ANON | PAGE_MAPPING_KSM"
> >
> > Did people have KSM enabled?
> Nope, KSM is off here.

Yeah, wasn't for Steinar either. So it doesn't look like it's any odd
corner case that depends on some odd configuration.

