Re: [PATCH 2/2] hugepages: Fix use after free bug in "quota" handling

From: David Gibson
Date: Thu Mar 08 2012 - 22:40:10 EST


On Wed, Mar 07, 2012 at 04:27:20PM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2012 15:48:14 +1100
> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > hugetlbfs_{get,put}_quota() are badly named. They don't interact with the
> > general quota handling code, and they don't much resemble its behaviour.
> > Rather than being about maintaining limits on on-disk block usage by
> > particular users, they are instead about maintaining limits on in-memory
> > page usage (including anonymous MAP_PRIVATE copied-on-write pages)
> > associated with a particular hugetlbfs filesystem instance.
> >
> > Worse, they work by having callbacks to the hugetlbfs filesystem code from
> > the low-level page handling code, in particular from free_huge_page().
> > This is a layering violation of itself, but more importantly, if the kernel
> > does a get_user_pages() on hugepages (which can happen from KVM amongst
> > others), then the free_huge_page() can be delayed until after the
> > associated inode has already been freed. If an unmount occurs at the
> > wrong time, even the hugetlbfs superblock where the "quota" limits are
> > stored may have been freed.
> >
> > Andrew Barry proposed a patch to fix this by having hugepages, instead of
> > storing a pointer to their address_space and reaching the superblock from
> > there, had the hugepages store pointers directly to the superblock, bumping
> > the reference count as appropriate to avoid it being freed. Andrew Morton
> > rejected that version, however, on the grounds that it made the existing
> > layering violation worse.
> >
> > This is a reworked version of Andrew's patch, which removes the extra, and
> > some of the existing, layering violation. It works by introducing the
> > concept of a hugepage "subpool" at the lower hugepage mm layer - that is
> > a finite logical pool of hugepages to allocate from. hugetlbfs now creates
> > a subpool for each filesystem instance with a page limit set, and a pointer
> > to the subpool gets added to each allocated hugepage, instead of the
> > address_space pointer used now. The subpool has its own lifetime and is
> > only freed once all pages in it _and_ all other references to it (i.e.
> > superblocks) are gone.
> >
> > subpools are optional - a NULL subpool pointer is taken by the code to mean
> > that no subpool limits are in effect.
> >
> > Previous discussion of this bug found in: "Fix refcounting in hugetlbfs
> > quota handling.". See: https://lkml.org/lkml/2011/8/11/28 or
> > http://marc.info/?l=linux-mm&m=126928970510627&w=1
> >
> > v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
> > alloc_huge_page() - since it already takes the vma, it is not necessary.
>
> Looks good - thanks for doing this.
>
> Some comments, nothing serious:

Cleanup patch addressing these comments below. I've done this as a
diff applied on top of the original patch.