Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets

From: Andrea Arcangeli
Date: Fri Nov 26 2004 - 18:43:13 EST


On Thu, Nov 25, 2004 at 01:02:06PM -0200, Marcelo Tosatti wrote:
> On Sun, Oct 17, 2004 at 03:39:26AM +0100, Alan Cox wrote:
> > On Gwe, 2004-10-15 at 19:23, Marcelo Tosatti wrote:
> > > I prefer doing the "if (PageReserved(page)) put_page_testzero(page)" as
> > > you propose instead of changing get_user_pages(), as there are several
> > > users which rely on its behaviour.
> > >
> > > I have applied your fix to the 2.4 BK tree.
> >
> > That isnt sufficient. Consider anything else taking a reference to the
> > page and the refcount going negative. And yes 2.6.x has this problem and
> > far worse in some ways, but it also has the mechanism to fix it.
> >
> > 2.6.x uses VM_IO as a VMA flag which tells the kernel two things
> > a) get_user_pages fails on it
> > b) core dumping of it is forbidden
> >
> > 2.6.x is missing a whole pile of these (fixed in the 2.6.9-ac tree I'm
> > putting together). I *think* remap_page_range() in 2.6.x can just set
> > VM_IO, but older kernels didn't pass the vma so all the users would need
> > fixing (OSS audio, media/video, usb audio, usb video, frame buffer
> > etc).
>
> I dont see any practical problem with 2.4.x right now.
>
> get_user_pages() wont be called on driver created VMA's with PageReserved
> pages because of the VM_IO bit which is set at remap_page_range().
>
> Its not possible to have any vma mapped by a driver without VM_IO set.
>
> But the network packet mmap was an isolated case, so I'll apply Andrea's
> fix just for safety, although I can't find any offender in the tree.
>
>
> --- memory.c 2004-10-22 15:58:28.000000000 -0200
> +++ memory.c 2004-10-28 14:32:26.585813200 -0200
> @@ -499,7 +499,7 @@
> /* FIXME: call the correct function,
> * depending on the type of the found page
> */
> - if (!pages[i])
> + if (!pages[i] || PageReserved(pages[i]))
> goto bad_page;
> page_cache_get(pages[i]);
> }

this needs to be modified to take the ZERO_PAGE(start) into account.
It's a minor detail, but we should allow that. (my 2.4-aa tree was
already checking the zeropage for other reasons, so it didn't need any
change)

in short the above fix should be modified like this:

if (!pages[i] || PageReserved(pages[i])) {
if (pages[i] != ZERO_PAGE(start))
goto bad_page;
} else
page_cache_get(pages[i]);

the zero page is guaranteed to remain reserved. the major bug is that
get_user_pages was allowing _temporarily_ reserved pages to be pinned.
__free_pages isn't checking the VM_IO, and as such get_user_pages should
be robust against its __free_pages counterpart, this is why I believe
the above fix is the right fix.

ZEROPAGE is special because:

1) it's guaranteed to never be unpinned
2) it's the only reserved page that handle_mm_fault is allowed to
istantiate for a filesystem data mapping

The VM_IO enforcment is a nice improvement on top of the above.

Checking the PageReserved bitflag is a good thing at least for the
zeropage, so we don't overflow the zeropage count, which isn't nice.

If you really want to fix it only using VM_IO, I still recommend to
apply the above patch, and to turn 'goto bad_page' into BUG().
It'd be a bad idea not to at least add the above code as a robustness
check.

Comments welcome. thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/