Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting

From: Matt Helsley
Date: Wed Feb 06 2008 - 19:41:22 EST



On Wed, 2008-02-06 at 20:33 +0000, Hugh Dickins wrote:
> On Sun, 3 Feb 2008, Oleg Nesterov wrote:
> >
> > So I have to try to find another bug ;) Suppose that ->load_binary() does
> > a series of do_mmap(MAP_EXECUTABLE). It is possible that mmap_region() can
> > merge 2 vmas. In that case we "leak" ->num_exe_file_vmas. Unless I missed
> > something, mmap_region() should do removed_exe_file_vma() when vma_merge()
> > succeds (near fput(file)).
>
> Or there's the complementary case of a VM_EXECUTABLE vma being
> split in two, for example by an mprotect of a part of it.
>
> Sorry, Matt, I don't like your patch at all. It seems to add a fair
> amount of ugliness and unmaintainablity, all for a peculiar MVFS case

I thought that getting rid of the separate versions of proc_exe_link()
improved maintainability. Do you have any specific details on what you
think makes the code introduced by the patch unmaintainable?

> (you've tried to argue other advantages, but not always convinced!).

Yup -- looking at how the VM_EXECUTABLE flag affects the vma walk it's
clear one of my arguments was wrong. So I can't blame you for being
unconvinced by that. :)

I still think it would help any stacking filesystems that can't use the
solution adopted by unionfs.

> And I found it quite hard to see where the crucial difference comes.
> I guess it's that MVFS changes vma->vm_file in its ->mmap? Well, if

Yup.

> MVFS does that, maybe something else does that too, but precisely to
> rely on the present behaviour of /proc/pid/exe - so in fixing for
> MVFS, we'd be breaking that hypothetical other?

I'm not completely certain that I understand your point. Are you
suggesting that some hypothetical code would want to use this "quirk"
of /proc/pid/exe for a legitimate purpose?

Assuming that is your point, I thought my non-hypothetical java example
clearly demonstrated that at least one non-hypothetical program doesn't
expect the "quirk" and breaks because of it. Frankly,
given /proc/pid/exe's output in the non-stacking case, I can't see how
its output in the stacking case we're discussing could be considered
anything but buggy.

Cheers,
-Matt Helsley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/