Re: process creation time increases linearly with shmem
From: Hugh Dickins
Date: Fri Aug 26 2005 - 13:40:43 EST

On Fri, 26 Aug 2005, Linus Torvalds wrote:
> On Fri, 26 Aug 2005, Hugh Dickins wrote:
> >
> > I see some flaws in the various patches posted, including Rik's.
> > Here's another version - doing it inside copy_page_range, so this
> > kind of vma special-casing is over in mm/ rather than kernel/.
>
> I like this approach better, but I don't understand your particular
> choice of bits.
>
> > +	 * Assume the fork will probably exec: don't waste time copying
> > +	 * ptes where a page fault will fill them correctly afterwards.
> > +	 */
> > +	if ((vma->vm_flags & (VM_MAYSHARE|VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))
> > +						== VM_MAYSHARE)
> > +		return 0;
> > +
> >  	if (is_vm_hugetlb_page(vma))
> >  		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
>
> First off, if you just did it below the hugetlb check, you'd not need to
> check hugetlb again.

Yes: I wanted to include VM_HUGETLB in the list as documentation really;
and it costs nothing to test it along with the other flags - or are there
architectures where testing more bits costs more?

> And while I understand VM_NONLINEAR and VM_RESERVED,
> can you please comment on why VM_MAYSHARE is so important, and why no
> other information matters.

The VM_MAYSHARE one isn't terribly important: there's no correctness
reason for it to replace VM_SHARED there. It's just that do_mmap_pgoff
takes VM_SHARED and VM_MAYWRITE off a MAP_SHARED mapping of a file which
was not opened for writing. We can safely avoid copying the ptes of such
a vma, just as with the writable ones, but the VM_MAYSHARE test catches
them where the VM_SHARED test does not.

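A minimal userspace sketch of the flag test discussed above, to make the
VM_MAYSHARE versus VM_SHARED difference concrete. It is not kernel code:
the flag values are illustrative assumptions, and map_shared_flags(),
skip_by_mayshare(), skip_by_shared() and check_examples() are hypothetical
helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only: assumed for this sketch, not quoted from mm.h. */
#define VM_SHARED     0x00000008UL
#define VM_MAYWRITE   0x00000020UL
#define VM_MAYSHARE   0x00000080UL
#define VM_RESERVED   0x00080000UL
#define VM_HUGETLB    0x00400000UL
#define VM_NONLINEAR  0x00800000UL

/* The test from the patch: VM_MAYSHARE set, and none of the other three. */
static bool skip_by_mayshare(unsigned long vm_flags)
{
	return (vm_flags & (VM_MAYSHARE | VM_HUGETLB | VM_NONLINEAR | VM_RESERVED))
			== VM_MAYSHARE;
}

/* The same test with VM_SHARED substituted, for comparison. */
static bool skip_by_shared(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_HUGETLB | VM_NONLINEAR | VM_RESERVED))
			== VM_SHARED;
}

/*
 * Hypothetical helper: flags left on a MAP_SHARED file mapping, modelling
 * the adjustment described above for a file not opened for writing.
 */
static unsigned long map_shared_flags(bool opened_for_writing)
{
	unsigned long flags = VM_SHARED | VM_MAYSHARE | VM_MAYWRITE;

	if (!opened_for_writing)
		flags &= ~(VM_MAYWRITE | VM_SHARED);
	return flags;
}

static void check_examples(void)
{
	/* Writable shared file mapping: both tests would skip the copy. */
	assert(skip_by_mayshare(map_shared_flags(true)));
	assert(skip_by_shared(map_shared_flags(true)));

	/* File opened read-only: only the VM_MAYSHARE test catches it. */
	assert(skip_by_mayshare(map_shared_flags(false)));
	assert(!skip_by_shared(map_shared_flags(false)));

	/* Any of the excluded flags defeats the skip. */
	assert(!skip_by_mayshare(map_shared_flags(true) | VM_HUGETLB));
}
```

Calling check_examples() from a test program exercises the three cases;
the second pair of assertions is the one the paragraph above is about.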
> Now, VM_MAYSHARE is a sign of the mapping being a shared mapping. Fair
> enough. But afaik, a shared anonymous mapping absolutely needs its page
> tables copied, because those page tables contain either the pointers to
> the shared pages, or the swap entries.
>
> So I really think you need to verify that it's a file mapping too.

Either I'm misunderstanding, or you're remembering back to how shared
anonymous was done in 2.2 (perhaps). In 2.4 and 2.6, shared anonymous
is "backed" by a shared memory object, created by shmem_zero_setup,
which sets vm_file even though we came into do_mmap_pgoff with no file.

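A small userspace sketch of the behaviour that follows from this: a
MAP_SHARED|MAP_ANONYMOUS page belongs to a shared object, so a store made
by the child after fork is seen by the parent. It only shows the
user-visible semantics and says nothing about vm_file itself; the function
name shared_anon_visible_after_fork() is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a store made by the child into a shared anonymous page is
 * visible to the parent, 0 if it is not, and -1 if setup failed.
 */
static int shared_anon_visible_after_fork(void)
{
	int *p = mmap(NULL, sizeof(*p), PROT_READ | PROT_WRITE,
		      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*p = 0;			/* parent touches the page before fork */

	pid_t pid = fork();
	if (pid < 0) {
		munmap(p, sizeof(*p));
		return -1;
	}
	if (pid == 0) {
		*p = 42;	/* child stores through its own mapping */
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0) {
		munmap(p, sizeof(*p));
		return -1;
	}

	int seen = (*p == 42);	/* parent reads the same shared page */
	munmap(p, sizeof(*p));
	return seen;
}
```

With a MAP_PRIVATE mapping instead, the child's store would go to its own
copy and the parent would still read 0.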
> Also, arguably, there are other cases that may or may not be worth
> worrying about. What about non-shared non-writable file mappings? What
> about private mappings that haven't been COW'ed?

Non-shared non-currently-writable file mappings might have been writable
and modified in the past, so we cannot necessarily skip those.

We could, and I did, consider testing whether the vma has an anon_vma:
we always allocate a vma's anon_vma just before first allocating it a
private page, and it's a good test which swapoff uses to narrow its
search.

But partly I thought that a little too tricksy, and hard to explain;
and partly I thought it was liable to catch the executable text,
some of which is most likely to be needed in between fork and exec.

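A toy model of the anon_vma alternative considered above, purely to
illustrate the trade-off. The structures toy_vma and toy_anon_vma and the
helper skip_by_anon_vma() are hypothetical stand-ins, not the kernel's
definitions.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_anon_vma {
	int unused;
};

struct toy_vma {
	const char *what;		/* label for the example only */
	struct toy_anon_vma *anon_vma;	/* NULL until a private page exists */
};

/*
 * Skip copying ptes when the vma has never been given a private page:
 * everything it maps can be refilled from the file by a later fault.
 */
static bool skip_by_anon_vma(const struct toy_vma *vma)
{
	return vma->anon_vma == NULL;
}
```

Under this model an executable text mapping that has never been written
has anon_vma == NULL and would be skipped, which is the cost noted above:
the child would have to fault that text back in between fork and exec.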
> So I think that in addition to your tests, you should test for
> "vma->vm_file", and you could toy with testing for "vma->anon_vma" being
> NULL (the latter will cause a _lot_ of hits, because any read-only private
> mapping will trigger, but it's a good stress-test and conceptually
> interesting, even if I suspect it will kill any performance gain through
> extra minor faults in the child).

Ah yes, I wrote the paragraph above before reading this one, honest!

Well, I still don't think we need to test vm_file. We can add an
anon_vma test if you like, if we really want to minimize the fork
overhead, in favour of later faults. Do we?

Hugh