Re: [PATCH 12/17] mm: sys_remap_anon_pages

From: Andrea Arcangeli
Date: Mon Oct 06 2014 - 13:01:14 EST


Hi,

On Sat, Oct 04, 2014 at 06:13:27AM -0700, Andi Kleen wrote:
> Andrea Arcangeli <aarcange@xxxxxxxxxx> writes:
>
> > This new syscall will move anon pages across vmas, atomically and
> > without touching the vmas.
> >
> > It only works on non shared anonymous pages because those can be
> > relocated without generating non linear anon_vmas in the rmap code.
>
> ...
>
> > It is an alternative to mremap.
>
> Why a new syscall? Couldn't mremap do this transparently?

The difference between remap_anon_pages and mremap is that mremap
fundamentally moves vmas and not pages (just the pages are moved too
because they remain attached to their respective vmas), while
remap_anon_pages move anonymous pages zerocopy across vmas but it
would never touch any vma.

mremap for example would also nuke the source vma, remap_anon_pages
just moves the pages inside the vmas instead so it doesn't require to
allocate new vmas in the area that receives the data.

We could certainly change mremap to try to detect when page_mapping of
anonymous page is 1 and downgrade the mmap_sem to down_read and then
behave like remap_anon_pages internally by updating the page->index if
all pages in the range can be updated. However to provide the same
strict checks that remap_anon_pages does and to leave the source vma
intact, mremap would need new flags that would need to alter the
normal mremap semantics that silently wipes out the destination range
and get rid of the source range and it would require to run a
remap_anon_pages-detection-routine that isn't zero cost.

Unless we add even more flags to mremap, we wouldn't have the absolute
guarantee that the vma tree is not altered in case userland is not
doing all things right (like if userland forgot MADV_DONTFORK).

Separating the two looked better, mremap was never meant to be
efficient at moving 1 page at time (or 1 THP at time).

Embedding remap_anon_pages inside mremap didn't look worthwhile
considering that as result, mremap would run slower when it cannot
behave like remap_anon_pages and it would also run slower than
remap_anon_pages when it could.

Thanks,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/