Re: unshare(CLONE_VM) Re: [PATCH] unshare: Use rcu_assign_pointer when setting sighand

From: Linus Torvalds
Date: Mon Mar 14 2016 - 14:36:00 EST


On Mon, Mar 14, 2016 at 6:15 AM, Julian Smith <jsmith@xxxxxxxxxxxxxxxxx> wrote:
>
> I'm looking into whether it would be possible to extend the unshare
> syscall to support the CLONE_VM flag with multi-threaded processes,
> because this would allow us at Undo to record multi-threaded user
> processes much more efficiently than at present.

At the point where you want to unsahe the VM, you should just use a
full clone() instead.

The thing is, unsharing the VM absolutely _requires_ you to also
unshare signals and some other state too (we require that thread
groups are in the same VM, for example, but also the child tid
information etc etc).

And the whole "copy VM" case is so expensive that at that point
there's no advantage to "unshare", you might as well just do a full
clone() (perhaps still sharing file descriptors and fs state).

So while I think a

unshare(CLONE_VM | CLONE_SIGHAND | CLONE_THREAD |
CLONE_CHILD_CLEARTID | CLONE_CHILD_SETTID);

might be possible from a technical standpoint, I'm not seeing the huge
advantage to users vs just doing something like

clone(new_vm_function, NULL, CLONE_VFORK | CLONE_FILES | CLONE_FS
| CLONE_PARENT..);
_exit();

(fixup details to actually work - the above is meant more as a
"something remotely like this" rather than actually equivalent)

The costs of forking and exiting a thread are almost all about just
the VM copying and tear-down, so a "unshare(CLONE_VM)" is
fundamentally not a cheap operation (and at the other range of the
spectrum: an exit of a thread where there are other sharing threads is
fundamentally quite cheap, because it just ends up decrementing
counters).

So my gut feel is that no, we really don't want unshare(CLONE_VM),
because it *is* a very complicated operation and doesn't actually
perform any better than just cloning.

And the "it is a very complicated operation" really comes not from the
fact that we can't copy the VM - we have that support already, but
because CLONE_VM really does go hand-in-hand with so many special
cases. Oleg pointed out that mm->core_waiters thing last time around,
but that just ends up being a detail: the whole VM sharing just ends
up being very central to a lot of small details..

Linus