Re: [PATCH] prctl: allow prctl_set_mm_exe_file without unmapping old exe

From: David Hildenbrand
Date: Wed Aug 28 2024 - 06:06:10 EST


On 27.08.24 23:59, Theodore Dubois wrote:


As far as I can tell, the original purpose of this check was simply as
the easiest way to work with a quirk of /proc/self/exe at the time. From
the original patch[1]:

Note it allows to change /proc/$pid/exe iif there
are no VM_EXECUTABLE vmas present for current process,
simply because this feature is a special to C/R
and mm::num_exe_file_vmas become meaningless after
that.

num_exe_file_vmas was created to preserve a quirk of the original
/proc/self/exe implementation: if you unmapped all executable VMAs,
/proc/self/exe would disappear (because it worked by scanning the
address space for the first executable VMA.) Keeping the quirk after
switching to just saving the executable on the mm worked by keeping a
count of executable VMAs in num_exe_file_vmas, and zeroing exe_file when
it reached zero. You can probably see how it would have been annoying to
handle both num_exe_file_vmas and this prctl intending to change
exe_file, and it's easier to only allow changing exe_file after
num_exe_file_vmas has already gone to 0 and made exe_file null.

However, num_exe_file_vmas no longer exists[2]. This quirk was taken out
because it would save a bit in the vma flags, and it seems clear by now
that nobody was relying on it. These days you can simply update exe_file
with no interference.

Recently a use case for this prctl has come up outside of
checkpoint/restore, namely binfmt_misc based emulators such as FEX[3].
Any program that uses /proc/self/exe will, of course, expect it to point
to its own executable. But when executed through binfmt_misc, it will be
the emulator, resulting in compatibility issues. Emulators currently
have to attempting to intercept syscalls targeting /proc/self/exe to
redirect the path, and this is not possible in the general case
considering how flexible path resolution is. For more detail on this see
[3].

The above seems to me like a solid case for simply dropping the check.

Interestingly, the man page states:

"You can even type /proc/pid/exe to run another copy of the same executable that is being run by process pid."

Is that still true (with that binfmt_misc magic) once we change /proc/self/exe? Or does it with changing /proc/self/exe to point at the non-emulator exe even work as expected regarding this documentation?

It's a good question what will change if processes start setting random other stuff while they are still executing part of the original binary.

commit 4229fb1dc6843c49a14bb098719f8a696cdc44f8
Author: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx>
Date: Wed Jul 11 14:02:11 2012 -0700

c/r: prctl: less paranoid prctl_set_mm_exe_file()

temporarily switch to checking that not other files besides the executable are still mapped.


I agree that b32dfe3771 reads like this check was added primarily " because this feature is a special to C/R and mm::num_exe_file_vmas become meaningless after that"

Reading mailing list discussions, there was a discussion regarding security risks [1] with the conclusion being "We must not trust /proc/pid/exe in anyway. An attacker can always execute another binary without calling execve()." [2].


So with that in mind, no objection. Clarifying which effect this has on what's stated in the man page would be interesting.


[1] https://lore.kernel.org/lkml/CAFLxGvxEDs=RG7tX+j6XEUx2+wEvuCGipUzh2vSp3rj15Rq6zA@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/lkml/CAFLxGvyCRAq6t8_ni+VFUVpOGJ4-iz0i=PRFEFpVJ+ZaPEb3-g@xxxxxxxxxxxxxx/


--
Cheers,

David / dhildenb