Re: [RFC PATCH 0/2] Introduce a way to expose the interpreted file with binfmt_misc

From: Ryan Houdek
Date: Wed Oct 11 2023 - 19:54:00 EST


On Mon, Oct 9, 2023 at 10:37 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Fri, Oct 06, 2023 at 02:07:16PM +0200, David Hildenbrand wrote:
> > On 07.09.23 22:24, Guilherme G. Piccoli wrote:
> > > Currently the kernel provides a symlink to the executable binary, in the
> > > form of procfs file exe_file (/proc/self/exe_file for example). But what
> > > happens in interpreted scenarios (like binfmt_misc) is that such link
> > > always points to the *interpreter*. For cases of Linux binary emulators,
> > > like FEX [0] for example, it's then necessary to somehow mask that and
> > > emulate the true binary path.
> >
> > I'm absolutely no expert on that, but I'm wondering if, instead of modifying
> > exe_file and adding an interpreter file, you'd want to leave exe_file alone
> > and instead provide an easier way to obtain the interpreted file.
> >
> > Can you maybe describe why modifying exe_file is desired (about which
> > consumers are we worrying? ) and what exactly FEX does to handle that (how
> > does it mask that?).
> >
> > So a bit more background on the challenges without this change would be
> > appreciated.
>
> Yeah, it sounds like you're dealing with a process that examines
> /proc/self/exe_file for itself only to find the binfmt_misc interpreter
> when it was run via binfmt_misc?
>
> What actually breaks? Or rather, why does the process to examine
> exe_file? I'm just trying to see if there are other solutions here that
> would avoid creating an ambiguous interface...
>
> --
> Kees Cook

Hey there, FEX-Emu developer here. I can try and explain some of the issues.

First, to set the stage: there is a fundamental discrepancy between how
ELF interpreters and binfmt_misc interpreters are represented when it
comes to procfs exe. An ELF file today can be either static or dynamic,
with dynamic ELF files having a program header called PT_INTERP which
tells the kernel where the interpreter executable lives. In an x86-64
environment this is likely to be something like
/lib64/ld-linux-x86-64.so.2. Today, the kernel doesn't put the PT_INTERP
interpreter into procfs exe; it instead uses the dynamic ELF that was
originally launched.
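
To make the PT_INTERP part concrete, here is a minimal sketch of locating
that program header in an ELF image. It is an illustration only, not
kernel code: the function name find_pt_interp is made up for this
example, and it assumes a little-endian ELF64 file.

```python
import struct

PT_INTERP = 3  # program header type that carries the interpreter path


def find_pt_interp(elf: bytes):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None.

    Illustrative sketch only; other ELF classes/encodings are rejected.
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("sketch only handles little-endian ELF64")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type (4 bytes), p_flags (4), p_offset (8), ...
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", elf, base)
        # ... p_vaddr (8), p_paddr (8), then p_filesz at offset 32.
        (p_filesz,) = struct.unpack_from("<Q", elf, base + 32)
        if p_type == PT_INTERP:
            return elf[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # static ELF: no interpreter requested
```

Calling find_pt_interp(open(path, "rb").read()) on a dynamic x86-64
binary should give the loader path, while a static binary gives None.
Either way, procfs exe keeps pointing at the launched ELF, not at the
loader.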

In contrast, a binfmt_misc interpreter file launched through execve may
or may not have ELF header sections; it is left up to the binfmt_misc
handler to do whatever it needs. Here the kernel sets procfs exe to the
binfmt_misc interpreter instead of the executable.

This contrast is fundamentally what the patch is trying to improve. It
seems like this behaviour is an oversight of the original binfmt_misc
implementation rather than any deliberate attempt to ensure there is a
difference. The interface is already ambiguous in that it changes when
an executable is run through binfmt_misc.

Some simple ways applications break:
- Applications like Chrome tend to relaunch themselves through execve
  with `/proc/self/exe`.
  - Chrome does this. I think Flatpak or AppImage applications do this?
  - There are definitely more that do this that I have noticed.
- In the cover letter there was a link to Mesa, the OSS OpenGL/Vulkan
  drivers, using this.
  - This library uses this interface to find out what application is
    running so it can apply workarounds for application bugs. Plenty of
    historical applications use the API badly or incorrectly and need
    specific driver workarounds.
- Some applications may use this path to open their own executable and
  then mmap it back in for tricky memory mirroring or dynamic linking
  of themselves.
  - Saw some old abandoned emulator software doing this.
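
As a rough illustration of the workaround-lookup pattern in the list
above (this is not Mesa's actual code; the table, its entry, and both
function names are hypothetical), a library that keys per-application
workarounds off the procfs exe link might look like this:

```python
import os

# Hypothetical workaround table keyed by executable name (not Mesa's).
WORKAROUNDS = {"oldgame": ["force_gl_vendor"]}


def process_name(exe_link: str = "/proc/self/exe") -> str:
    """Name of the running executable, taken from the procfs exe symlink."""
    return os.path.basename(os.readlink(exe_link))


def workarounds_for(name: str) -> list:
    """Workarounds registered for this executable name, if any."""
    return WORKAROUNDS.get(name, [])
```

When the application runs through binfmt_misc, process_name() yields
the interpreter's name rather than the application's, so the lookup
misses and no workaround is applied.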

There are likely more uses that I haven't noticed from software using
this interface.

Onward to what FEX-Emu is and how it tries to work around the issue
with a fairly naive hack. FEX-Emu is an x86 and x86-64 CPU emulator
that gets installed as a binfmt_misc interpreter. It then executes x86
and x86-64 ELF files on an Arm64 device in what is effectively a
multi-arch capable fashion. It's lightweight in that all application
processes and threads are just regular Arm64 processes and threads.
This is similar to how qemu-user operates.

When processing system calls, FEX intercepts any call that consumes a
pathname and inspects that pathname. If it is one of the ways to access
procfs exe, FEX redirects it to the true x86/x86-64 executable. This is
an attempt to behave as if the ELF had been executed without a
binfmt_misc handler.

Pathnames captured in FEX-Emu today:
- /proc/self/exe
- /proc/<pid>/exe
- /proc/thread-self/exe
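
The interception described above boils down to an exact-match check on
the pathname. A minimal sketch follows; it is not FEX's actual code.
The function name and parameters are made up for illustration, and
matching /proc/<pid>/exe only against the process's own pid is an
assumption.

```python
def redirect_exe_path(pathname: str, own_pid: int, guest_exe: str) -> str:
    """Return guest_exe for a known procfs exe spelling, else pathname.

    Illustrative sketch: guest_exe stands for the true x86/x86-64
    executable; own_pid is assumed to be the emulated process's pid.
    """
    known = {
        "/proc/self/exe",
        "/proc/thread-self/exe",
        f"/proc/{own_pid}/exe",
    }
    return guest_exe if pathname in known else pathname
```

A syscall handler would pass every incoming pathname through a check
like this before forwarding the call to the host kernel.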

This is very fragile and doesn't cover the full range of ways
applications could access procfs. Applications could end up using the
*at variants of syscalls with an FD that has /proc/self/ open. They
could do simple tricks like `/proc/self/../self/exe`, which would
side-step this check. It's a game of whack-a-mole and escalating
overhead to try to close the gap, purely due to what appears to be an
oversight in how binfmt_misc and PT_INTERP are handled.
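
To show why this turns into whack-a-mole, here is a small sketch (the
names are hypothetical; this is not FEX's code) comparing an exact-match
check with one that lexically normalizes the path first. Note that
lexical normalization is not the same as kernel path resolution in
general, which is part of the problem.

```python
import posixpath

# The exact spellings a naive check looks for (pid form omitted here).
EXE_SPELLINGS = {"/proc/self/exe", "/proc/thread-self/exe"}


def naive_match(pathname: str) -> bool:
    """Exact string comparison against the known spellings."""
    return pathname in EXE_SPELLINGS


def normalized_match(pathname: str) -> bool:
    """Collapse '.', '..' and duplicate slashes lexically, then compare."""
    return posixpath.normpath(pathname) in EXE_SPELLINGS
```

naive_match("/proc/self/../self/exe") is False while normalized_match()
catches it, at the cost of extra work on every pathname. Neither catches
the *at case, where the syscall only sees the relative name "exe" plus
an FD opened on /proc/self/; closing that gap means tracking what each
directory FD refers to as well.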

Hopefully this explains why this is necessary and why reducing the
differences between how PT_INTERP and binfmt_misc are represented is
desired.