Re: [GIT PULL] execve updates for v6.13-rc1

From: Kees Cook
Date: Thu Nov 21 2024 - 17:06:14 EST


On Thu, Nov 21, 2024 at 11:23:46AM -0800, Linus Torvalds wrote:
> I'm done with this discussion that apparently was brought on by people
> not knowing what the hell they were doing.

This is disrespectful. If you're frustrated you can just say so. I'm
certainly frustrated.

> For user space, comm[] is basically the fallback for when cmdline
> fails for some reason (for example, /proc/*/cmdline will be empty for
> kworkers, but there are other situations too)
>
> The reason? comm[] has *always* been much too limited for 'ps' output. ALWAYS.
>
> Yes, you can literally *force* ps to not do that (eg "ps -eo comm")
> but if you do that, you get the very limited comm[] output that nobody
> has ever wanted ps to give exactly because it's so limited.

I think I finally figured out why you keep saying this. I think you mean
to imply "ps -e" (or similar), not "ps". Asking for more process details
("ps a", "ps -f", "ps -e", etc) uses cmdline. Without options that turn
on more details, the default is comm. If you run just "ps", it shows comm:

$ strace ps 2>&1 | grep /cmdline | wc -l
0

If you run with detail options it shows cmdline:

$ strace ps a 2>&1 | grep /cmdline | wc -l
1266
$ strace ps -f 2>&1 | grep /cmdline | wc -l
1266

This is procps-ng found on all Debian and Ubuntu systems. AFAICT
procps-ng is used on Fedora too.

Note I'm not saying comm is GOOD or anything. I'm just saying that it
IS regularly visible.

> And yes, 'top' will give comm[] output because it's so much faster.

Exactly. By default, top and ps both show comm, which in the vast
majority of cases is identical to argv[0]. I don't understand why this
is causing such angst -- it's just the observable facts: many things in
userspace expose comm and many also expose cmdline. Having them be
mismatched due to fexecve() is the whole issue.

Nobody blinks at:

$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 bash
4171960 pts/1 00:00:00 bash
4171962 pts/1 00:00:00 vim
4171997 pts/1 00:00:00 ps

vs

$ ps -f
UID PID PPID C STIME TTY TIME CMD
kees 4125309 3947 0 Jul11 pts/1 00:00:47 -bash
kees 4171960 4125309 0 13:30 pts/1 00:00:00 -bash
kees 4171962 4171960 0 13:30 pts/1 00:00:00 vim
kees 4172004 4125309 0 13:30 pts/1 00:00:00 ps -f

But if fexecve() were used now, "ps" would show:

$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 3
4171960 pts/1 00:00:00 3
4171962 pts/1 00:00:00 3
4171997 pts/1 00:00:00 3

This is obviously horrible.

Using f_path, we'd get close, but symlink destinations (or weird stuff
like "memfd:name-here") are shown:

$ realpath $(which vim)
/usr/bin/vim.basic

$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 bash
4171960 pts/1 00:00:00 bash
4171962 pts/1 00:00:00 vim.basic
4171997 pts/1 00:00:00 ps

Using argv[0], we'd get the original output:

$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 bash
4171960 pts/1 00:00:00 bash
4171962 pts/1 00:00:00 vim
4171997 pts/1 00:00:00 ps

IMO, switching to fexecve() shouldn't regress this basic piece of
information.

Now, I think we have three choices:

1) Leave it as-is. (comm is useless)

2) Use argv[0]. (comm matches what would show with execve() in most cases)

3) Use f_path (comm exposes f_path dentry name, which matches
basename(readlink(/proc/*/exe)), but doesn't always match what execve()
would show).

I think everyone agrees "1" should go away.

So it comes down to trying to stay looking more like execve()'s comm, or
more like /proc/*/exe's value.

Since comm is mutable anyway, I feel like the "friendlier" default for
userspace would be option 2.

If you still conclude differently, I guess the discussion is over, and
we go with 3:

diff --git a/fs/exec.c b/fs/exec.c
index e0435b31a811..8688bbbaf4af 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
set_dumpable(current->mm, SUID_DUMP_USER);

perf_event_exec();
- __set_task_comm(me, kbasename(bprm->filename), true);
+
+ /*
+ * If fdpath was set, alloc_bprm() made up a path that will
+ * probably not be useful to admins running ps or similar.
+ * Let's fix it up to be something reasonable.
+ */
+ if (bprm->fdpath) {
+ rcu_read_lock();
+ __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
+ true);
+ rcu_read_unlock();
+ }
+ else {
+ __set_task_comm(me, kbasename(bprm->filename), true);
+ }

/* An exec changes our domain. We are no longer part of the thread
group */


I've minimally tested this.

-Kees

--
Kees Cook