Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/

From: Stefan Metzmacher
Date: Thu Mar 25 2021 - 17:49:40 EST



On 25.03.21 at 22:20, Stefan Metzmacher wrote:
>
> On 25.03.21 at 21:55, Eric W. Biederman wrote:
>> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>>
>>> On 03/25, Linus Torvalds wrote:
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Or maybe IO threads should not abuse CLONE_THREAD?
>>>
>>> Why does create_io_thread() abuse CLONE_THREAD ?
>>>
>>> One reason (I think) is that this implies SIGKILL when the process exits/execs,
>>> anything else?
>>
>> A lot.
>>
>> The io workers perform work on behalf of the ordinary userspace threads.
>> Some of that work is opening files. For things like rlimits to work
>> properly you need to share the signal_struct. But odds are if you find
>> anything in signal_struct (not counting signals) there will be an
>> io_uring code path that can exercise it as io_uring can traverse the
>> filesystem, open files and read/write files. So io_uring can exercise
>> all of proc.
>>
>> Using create_io_thread with CLONE_THREAD is the least problematic way
>> (including all of the signal and ptrace problems we are looking at right
>> now) to implement the io worker threads.
>>
>> They _really_ are threads of the process that just never execute any
>> code in userspace.
>
> So they should look like a userspace thread sitting in something like
> epoll_pwait() with all signals blocked, which will never return to userspace again?

Would gdb work with that?
The question is what backtrace gdb would show for that thread.

Is it possible to block SIGSTOP/SIGCONT?

I also think that no signal sent to an iothread should be delivered to
other threads, and that it should only react to a direct SIGSTOP/SIGCONT.
I guess even SIGKILL should be ignored, as the shutdown should happen
only via the exit path of the iothread's parent.
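
For reference, here is how the blocking question looks from plain userspace. This is only a minimal sketch in Python (my choice for illustration, and the helper name try_block is made up): an ordinary thread can ask to block SIGSTOP and SIGKILL, but the kernel silently drops them from the mask, while SIGCONT can be blocked. Whether the kernel could treat PF_IO_WORKER threads differently internally is a separate question that this does not answer.

```python
import signal

def try_block(signals):
    """Try to block `signals` in the calling thread.

    Returns the subset that really ended up in the signal mask.
    The previous mask is restored before returning.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signals)
    try:
        # Blocking an empty set changes nothing and returns the current mask.
        current = signal.pthread_sigmask(signal.SIG_BLOCK, [])
        return set(signals) & current
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)

requested = {signal.SIGSTOP, signal.SIGCONT, signal.SIGKILL}
blocked = try_block(requested)
print("requested:", sorted(s.name for s in requested))
print("blocked:  ", sorted(s.name for s in blocked))
```

Note that a blocked SIGCONT still resumes a stopped process; only delivery to a handler is deferred.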

> I think that would be useful, but I also think that userspace should see:
> - /proc/$tidofiothread/cmdline as empty (in order to let ps and top use [iou-wrk-$tidofuserspacethread])
> - /proc/$tidofiothread/exe as a symlink whose target does not exist
> - all of /proc/$tidofiothread/ showing root.root as owner and group
> and files that nominally allow write access, such as /proc/$tidofiothread/comm and
> similar entries with rw permissions, should still disallow modifications.
>
> For the other kernel threads e.g. "[cryptd]" I see the following:
>
> LANG=C ls -l /proc/653 | grep rw
> ls: cannot read symbolic link '/proc/653/exe': No such file or directory
> -rw-r--r-- 1 root root 0 Mar 25 22:09 autogroup
> -rw-r--r-- 1 root root 0 Mar 25 22:09 comm
> -rw-r--r-- 1 root root 0 Mar 25 22:09 coredump_filter
> lrwxrwxrwx 1 root root 0 Mar 25 22:09 cwd -> /
> lrwxrwxrwx 1 root root 0 Mar 25 22:09 exe
> -rw-r--r-- 1 root root 0 Mar 25 22:09 gid_map
> -rw-r--r-- 1 root root 0 Mar 25 22:09 loginuid
> -rw------- 1 root root 0 Mar 25 22:09 mem
> -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_adj
> -rw-r--r-- 1 root root 0 Mar 25 22:09 oom_score_adj
> -rw-r--r-- 1 root root 0 Mar 25 22:09 projid_map
> lrwxrwxrwx 1 root root 0 Mar 25 22:09 root -> /
> -rw-r--r-- 1 root root 0 Mar 25 22:09 sched
> -rw-r--r-- 1 root root 0 Mar 25 22:09 setgroups
> -rw-r--r-- 1 root root 0 Mar 25 22:09 timens_offsets
> -rw-rw-rw- 1 root root 0 Mar 25 22:09 timerslack_ns
> -rw-r--r-- 1 root root 0 Mar 25 22:09 uid_map
>
> And this:
>
> LANG=C echo "bla" > /proc/653/comm
> -bash: echo: write error: Invalid argument
>
> LANG=C echo "bla" > /proc/653/gid_map
> -bash: echo: write error: Operation not permitted
>
> Can't we do the same for iothreads regarding /proc?
> Just make things read only there and empty "cmdline"/"exe"?
>
> Maybe I'm too naive, but that's what I'd assume as a userspace developer/admin.
>
> Does at least part of it make any sense?

I think the strange glibc setuid() behavior should also be tested here;
I guess we don't want that to reset the credentials of an iothread!

Another idea would be to have the iothreads as a child process with its own threads,
but again I'm only looking at this as an admin and at what I'd expect to see under /proc
via ps and top.
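
To make the /proc expectation concrete, here is a rough sketch (Python, purely illustrative) of the check a ps/top-like tool could apply: an empty cmdline plus an unreadable exe symlink, as seen for "[cryptd]" above. The function name looks_like_kernel_thread and the proc_root parameter are hypothetical; this is only a heuristic, since for example a zombie also has an empty cmdline, and readlink on exe can fail for permission reasons.

```python
import os

def looks_like_kernel_thread(tid, proc_root="/proc"):
    """Heuristic: empty cmdline and an exe link that cannot be read.

    `proc_root` is only a parameter so the check can be pointed at a
    fake directory tree; on a real system it would stay "/proc".
    """
    base = os.path.join(proc_root, str(tid))

    # Kernel threads expose an empty /proc/<tid>/cmdline.
    with open(os.path.join(base, "cmdline"), "rb") as f:
        cmdline_empty = (f.read() == b"")

    # For kernel threads readlink() on exe fails, as in the ls output above.
    try:
        os.readlink(os.path.join(base, "exe"))
        exe_readable = True
    except OSError:
        exe_readable = False

    return cmdline_empty and not exe_readable
```

If iothreads presented themselves this way, such a check would classify them together with the other kernel threads, and tools could fall back to showing the comm value in brackets.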

metze