Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads

From: Michael Schmitz
Date: Tue Jun 22 2021 - 16:04:23 EST

Hi Linus,

On 22/06/21 11:14 am, Linus Torvalds wrote:
On Mon, Jun 21, 2021 at 12:45 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
Looks like sys_exit() and do_group_exit() would be the two places to
do it (do_group_exit() would handle the signal case and
Maybe... I'm digging through that pile right now, will follow up when
I get a reasonably complete picture
We might have another possible way to solve this:

(a) make it the rule that everybody always saves the full (integer)
register set in pt_regs

(b) make m68k just always create that switch-stack for all system
calls (it's really not that big, I think it's like six words or

Turns out that is harder than it looked at first glance (at least for me).

All syscalls that _do_ save the switch stack are currently called through wrappers which pull the syscall arguments out of the saved pt_regs on the stack (pushing the switch stack after the SAVE_ALL saved stuff buries the syscall arguments on the stack, see comment about m68k_clone(). We'd have to push the switch stack _first_ when entering system_call to leave the syscall arguments in place, but that will require further changes to the syscall exit path (currently shared with the interrupt exit path). Not to mention the register offset calculations in arch/m68k/kernel/ptrace.c, and perhaps a few other dependencies that don't come to mind immediately.

We have both pt_regs and switch_stack in uapi/asm/ptrace.h, but the ordering of the two is only mentioned in a comment. Can we reorder them on the stack, as long as we don't change the struct definitions proper?

This will take a little more time to work out and test - certainly not before the weekend. I'll send a corrected version of my debug patch before that.



(c) admit that alpha is broken, but nobody really cares

In the meanwhile, do kernel/kthread.c uses look even remotely sane?
Intentional - sure, but it really looks wrong to use thread exit code
as communication channel there...
I really doubt that it is even "intentional".

I think it's "use some errno as a random exit code" and nobody ever
really thought about it, or thought about how that doesn't really
work. People are used to the error numbers, not thinking about how
do_exit() doesn't take an error number, but a signal number (and an
8-bit positive error code in bits 8-15).

Because no, it's not even remotely sane.

I think the do_exit(-EINTR) could be do_exit(SIGINT) and it would make
more sense. And the -ENOMEM might be SIGBUS, perhaps.

It does look like the usermode-helper code does save the exit code
with things like

kernel_wait(pid, &sub_info->retval);

and I see call_usermodehelper_exec() doing

retval = sub_info->retval;

and treating it as an error code. But I think those have never been
tested with that (bogus) exit code thing from kernel_wait(), because
it wouldn't have worked. It has only ever been tested with the (real)
exit code things like

if (pid < 0) {
sub_info->retval = pid;

which does actually assign a negative error code to it.

So I think that

kernel_wait(pid, &sub_info->retval);

line is buggy, and should be something like

int wstatus;
kernel_wait(pid, &wstatus);
sub_info->retval = WEXITSTATUS(wstatus) ? -EINVAL : 0;

or something.