Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
From: Jann Horn
Date: Wed Apr 29 2020 - 14:33:48 EST
On Wed, Apr 29, 2020 at 7:58 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >
> > On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > - we move check_unsafe_exec() down. As far as I can tell, there's no
> > > reason it's that early - the flags it sets aren't actually used until
> > > when we actually do that final set_creds..
> >
> > Right, we should be able to do that stuff quite a bit later than it happens now.
>
> Actually, looking at it, this looks painful for multiple reasons.
>
> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
> when I traced it through, happened much earlier than I thought. Making
> things worse, it's done by prepare_binprm(), which also potentially
> gets called from random points by the low-level binfmt handlers too.
Yeah, but all of that happens before we actually need to do anything
with the accumulated credential information from the prepare_binprm()
calls. We can probably move the unsafe calculation and a new LSM hook
into flush_old_exec(), right before de_thread().
> And we also have that odd "fs->in_exec" flag, which is used by thread
> cloning and io_uring, and I'm not sure what the exact semantics are.
The idea is to ensure that once we're through check_unsafe_exec() and
have computed our LSM_UNSAFE_* flags, another thread that's still
running must not be able to clone() off a child with CLONE_FS, because
having an fs_struct that's shared with anything other than sibling
threads (which will be killed off) is only supposed to be possible if
LSM_UNSAFE_SHARE is set. So:
If check_unsafe_exec() can match each reference in the refcount
->fs->users with a reference from a sibling thread (iow the fs_struct
is not currently shared with another task), it sets p->fs->in_exec.
If another thread tries to clone(CLONE_FS) while we're in execve(),
copy_fs() will return -EAGAIN. And if io_uring tries to grab a
reference to the fs_struct with the intent to use it on a kernel
worker thread (which conceptually is kinda similar to the
clone(CLONE_FS) case), that also aborts.
And then at the end of execve(), we clear the ->fs->in_exec flag again.
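A minimal user-space model of the in_exec handshake described above, in C to match the kernel. The struct and the model_* functions are hypothetical stand-ins, not the kernel's fs_struct, check_unsafe_exec() or copy_fs(). The real code does this under fs->lock, which this single-threaded sketch leaves out, and the io_uring path is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fs_struct fields discussed; NOT the
 * kernel's struct fs_struct. The kernel updates these under fs->lock;
 * this single-threaded model omits the locking. */
struct fs_model {
	int users;     /* like fs->users: tasks holding a reference */
	bool in_exec;  /* like fs->in_exec */
};

#define MODEL_UNSAFE_SHARE 0x1u /* hypothetical stand-in for LSM_UNSAFE_SHARE */

/* Model of the fs part of check_unsafe_exec(). n_sibling_refs counts the
 * threads of the exec'ing process (caller included) that use this fs. */
unsigned model_check_unsafe_exec(struct fs_model *fs, int n_sibling_refs)
{
	assert(n_sibling_refs >= 1 && n_sibling_refs <= fs->users);

	if (fs->users > n_sibling_refs)
		return MODEL_UNSAFE_SHARE; /* another task shares this fs */

	fs->in_exec = true;                /* every ref is a sibling thread */
	return 0;
}

/* Model of copy_fs() with CLONE_FS: a new sharer is refused while an
 * execve() that saw the fs as unshared is still in progress. */
int model_copy_fs_clone_fs(struct fs_model *fs)
{
	assert(fs->users > 0);

	if (fs->in_exec)
		return -EAGAIN;
	fs->users++;
	return 0;
}

/* Model of the end of execve(): clear the flag again. */
void model_exec_done(struct fs_model *fs)
{
	fs->in_exec = false;
}
```

With two sibling threads and users == 2, the exec path sets in_exec, a concurrent CLONE_FS clone gets -EAGAIN, and the same clone succeeds once model_exec_done() has run.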
So this should work fine as long as we ensure that we can't have two
threads from the same process going through execve() concurrently. (Or
if we actually want to support that, we could make ->in_exec a counter
instead of a flag, but really, preventing concurrent execve()s from a
multithreaded process seems saner...)
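If concurrent execve()s from one process were ever allowed, the counter variant mentioned above might look roughly like this sketch (hypothetical names, locking omitted; the preferred option in the mail is still to forbid concurrent execve()s). It shows why a plain flag is not enough there. Whichever execve() finishes first would clear the flag and reopen the CLONE_FS window for the other one, while a counter keeps the window closed until the last one is done.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: in_exec counts in-flight execve() calls that found
 * the fs unshared, instead of being a single flag. Locking omitted. */
struct fs_ctr_model {
	int users;
	int in_exec;
};

/* Returns true if this execve() took an in_exec count (fs unshared). */
bool ctr_begin_exec(struct fs_ctr_model *fs, int n_sibling_refs)
{
	if (fs->users > n_sibling_refs)
		return false;     /* shared: LSM_UNSAFE_SHARE path, no count */
	fs->in_exec++;
	return true;
}

/* CLONE_FS is refused while any execve() still relies on "unshared". */
int ctr_copy_fs_clone_fs(struct fs_ctr_model *fs)
{
	if (fs->in_exec > 0)
		return -EAGAIN;
	fs->users++;
	return 0;
}

/* Drop only the count this execve() took; a flag would instead be
 * cleared by whichever execve() finished first. */
void ctr_end_exec(struct fs_ctr_model *fs, bool took_count)
{
	if (!took_count)
		return;
	assert(fs->in_exec > 0);
	fs->in_exec--;
}
```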
> I'm _almost_ inclined to say that we should just abort the execve()
> entirely if somebody tries to attach in the middle.
>
> IOW, get rid of the locking, and replace it all just with a sequence
> count. Make execve() abort if the sequence count has changed between
> loading the original creds, and having installed the new creds.
>
> You can ptrace _over_ an execve, and you can ptrace _after_ an
> execve(), but trying to attach just as we execve() would just cause
> the execve() to fail.
>
> We could maybe make it conditional on the credentials actually having
> changed at all (set another flag in bprm_fill_uid()). So it would only
> fail for the suid exec case.
>
> Because honestly, trying to ptrace in the middle of a suid execve()
> sounds like an attack, not a useful thing.
>
> That sequence count approach would be a much simpler change.
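A rough model of the sequence count idea quoted above. It assumes a plain per-task generation counter that every ptrace attach bumps; this is hypothetical, not the kernel's seqcount_t and not existing kernel code. execve() snapshots the counter when loading the original creds and refuses to install the new creds if it moved, but only when the creds actually change (the suid case). The -EAGAIN return value is an assumption; the proposal doesn't name an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task state: a plain generation counter. Single-threaded
 * model, so no memory ordering is shown. */
struct task_model {
	unsigned attach_seq; /* bumped on every ptrace attach */
};

/* State the execve() model keeps between loading and installing creds. */
struct exec_model {
	unsigned seq_snapshot;
	bool creds_changed;  /* e.g. flagged where bprm_fill_uid() applies suid/sgid */
	bool loaded;
};

void model_ptrace_attach(struct task_model *t)
{
	t->attach_seq++;
}

void model_exec_load_creds(const struct task_model *t, struct exec_model *e,
			   bool suid_exec)
{
	e->seq_snapshot = t->attach_seq;
	e->creds_changed = suid_exec;
	e->loaded = true;
}

/* Fail only if the creds change and someone attached in between. */
int model_exec_install_creds(const struct task_model *t,
			     const struct exec_model *e)
{
	assert(e->loaded);
	if (e->creds_changed && t->attach_seq != e->seq_snapshot)
		return -EAGAIN;
	return 0;
}
```

An attach before the snapshot, or during a non-suid exec, does not make the model fail; only an attach that lands inside a suid exec's window does.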
In that model, what should happen if someone tries to attach to a
process that's in execve(), but after the point of no return in
de_thread()? "Abort" after the point of no return normally means
force_sigsegv(), right?
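The distinction raised here can be sketched as a tiny decision function (a hypothetical user-space model, not kernel code). Before the point of no return, a failed execve() can still hand an error back to the caller. After it, the usual "abort" is a fatal signal, as with force_sigsegv().

```c
#include <assert.h>
#include <stdbool.h>

/* What the caller of execve() observes in this simplified model. */
struct exec_outcome {
	int ret;            /* negative errno; only meaningful if !fatal_sigsegv */
	bool fatal_sigsegv; /* task is killed, as force_sigsegv() would do */
};

/* Hypothetical helper: report an execve() failure depending on whether
 * de_thread() has already passed the point of no return. */
struct exec_outcome model_exec_fail(bool past_point_of_no_return, int err)
{
	struct exec_outcome o = { .ret = err, .fatal_sigsegv = false };

	assert(err < 0); /* model expects a negative errno */

	/* Past this point no error is returned to the caller; the
	 * abort is a fatal signal instead. */
	if (past_point_of_no_return)
		o.fatal_sigsegv = true;
	return o;
}
```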