Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
From: Linus Torvalds
Date: Fri Apr 03 2020 - 15:27:13 EST
[ For Waiman & co - the problem is that the current cred_guard_mutex
is horrendous and has problems with execve() deadlocking against
various users. We've had this bug before, there's a new one, it's just
nasty ]
On Thu, Apr 2, 2020 at 4:04 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> That is not the direction I intend to take either.
>
> I was hoping I could put off replying to this thread for a bit because
> I only managed to get 4 hours of sleep last night and I am not as alert
> to technical details as I would like to be.
Hmm.. So I've been looking at this cred_guard_mutex, and I wonder...
This is a bit hand-wavy, because I haven't walker through all the
paths, but could we perhaps work around a lot of the problems a
different way., namely:
- make the "cred_guard_mutex" an rwsem-like thing instead of being a mutex.
- make the ptrace_attach() case get it for writing - not because
ptrace changes the creds, but because ptrace changes 'task->ptrace'
and depends on dumpability etc.
- change the *name* of that damn thing. Not because it's now
rwsem'ish rather than a mutex, but because it was never really about
just "creds". It was about creds+ptrace+dumpable flags etc.
- make all the ones that read the creds to just take it for reading
(IOW, the cases that were basically switched over to
exec_update_mutex).
- and finally: make "execve()" take it just for reading too, but
introduce a "upgrade to write" at the very end (when it actually is
all done and then finally changes the creds and dumpability)
Wouldn't that solve all problems? We wouldn't get deadlocks wrt
execve(), simply because execve() doesn't need it to be writable, and
the things execve() does and can deadlock all only want readability.
But hear me out, because the above is fundamentally broken in a couple
of ways, so let me address that brokenness before you tell me I'm a
complete nincompoop and an idiot.
I'm including some locking people here because of these issues, so
that they can maybe verify my thinking.
(a) our rwsem's are fair
So the whole "execve takes it for reading, so now others can take
it for reading too without deadlocks" is simply not true - if you use
the existing rwsem.
Because a concurrent (blocked) writer will then block other
readers for fairness reasons, and holding it for reading doesn't
guarantee that others can get it for reading.
So clearly, the above doesn't even *fix* the deadlocks - unless
we have an unfair mode (or just a special lock for just this that is
not our standard rwsem, but a special unfair one).
So I'm suggesting we use a special unfair rwsem here (we can make
a simple spinlock-based one - it doesn't need to be as clever or
optimized as the real rwsems are)
(b) similarly, our rwsem's don't actually have a "upgrade from read
to write", because that's also a fundamentally deadlocky operation.
Again, that's true. Except execve() is special, and we know
there's only _one_ execve() at a time that will complete, since we're
serializing them. So for this particular use, "upgrade to write" would
be possible without the general-case deadlock issues.
(c) I didn't think things through, and even with these special
semantics, my idea is complete garbage
Ok, this may well be true.
Anyway, the advantage of this (if it works) is that it would allow us
to go back to the _really_ simple original model of just taking this
lock for reading at the beginning of execve(), and not worrying so
much about complex nesting or very complex rules for exactly when we
got the lock and error handling.
The final part when we actually update the credentials and dumpability
and stuff in execve() is actually fairly simple. So the "upgrade to a
write lock" phase doesn't worry me too much. It's the interaction
with all the previous parts (which happen with it held just for
reading) that tend to be the nastier ones.
And ptrace_attach() really is special, and I think it would be the
only one that really needs that write lock.
The disadvantage, of course, is that it would require that
special-case lock semantic, and I might also be missing some thing
that makes it not work anyway.
Comments? Am I just dreaming of a simpler model without my medications again?
Linus