Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
From: Waiman Long
Date: Fri Apr 03 2020 - 16:42:01 EST
On 4/3/20 3:26 PM, Linus Torvalds wrote:
> I'm including some locking people here because of these issues, so
> that they can maybe verify my thinking.
>
> (a) our rwsem's are fair
>
> So the whole "execve takes it for reading, so now others can take
> it for reading too without deadlocks" is simply not true - if you use
> the existing rwsem.
>
> Because a concurrent (blocked) writer will then block other
> readers for fairness reasons, and holding it for reading doesn't
> guarantee that others can get it for reading.
>
> So clearly, the above doesn't even *fix* the deadlocks - unless
> we have an unfair mode (or just a special lock for just this that is
> not our standard rwsem, but a special unfair one).
>
> So I'm suggesting we use a special unfair rwsem here (we can make
> a simple spinlock-based one - it doesn't need to be as clever or
> optimized as the real rwsems are)
>
> (b) similarly, our rwsem's don't actually have a "upgrade from read
> to write", because that's also a fundamentally deadlocky operation.
>
> Again, that's true. Except execve() is special, and we know
> there's only _one_ execve() at a time that will complete, since we're
> serializing them. So for this particular use, "upgrade to write" would
> be possible without the general-case deadlock issues.
>
> (c) I didn't think things through, and even with these special
> semantics, my idea is complete garbage
>
> Ok, this may well be true.
>
> Anyway, the advantage of this (if it works) is that it would allow us
> to go back to the _really_ simple original model of just taking this
> lock for reading at the beginning of execve(), and not worrying so
> much about complex nesting or very complex rules for exactly when we
> got the lock and error handling.
>
> The final part when we actually update the credentials and dumpability
> and stuff in execve() is actually fairly simple. So the "upgrade to a
> write lock" phase doesn't worry me too much. It's the interaction
> with all the previous parts (which happen with it held just for
> reading) that tend to be the nastier ones.
>
> And ptrace_attach() really is special, and I think it would be the
> only one that really needs that write lock.
Making an unfair rwsem that prefer readers (like the original rwlock
semantics) is certainly doable. I don't think that is hard to do. I can
think of 2 possible ways to do that. We could make the unfairness
globally applies to all the readers of a rwsem by defining the fairness
state at init time. That will require keeping the state in the rwsem
structure increasing its size.
Another alternative is to add new functions like down_read_unfair() that
perform unfair read locking for its callers. That will require less code
change, but the calling functions have to make the right choice.
Cheers,
Longman