Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
From: Will Deacon
Date: Mon Apr 06 2020 - 09:13:49 EST
[+Peter]
On Fri, Apr 03, 2020 at 07:28:36PM -0700, Linus Torvalds wrote:
> On Fri, Apr 3, 2020 at 7:02 PM Waiman Long <longman@xxxxxxxxxx> wrote:
> >
> > So in terms of priority, my current thinking is
> >
> > upgrading unfair reader > unfair reader > reader/writer
> >
> > A higher priority locker will block other lockers from acquiring the lock.
>
> An alternative option might be to have readers normally be 100% normal
> (ie with fairness wrt writers), and not really introduce any special
> "unfair reader" lock.
>
> Instead, all the unfairness would come into play only when the special
> case - execve() - does its special "lock for reading with intent to
> upgrade".
>
> But when it enters that kind of "intent to upgrade" lock state, it
> would not only block all subsequent writers, it would also guarantee
> that all other readers can continue to go.
>
> So then the new rwsem operations would be
>
> - read_with_write_intent_lock_interruptible()
>
> This is the beginning of "execve()", and waits for all writers to
> exit, and puts the lock into "all readers can go" mode.
>
> You could think of it as a "I'm queuing myself for a write lock,
> but I'm allowing readers to go ahead" state.
>
> - read_lock_to_write_upgrade()
>
> This is the "now this turns into a regular write lock". It needs to
> wait for all other readers to exit, of course.

... and at this point, subsequent readers queue behind the upgrader so we
can't run into the usual "stream of readers prevents forward progress"
issue, which was my initial worry when I started reading the thread. Makes
sense.

> - read_with_write_intent_unlock()
>
> This is the "I'm unqueuing myself, I aborted and will not become a
> write lock after all" operation.
>
> NOTE! In this model, there may be multiple threads that do that
> initial queuing thing. We only guarantee that only one of them will
> get to the actual write lock stage, and the others will abort before
> that happens.

I do worry a bit about how much of this we can enforce, but I suppose I'll
wait for the patches. For example, it would be nice for
read_lock_to_write_upgrade() to return -EBUSY if there was a concurrent
(successful) upgrade rather than some pathological failure mode like
deadlock, but that feels like it might be a pain to do. It would probably
also be nice to scream if read_lock_to_write_upgrade() is called on a lock
where the upgrade *did* go ahead. Maybe some of this is food for lockdep.
That said, if this all ends up being spelled task_cred_*() then perhaps
it doesn't matter.
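
To make the -EBUSY idea concrete, here is a rough userspace sketch. It is
purely illustrative and is not taken from any patch: the struct, the
model_*() names, the task ids and the use of -EAGAIN to mean "the real lock
would sleep here" are all assumptions made up for this example. It is a
single-threaded state model with no atomics or wait queues, so it only shows
which outcomes each operation could report, not how rwsem would implement
them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NO_TASK (-1)

/* Toy, single-threaded model of the lock state. Not kernel code. */
struct intent_lock {
	int readers;       /* plain readers currently holding the lock */
	int intents;       /* tasks holding it for read with write intent */
	int upgrader;      /* id of the task that claimed the upgrade, or NO_TASK */
	bool write_locked; /* true once the upgrader holds the lock for write */
};

static void model_init(struct intent_lock *l)
{
	l->readers = 0;
	l->intents = 0;
	l->upgrader = NO_TASK;
	l->write_locked = false;
}

/* Plain reader: queues (-EAGAIN here) behind an upgrader or a writer. */
static int model_read_trylock(struct intent_lock *l)
{
	if (l->write_locked || l->upgrader != NO_TASK)
		return -EAGAIN;
	l->readers++;
	return 0;
}

static void model_read_unlock(struct intent_lock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Models read_with_write_intent_lock_interruptible(): several may hold it. */
static int model_intent_trylock(struct intent_lock *l)
{
	if (l->write_locked || l->upgrader != NO_TASK)
		return -EAGAIN;
	l->intents++;
	return 0;
}

/* Models read_with_write_intent_unlock(): the "I aborted" path. */
static void model_intent_unlock(struct intent_lock *l)
{
	/* Scream if there is no intent holder left to drop. */
	assert(l->intents > 0);
	l->intents--;
}

/* Models read_lock_to_write_upgrade() for the hypothetical task id @task. */
static int model_upgrade(struct intent_lock *l, int task)
{
	/* Somebody else already claimed the upgrade: fail instead of deadlocking. */
	if (l->upgrader != NO_TASK && l->upgrader != task)
		return -EBUSY;

	l->upgrader = task;

	/* Other readers or intent holders remain: the real lock would sleep. */
	if (l->readers > 0 || l->intents > 1)
		return -EAGAIN;

	l->intents--;          /* our own intent hold becomes the write lock */
	l->write_locked = true;
	return 0;
}

static void model_write_unlock(struct intent_lock *l)
{
	assert(l->write_locked);
	l->write_locked = false;
	l->upgrader = NO_TASK;
}
```

With two intent holders, the first caller of model_upgrade() claims the
upgrade and gets -EAGAIN while others still hold the lock, the second caller
gets -EBUSY and has to back out via model_intent_unlock(), and new plain
readers are refused from the moment the upgrade is claimed. Whether the real
thing can report the loser this cleanly is exactly the open question above.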
Will