Re: [patch V3 56/64] futex: Correct the number of requeued waiters for PI

From: Thomas Gleixner
Date: Mon Aug 09 2021 - 06:52:14 EST


On Mon, Aug 09 2021 at 10:18, Thomas Gleixner wrote:
> On Sun, Aug 08 2021 at 10:05, Davidlohr Bueso wrote:
>> On Thu, 05 Aug 2021, Thomas Gleixner wrote:
>>
>>>From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>
>>>The accounting is wrong when either the PI sanity check or the
>>>requeue PI operation fails. Adjust it in the failure path.
>>
>> Ok fortunately these accounting errors are benign considering they
>> are in error paths. This also made me wonder about the requeue PI
>> top-waiter wakeup from futex_proxy_trylock_atomic(), which is always
>> required with nr_wakers == 1. We account for it on the successful
>> case we acquired the lock on it's behalf (and thus requeue_pi_wake_futex
>> was called), but if the corresponding lookup_pi_state fails, we'll retry.
>> So, shouldn't the task_count++ only be considered when we know the
>> requeueing is next (a successful top_waiter acquiring the lock+pi state)?
>>
>> @@ -2260,7 +2260,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
>> */
>> if (ret > 0) {
>> WARN_ON(pi_state);
>> - task_count++;
>> /*
>> * If we acquired the lock, then the user space value
>> * of uaddr2 should be vpid. It cannot be changed by
>> @@ -2275,6 +2274,8 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
>> */
>> ret = lookup_pi_state(uaddr2, ret, hb2, &key2,
>> &pi_state, &exiting);
>> + if (!ret)
>> + task_count++;
>> }
>
> Yes, but if futex_proxy_trylock_atomic() succeeds and lookup_pi_state()
> fails then the user space futex value is already the VPID of the top
> waiter and a retry will then fail the futex_proxy_trylock_atomic().

Actually lookup_pi_state() cannot fail here.

If futex_proxy_trylock_atomic() takes the user space futex then there
are no waiters on futex2 and the task for which the proxy trylock
acquired futex2 in the user space storage cannot be exiting because it's
still enqueued on futex1.

That means lookup_pi_state() will take the attach_to_pi_owner() path and
that will succeed because VPID belongs to an alive task.

What's wrong in that code though is the condition further up:

if (requeue_pi && (task_count - nr_wake < nr_requeue)) {

nr_wake has to be 1 for requeue PI. For the first round task_count is 0
which means the condition is true for any value of nr_requeue >= 0.

It does not really matter because none of the code below that runs the
retry path, but it's at least confusing as hell.

Let me fix all of that muck.

Thanks,

tglx