RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up

From: Zhu Jefferry
Date: Tue Sep 15 2015 - 20:17:21 EST


Thanks for your detailed guidelines and explanations. Please see my questions in-line.

> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> Sent: Wednesday, September 16, 2015 8:01 AM
> To: Zhu Shuangjun-R65879
> Cc: linux-kernel@xxxxxxxxxxxxxxx; bigeasy@xxxxxxxxxxxxx
> Subject: RE: [PATCH v2] futex: lower the lock contention on the HB lock
> during wake up
>
> On Tue, 15 Sep 2015, Zhu Jefferry wrote:
>
> Please configure your e-mail client properly and follow the basic rules:
>
> - Choose a meaningful subject for your questions
>
> You just copied a random subject line from some other mail thread,
> which makes your mail look like a patch. But it's not a patch. You
> have a question about futexes and patches related to them.
>
> - Make sure your lines are no longer than 72 to 76 characters in length.
>
> You can't assume that all e-mail and news clients behave like yours,
> and while yours might wrap lines automatically when the text reaches the
> right of the window containing it, not all do.
>
> For the sentence above I need a 190 character wide display ....
>
> - Do not use colors or other gimmicks. They just make the mail
> unreadable in simple text based readers.
>
> > Just in the list, I see the patch "[PATCH v2] futex: lower the lock
> > contention on the HB lock during wake up" at
> > http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.
>
> > But I see another patch with the same name but different content here:
> >
> > 23b7776290b10297fe2cae0fb5f166a4f2c68121
> > (http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121)
>
> I have no idea what that metager thing tells you and I really don't want
> to know. Plain git tells me:
>
> # git show 23b7776290b10297fe2cae0fb5f166a4f2c68121
> Merge: 6bc4c3ad3619 6fab54101923
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Mon Jun 22 15:52:04 2015 -0700
>
> Merge branch 'sched-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> So that's a merge commit where Linus pulled a pile of changes into the
> mainline kernel. And that merge does not contain the patch above, but it
> contains a different change to the futex code.
>
> > Could you please help to give a little bit more explanation on why
> > they have the same name but different modifications to futex.c? I'm
> > a newbie in the community.
>
> Use the proper tools and not some random web interface. The commit you
> are looking for is a completely different one.
>
> # git log kernel/futex.c
> ....
> commit 802ab58da74bb49ab348d2872190ef26ddc1a3e0
> Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Date: Wed Jun 17 10:33:50 2015 +0200
>
>     futex: Lower the lock contention on the HB lock during wake up
> ....
>
> And that's the same as the one in the LKML thread plus a fixup.
>
> > Actually, I encountered a customer issue related to the glibc code
> > "pthread_mutex_lock", which uses the futex service in the kernel,
> > without the patches above.
>
> The patches above are merely an optimization and completely unrelated to
> your problem.
>
> You fail to provide the real interesting information here:
>
> - Which architecture/SoC
> - Which kernel version and which extra patches
> - Which glibc version and which extra patches
>
> > After lots of discussion with the customer (I could not reproduce the
> > failure in my office), I seriously suspect there might be some
> > particular corner cases in the futex code.
>
> The futex code is more or less a conglomerate of corner cases.
>
> But again you fail to provide the real interesting information:
>
> - What is the actual failure ?
>
> The information that you discussed that with your customer is completely
> irrelevant and your suspicion does not clarify the issue either.
>
> > In the unlock flow, the user space code (pthread_mutex_unlock) will
> > check the FUTEX_WAITERS flag first and then wake up the waiters on
> > the kernel list. In the lock flow, the kernel code (futex) also sets
> > FUTEX_WAITERS first and then adds the waiter to the list. They follow
> > the same sequence: flag first, list entry second. But there might be
> > some timing problem on an SMP system if the query (unlock flow) is
> > executed just before the list adding action (lock flow).
>
> There might be some timing problem if the code looked like the scheme
> you painted below, but it does not.
>
> > It might cause the mutex to never be really released, so other
> > threads wait forever. Could you please help to take a look at it?
> >
> > CPU 0 (thread 0)                        CPU 1 (thread 1)
> >
> > mutex_lock
> > val = *futex;
> > sys_futex(LOCK_PI, futex, val);
> >
> > return to user space
>
> If the futex is uncontended then you don't enter the kernel for acquiring
> the futex.
>
> > after acquire the lock                  mutex_lock
> >                                         val = *futex;
> >                                         sys_futex(LOCK_PI, futex, val);
>
> The futex FUTEX_LOCK_PI operation does not take the user space value.
> That's what FUTEX_WAIT does.
>
> >
> >                                         lock(hash_bucket(futex));
> >                                         set FUTEX_WAITERS flag
> >
> >                                         unlock(hash_bucket(futex)) and retry due to page fault
>
> So here you are completely off the track. If the 'set FUTEX_WAITERS bit'
> operation fails due to a page fault, then the FUTEX_WAITERS bit is not
> set. So it cannot be observed on the other core.
>
> The flow is:
>
> sys_futex(LOCK_PI, futex, ...)
>
> retry:
>         lock(hb(futex));
>         ret = set_waiter_bit(futex);
>         if (ret == -EFAULT) {
>                 unlock(hb(futex));
>                 handle_fault();
>                 goto retry;
>         }
>
>         list_add();
>         unlock(hb(futex));
>         schedule();
>
> So when set_waiter_bit() succeeds, then the hash bucket lock is held and
> blocks the waker. So it's guaranteed that the waker will see the waiter
> on the list.
>
> If set_waiter_bit() faults, then the waiter bit is not set and therefore
> there is nothing to wake. So the waker will not enter the kernel because
> the futex is uncontended.
>

I assume your pseudo code set_waiter_bit() maps to the real code
futex_lock_pi_atomic(). It is possible for futex_lock_pi_atomic() to
successfully set the FUTEX_WAITERS bit but still return with a page
fault, for example by failing in lookup_pi_state().
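
To make my concern concrete, let me redraw your flow with the spot I am
worried about marked. This is only my reading of kernel/futex.c, with
the real function name dropped into your pseudo code and everything
else simplified, so please correct me if I got it wrong:

  retry:
     lock(hb(futex));
     /*
      * futex_lock_pi_atomic() == your set_waiter_bit(). As far as I
      * can see it may set FUTEX_WAITERS in the user space word with
      * cmpxchg and only fail afterwards, e.g. in lookup_pi_state().
      */
     ret = futex_lock_pi_atomic(futex, ...);
     if (ret) {
             unlock(hb(futex));
             handle_fault();
             goto retry;     /* FUTEX_WAITERS may already be visible
                                to the waker here, but we are not on
                                the list and do not hold hb->lock */
     }

     list_add();
     unlock(hb(futex));
     schedule();

That window between dropping hb->lock and the retry is exactly the
timing problem I suspect.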


> So now, lets assume that the waiter failed to set the waiter bit and the
> waker unlocked the futex. When the waiter retries then it actually checks
> whether the futex still has an owner. So it observes the owner has been
> cleared, it acquires the futex and returns.
>
> It's a bit more complex than that due to handling of the gazillion of
> corner cases, but that's the basic synchronization mechanism and there is
> no hidden timing issue on SMP.
>
> Random speculation is not helping here.
>
> Thanks,
>
> tglx
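
P.S. To make sure I understand the waiters handshake for the plain
FUTEX_WAIT/FUTEX_WAKE case you mention (not the FUTEX_LOCK_PI path we
are discussing), this is the minimal user space pattern I have in
mind -- essentially the mutex from Ulrich Drepper's "Futexes Are
Tricky", just as an illustration, not our actual code:

  #include <linux/futex.h>
  #include <stdatomic.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* 0 = unlocked, 1 = locked/no waiters, 2 = locked/maybe waiters */
  static atomic_int futex_word;

  static long sys_futex(atomic_int *uaddr, int op, int val)
  {
          return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
  }

  static void demo_lock(void)
  {
          int c = 0;

          /* uncontended fast path: 0 -> 1, no kernel entry at all */
          if (atomic_compare_exchange_strong(&futex_word, &c, 1))
                  return;

          /* contended: advertise a waiter before going to sleep */
          if (c != 2)
                  c = atomic_exchange(&futex_word, 2);
          while (c != 0) {
                  /*
                   * The kernel re-checks the word under its hash
                   * bucket lock and returns right away if it is no
                   * longer 2, so the wake up cannot be lost.
                   */
                  sys_futex(&futex_word, FUTEX_WAIT, 2);
                  c = atomic_exchange(&futex_word, 2);
          }
  }

  static void demo_unlock(void)
  {
          /* only 2 -> 0 means somebody may be sleeping in the kernel */
          if (atomic_exchange(&futex_word, 0) == 2)
                  sys_futex(&futex_word, FUTEX_WAKE, 1);
  }

If I read your flow correctly, FUTEX_LOCK_PI gives the equivalent
guarantee because the kernel sets FUTEX_WAITERS and queues the waiter
while holding hb->lock, which is what blocks the waker.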