Re: [RFC] arm64: Enforce observed order for spinlock and data

From: bdegraaf
Date: Sat Oct 01 2016 - 11:46:18 EST


On 2016-09-30 14:43, Robin Murphy wrote:

> > + * so LSE needs an explicit barrier here as well. Without this, the
> > + * changed contents of the area protected by the spinlock could be
> > + * observed prior to the lock.
>
> What is that last sentence supposed to mean? If the lock is free, then
> surely any previous writes to the data it's protecting would have
> already been observed by the release semantics of the previous unlock?
> If the lock is currently held, what do we care about the state of the
> data while we're still spinning on the lock itself? And if someone's
> touching the data without having acquired *or* released the lock, why is
> there a lock in the first place?
>
> This seems like a very expensive way of papering over broken callers :/
>
> Robin.
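Robin's argument rests on the usual acquire/release pairing: whatever a lock holder writes before its releasing unlock is visible to the next CPU whose acquiring lock operation observes that release. A minimal user-space sketch of that pairing with C11 atomics follows. The names (`demo_spinlock_t`, `demo_lock()`, `run_two_writers()`) are hypothetical, and the simple test-and-set lock is an assumption for illustration, not the arm64 ticket lock or its LSE variant.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock; NOT the arm64 kernel implementation. */
typedef struct {
    atomic_bool locked;
} demo_spinlock_t;

static void demo_lock(demo_spinlock_t *l)
{
    /* Acquire: accesses after this cannot be reordered before it. */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ; /* spin until the previous holder releases */
}

static void demo_unlock(demo_spinlock_t *l)
{
    /* Unlocking a free lock would be a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    /* Release: accesses before this cannot be reordered after it. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

static demo_spinlock_t demo_lock_var;  /* static zero-init: unlocked */
static long protected_data;            /* only touched with the lock held */
static int demo_iters;

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        demo_lock(&demo_lock_var);
        protected_data++;              /* plain, non-atomic access */
        demo_unlock(&demo_lock_var);
    }
    return NULL;
}

/* Runs two writers concurrently and returns the final count. */
static long run_two_writers(int iters)
{
    pthread_t a, b;

    protected_data = 0;
    demo_iters = iters;
    pthread_create(&a, NULL, writer, NULL);
    pthread_create(&b, NULL, writer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return protected_data;
}
```

With acquire on lock and release on unlock, `run_two_writers(n)` returns `2 * n`. If both orderings were weakened to `memory_order_relaxed`, the plain accesses to `protected_data` would become a data race under the C11 model and nothing would guarantee the final count.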

Thanks for your question.

First off, I saw no negative impact on performance from introducing these
barriers across a wide variety of use cases, on both mobile and server-class
devices ranging from 4 to 24 CPUs.

Yes, the barrier does protect lockref, which observes the spinlock in a
non-conventional way. In fact, with this code in place, the performance of
Linus' test, which calls stat() very heavily, actually improved by 1-2%. I
suspect this is due to fewer failed cmpxchg attempts caused by contention on
the count that lockref protects.

The lockref code can be made compliant, but not with a single load-acquire; it
has to take the lock. Turning off CONFIG_ARCH_USE_CMPXCHG_LOCKREF is the most
obvious solution, as it forces lockref.c to take the lock. That, however, comes
at a very high performance cost: a 30-50% hit on Linus' stat test on a 24-core
system. On larger systems, this performance gap will grow even wider.
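With CONFIG_ARCH_USE_CMPXCHG_LOCKREF turned off, each lockref operation in lockref.c falls back to taking the spinlock around the count update. The sketch below models that fallback in user space. `demo_locked_ref_t` and its functions are hypothetical names, and the test-and-set lock stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lockref built without the cmpxchg fast path:
 * every count update takes the lock. */
typedef struct {
    atomic_bool lock;       /* simple test-and-set spinlock */
    unsigned int count;     /* protected by lock */
} demo_locked_ref_t;

static void demo_ref_init(demo_locked_ref_t *r, unsigned int count)
{
    atomic_init(&r->lock, false);
    r->count = count;
}

static void demo_ref_lock(demo_locked_ref_t *r)
{
    while (atomic_exchange_explicit(&r->lock, true, memory_order_acquire))
        ; /* spin */
}

static void demo_ref_unlock(demo_locked_ref_t *r)
{
    atomic_store_explicit(&r->lock, false, memory_order_release);
}

/* Lock, increment, unlock: the shape of the non-cmpxchg get path. */
static void demo_ref_get(demo_locked_ref_t *r)
{
    demo_ref_lock(r);
    r->count++;
    demo_ref_unlock(r);
}

/* Lock, decrement, unlock; returns the count after the decrement. */
static unsigned int demo_ref_put(demo_locked_ref_t *r)
{
    unsigned int n;

    demo_ref_lock(r);
    assert(r->count > 0);   /* dropping a reference that is not held */
    n = --r->count;
    demo_ref_unlock(r);
    return n;
}
```

Because every get and put now writes the lock word, all CPUs serialise on a single cache line, which is consistent with the large slowdown on a stat-heavy workload described above.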

With the above in mind, it seems that supporting lockref's unorthodox method of
dealing with the lock is the better alternative, as it helps, rather than hurts,
arm64 performance.

Brent
