Evidently my questions weren't sufficiently clear; even with your answers it's
not clear to me what precise issue you're attempting to solve. I've tried to be
more specific this time.
At a high-level, can you clarify whether you're attempting to solve is:
(a) a functional correctness issue (e.g. data corruption)
(b) a performance issue
And whether this was seen in practice, or found through code inspection?
On Sat, Oct 01, 2016 at 12:11:36PM -0400, bdegraaf@xxxxxxxxxxxxxx wrote:On 2016-09-30 15:32, Mark Rutland wrote:
>On Fri, Sep 30, 2016 at 01:40:57PM -0400, Brent DeGraaf wrote:
>>+ * so LSE needs an explicit barrier here as well. Without this, the
>>+ * changed contents of the area protected by the spinlock could be
>>+ * observed prior to the lock.
>By whom? We generally expect that if data is protected by a lock, you take
>the lock before accessing it. If you expect concurrent lockless readers,
>then there's a requirement on the writer side to explicitly provide the
>ordering it requires -- spinlocks are not expected to provide that.
More details are in my response to Robin, but there is an API arm64 supports
in spinlock.h which is used by lockref to determine whether a lock is free or
not. For that code to work properly without adding these barriers, that API
needs to take the lock.
Can you please provide a concrete example of a case where things go wrong,
citing the code (or access pattern) in question? e.g. as in the commit messages
8e86f0b409a44193 ("arm64: atomics: fix use of acquire + release for
full barrier semantics)
859b7a0e89120505 ("mm/slub: fix lockups on PREEMPT && !SMP kernels")
(for the latter, I messed up the register -> var mapping in one paragraph, but
the style and reasoning is otherwise sound).
In the absence of a concrete example as above, it's very difficult to reason
about what the underlying issue is, and what a valid fix would be for said
>What pattern of accesses are made by readers and writers such that there is
Please note here I was asking specifically w.r.t. the lockref code, e.g. which
loads could see stale data, and what does the code do based on this value such
that there is a problem.
I added the barriers to the readers/writers because I do not know these are
not similarly abused. There is a lot of driver code out there, and ensuring
order is the safest way to be sure we don't get burned by something similar
to the lockref access.
Making the architecture-provided primitives overly strong harms performance and
efficiency (in general), makes the code harder to maintain and optimise in
future, and only masks the issue (which could crop up on other architectures,