On Tue, Feb 28, 2023 at 09:49:07AM +0100, Jonas Oberhauser wrote:
All true, though that 2x and 4x should be worth something.
On 2/27/2023 11:21 PM, Paul E. McKenney wrote:
On Mon, Feb 27, 2023 at 09:13:01PM +0100, Jonas Oberhauser wrote:Thanks for the link.
On 2/27/2023 8:40 PM, Andrea Parri wrote:https://www.msully.net/thesis/thesis.pdf page 128 (PDF page 141).
Maybe not as unpopular as you think... :)The LKMM doesn't believe that a control or data dependency orders a(Unpopular opinion I know,) it should drop dependencies ordering, not
plain write after a marked read. Hence in this test it thinks that P1's
store to u0 can happen before the load of x1. I don't remember why we
did it this way -- probably we just wanted to minimize the restrictions
on when plain accesses can execute. (I do remember the reason for
making address dependencies induce order; it was so RCU would work.)
The patch below will change what the LKMM believes. It eliminates the
positive outcome of the litmus test and the data race. Should it be
adopted into the memory model?
add/promote it.
Andrea
But either way IMHO it should be consistent; either take all the
dependencies that are true and add them, or drop them all.
In the latter case, RCU should change to an acquire barrier. (also, one
would have to deal with OOTA in some yet different way).
Generally my position is that unless there's a real-world benchmark with
proven performance benefits of relying on dependency ordering, one should
use an acquire barrier. I haven't yet met such a case, but maybe one of you
has...
Though this is admittedly for ARMv7 and PowerPC.
It's true that on architectures that don't have an acquire load (and have to
use a fence), the penalty will be bigger.
But the more obvious discussion would be what constitutes a real-world
benchmark : )
In my experience you can get a lot of performance benefits out of optimizing
barriers in code if all you execute is that code.
But once you embed that into a real-world application, often 90%-99% of time
spent will be in the business logic, not in the data structure.
And then the benefits suddenly disappear.
Note that a lot of barriers are a lot cheaper as well when there's no
contention.
Because of that, making optimization decisions based on microbenchmarks can
sometimes lead to a very poor "time invested" vs "total product improvement"
ratio.
The real-world examples I know of involved garbage collectors, and the
improvement was said to be a few percent system-wide. But that was a
verbal exchange, so I don't have a citation for you.