On Wed, Jan 27, 2016 at 03:22:19PM -0500, Waiman Long wrote:
Yeah, I have no hardware either. But at least we should comment the bitsI think we probably won't need the acquire part, but I don't have a non-x86+ /*Here we rely on the release barrier implied by xchg() to ensure the node
+ * Put itself into the list_batch queue
+ */
+ node.next = NULL;
+ node.entry = entry;
+ node.cmd = cmd;
+ node.state = lb_state_waiting;
+
initialization is complete before the xchg() publishes the thing.
But do we also need the acquire part of this barrier? From what I could
tell, the primitive as a whole does not imply any ordering.
machine that can really test out the more relaxed versions of the atomic
ops. That is why I use the strict versions. We can always relax it later on
with additional patches.
we do know to rely upon.
READ/WRITE_ONCE() provide _no_ order what so ever. And the issue here isYou are right. I did forgot about there was no memory barrier guarantee when+ if (!next) {for what? :-)
+ /*
+ * The queue tail should equal to nptr, so clear it to
+ * mark the queue as empty.
+ */
+ if (cmpxchg(&batch->tail, nptr, NULL) != nptr) {
+ /*
+ * Queue not empty, wait until the next pointer is
+ * initialized.
+ */
+ while (!(next = READ_ONCE(nptr->next)))
+ cpu_relax();
+ }
+ /* The above cmpxchg acts as a memory barrier */
Also, if that cmpxchg() fails, it very much does _not_ act as one.
I suspect you want smp_store_release() setting the state_done, just as
above, and then use cmpxchg_relaxed().
cmpxchg() fails.
However, in that case, the READ_ONCE() and WRITE_ONCE()
macros should still provide the necessary ordering, IMO.
that we must not do any other stores to nptr after the state_done.
I can certainlyThat seems a safe combination and would still generate the exact same
change it to use cmpxchg_relaxed() and smp_store_release() instead.
code on x86.