Re: [RFC PATCH 1/3] lib/list_batch: A simple list insertion/deletion batching facility
From: Peter Zijlstra
Date: Wed Jan 27 2016 - 15:54:15 EST
On Wed, Jan 27, 2016 at 03:22:19PM -0500, Waiman Long wrote:
> >>+ /*
> >>+ * Put itself into the list_batch queue
> >>+ */
> >>+ node.next = NULL;
> >>+ node.entry = entry;
> >>+ node.cmd = cmd;
> >>+ node.state = lb_state_waiting;
> >>+
> >Here we rely on the release barrier implied by xchg() to ensure the node
> >initialization is complete before the xchg() publishes the thing.
> >
> >But do we also need the acquire part of this barrier? From what I could
> >tell, the primitive as a whole does not imply any ordering.
>
> I think we probably won't need the acquire part, but I don't have a non-x86
> machine that can really test out the more relaxed versions of the atomic
> ops. That is why I use the strict versions. We can always relax it later on
> with additional patches.

Yeah, I have no hardware either. But at least we should comment the bits
we do know to rely upon.

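To make the kind of comment being asked for concrete, here is a rough userspace sketch using C11 atomics, not the patch itself. The struct layout, the lb_enqueue() name and the MCS-style linking of prev->next are assumptions for illustration. Note that the kernel's xchg() is fully ordered, whereas acq_rel below is only the closest C11 approximation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum lb_state { lb_state_waiting, lb_state_done };

/* Hypothetical node/queue layout, loosely modelled on the quoted patch. */
struct lb_node {
	_Atomic(struct lb_node *) next;
	void *entry;
	int cmd;
	_Atomic int state;
};

struct lb_queue {
	_Atomic(struct lb_node *) tail;
};

/* Queue @node; returns the previous tail, or NULL if the queue was empty. */
static struct lb_node *lb_enqueue(struct lb_queue *q, struct lb_node *node,
				  void *entry, int cmd)
{
	struct lb_node *prev;

	assert(q && node);

	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	node->entry = entry;
	node->cmd = cmd;
	atomic_store_explicit(&node->state, lb_state_waiting,
			      memory_order_relaxed);

	/*
	 * RELEASE: the node initialization above must be complete before
	 * the node becomes reachable through q->tail.
	 *
	 * ACQUIRE: whether this half is needed is the open question in
	 * the thread; acq_rel keeps the stronger ordering until someone
	 * shows it can be relaxed.
	 */
	prev = atomic_exchange_explicit(&q->tail, node, memory_order_acq_rel);

	/* Assumed MCS-style link so the previous node can find us. */
	if (prev)
		atomic_store_explicit(&prev->next, node, memory_order_release);

	return prev;
}
```

The point is only that each ordering requirement gets named next to the operation that provides it, so a later relaxation patch knows what it must preserve.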
> >>+ if (!next) {
> >>+ /*
> >>+ * The queue tail should equal to nptr, so clear it to
> >>+ * mark the queue as empty.
> >>+ */
> >>+ if (cmpxchg(&batch->tail, nptr, NULL) != nptr) {
> >>+ /*
> >>+ * Queue not empty, wait until the next pointer is
> >>+ * initialized.
> >>+ */
> >>+ while (!(next = READ_ONCE(nptr->next)))
> >>+ cpu_relax();
> >>+ }
> >>+ /* The above cmpxchg acts as a memory barrier */
> >for what? :-)
> >
> >Also, if that cmpxchg() fails, it very much does _not_ act as one.
> >
> >I suspect you want smp_store_release() setting the state_done, just as
> >above, and then use cmpxchg_relaxed().
>
> You are right. I forgot that there is no memory barrier guarantee when
> cmpxchg() fails. However, in that case, the READ_ONCE() and WRITE_ONCE()
> macros should still provide the necessary ordering, IMO.

READ/WRITE_ONCE() provide _no_ ordering whatsoever. And the issue here is
that we must not do any other stores to nptr after the state_done store.

> I can certainly change it to use cmpxchg_relaxed() and
> smp_store_release() instead.

That seems a safe combination and would still generate the exact same
code on x86.
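
For illustration only, a userspace C11 sketch of that combination: a relaxed compare-and-exchange on the tail, with a release store of the done state as the final access to the node. The lb_finish() name and the struct layout are hypothetical, and the acquire loads of ->next stand in for the dependency ordering the kernel gets from READ_ONCE(). This is an analogue, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum lb_state { lb_state_waiting, lb_state_done };

/* Hypothetical, trimmed-down node/queue layout. */
struct lb_node {
	_Atomic(struct lb_node *) next;
	_Atomic int state;
};

struct lb_queue {
	_Atomic(struct lb_node *) tail;
};

/* Complete @nptr; returns the next node to process, or NULL if none. */
static struct lb_node *lb_finish(struct lb_queue *q, struct lb_node *nptr)
{
	struct lb_node *next;

	assert(q && nptr);

	next = atomic_load_explicit(&nptr->next, memory_order_acquire);
	if (!next) {
		struct lb_node *expected = nptr;

		/*
		 * Relaxed on both success and failure: nothing here relies
		 * on this operation for ordering (cmpxchg_relaxed() in the
		 * kernel).
		 */
		if (!atomic_compare_exchange_strong_explicit(&q->tail,
				&expected, NULL,
				memory_order_relaxed, memory_order_relaxed)) {
			/* Queue not empty: wait for ->next to be set. */
			while (!(next = atomic_load_explicit(&nptr->next,
					memory_order_acquire)))
				; /* cpu_relax() in the kernel */
		}
	}

	/*
	 * RELEASE (smp_store_release() in the kernel): this must be the
	 * last access to nptr. Once the waiter observes lb_state_done it
	 * may return and the node may go away.
	 */
	atomic_store_explicit(&nptr->state, lb_state_done,
			      memory_order_release);

	return next;
}
```

The ordering no longer depends on whether the compare-and-exchange succeeds; the release store carries it on both paths.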