Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Mathieu Desnoyers
Date: Thu Nov 07 2013 - 22:59:37 EST
* Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
[...]
Hi Peter,
Looking at this simplified version of perf's ring buffer
synchronization, I get concerned about the following issue:
> /*
> * One important detail is that the kbuf part and the kbuf_writer() are
> * strictly per cpu and we can thus rely on program order for those.
> *
> * Only the userspace consumer can possibly run on another cpu, and thus we
> * need to ensure data consistency for those.
> */
>
> struct buffer {
> u64 size;
> u64 tail;
> u64 head;
> void *data;
> };
>
> struct buffer *kbuf, *ubuf;
>
> /*
> * If there's space in the buffer; store the data @buf; otherwise
> * discard it.
> */
> void kbuf_write(int sz, void *buf)
> {
> u64 tail, head, offset;
>
> do {
> tail = ACCESS_ONCE(ubuf->tail);
> offset = head = kbuf->head;
> if (CIRC_SPACE(head, tail, kbuf->size) < sz) {
> /* discard @buf */
> return;
> }
> head += sz;
> } while (local_cmpxchg(&kbuf->head, offset, head) != offset);
>
Let's suppose we have a thread executing kbuf_write() that is interrupted
by an IRQ or NMI right after a successful local_cmpxchg() (i.e. right
after reserving space in the buffer). If the nested execution context
also calls kbuf_write(), it will update ubuf->head (below) for its own,
second reservation, and only after that will execution return to the
interrupted context, which then resumes kbuf_write() and overwrites
ubuf->head with the prior-to-last reserved offset.
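To make the interleaving explicit, here is a rough sketch of the scenario
I have in mind (H0 is kbuf->head before the outer reservation, sz1 and
sz2 are the two record sizes; this is just my reading of the simplified
code above):

   task context                        nested IRQ/NMI context
   ------------                        ----------------------
   local_cmpxchg() succeeds
     (kbuf->head: H0 -> H0 + sz1)
                                       local_cmpxchg() succeeds
                                         (kbuf->head: H0 + sz1 -> H0 + sz1 + sz2)
                                       smp_mb()      /* A */
                                       memcpy() of the nested record
                                       smp_wmb()     /* B */
                                       ubuf->head = kbuf->head
   smp_mb()      /* A */
   memcpy() of the outer record
   smp_wmb()     /* B */
   ubuf->head = kbuf->head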
All this probably works OK most of the time, as long as the event flow
guarantees that a following event will come along and publish a newer
head, thereby fixing things up. But there appears to be a risk of losing
events near the end of the trace when those last events are written from
nested execution contexts and no further event follows.
Thoughts?
Thanks,
Mathieu
> /*
> * Ensure that if we see the userspace tail (ubuf->tail) such
> * that there is space to write @buf without overwriting data
> * userspace hasn't seen yet, we won't in fact store data before
> * that read completes.
> */
>
> smp_mb(); /* A, matches with D */
>
> memcpy(kbuf->data + offset, buf, sz);
>
> /*
> * Ensure that we write all the @buf data before we update the
> * userspace visible ubuf->head pointer.
> */
> smp_wmb(); /* B, matches with C */
>
> ubuf->head = kbuf->head;
> }
>
> /*
> * Consume the buffer data and update the tail pointer to indicate to
> * kernel space there's 'free' space.
> */
> void ubuf_read(void)
> {
> u64 head, tail;
>
> tail = ACCESS_ONCE(ubuf->tail);
> head = ACCESS_ONCE(ubuf->head);
>
> /*
> * Ensure we read the buffer boundaries before the actual buffer
> * data...
> */
> smp_rmb(); /* C, matches with B */
>
> while (tail != head) {
> obj = ubuf->data + tail;
> /* process obj */
> tail += obj->size;
> tail %= ubuf->size;
> }
>
> /*
> * Ensure all data reads are complete before we issue the
> * ubuf->tail update; once that update hits, kbuf_write() can
> * observe and overwrite data.
> */
> smp_mb(); /* D, matches with A */
>
> ubuf->tail = tail;
> }
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com