Re: [RFC PATCH v2 1/2] printk-rb: add a new printk ringbuffer implementation

From: Peter Zijlstra
Date: Wed Jun 26 2019 - 18:41:27 EST


On Fri, Jun 21, 2019 at 12:23:19AM +0200, John Ogness wrote:
> Hi Peter,
>
> This is a long response, but we are getting into some fine details about
> the memory barriers (as well as battling my communication skill level).

So I'm going to reply piecewise to this... so not such long emails, but
more of them.

> On 2019-06-18, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >> +static void add_descr_list(struct prb_reserved_entry *e)
> >> +{
> >> + struct printk_ringbuffer *rb = e->rb;
> >> + struct prb_list *l = &rb->descr_list;
> >> + struct prb_descr *d = e->descr;
> >> + struct prb_descr *newest_d;
> >> + unsigned long newest_id;
> >> +
> >> + /* set as newest */
> >> + do {
> >> + /* MB5: synchronize add descr */
> >> + newest_id = smp_load_acquire(&l->newest);
> >> + newest_d = TO_DESCR(rb, newest_id);
> >> +
> >> + if (newest_id == EOL)
> >> + WRITE_ONCE(d->seq, 1);
> >> + else
> >> + WRITE_ONCE(d->seq, READ_ONCE(newest_d->seq) + 1);
> >> + /*
> >> + * MB5: synchronize add descr
> >> + *
> >> + * In particular: next written before cmpxchg
> >> + */
> >> + } while (cmpxchg_release(&l->newest, newest_id, e->id) != newest_id);
> >
> > What does this pair with? I find ->newest usage in:
>
> It is pairing with the smp_load_acquire() at the beginning of this loop
> (also labeled MB5) that is running simultaneously on another CPU. I am
> avoiding a possible situation that a new descriptor is added but the
> store of "next" from the previous descriptor is not yet visible and thus
> the cmpxchg following will fail, which is not allowed. (Note that "next"
> is set to EOL shortly before this function is called.)
>
> The litmus test for this is:
>
> P0(int *newest, int *d_next)
> {
> // set descr->next to EOL (terminates list)
> WRITE_ONCE(*d_next, 1);
>
> // set descr as newest
> smp_store_release(*newest, 1);
> }
>
> P1(int *newest, int *d_next)
> {
> int local_newest;
> int local_next;
>
> // get newest descriptor
> local_newest = smp_load_acquire(*newest);
>
> // a new descriptor is set as the newest
> // (not relevant here)
>
> // read descr->next of previous newest
> // (must be EOL!)
> local_next = READ_ONCE(*d_next);
> }
>
> exists (1:local_newest=1 /\ 1:local_next=0)

I'm having trouble connecting your P1's READ_ONCE() to the actual code.

You say that is in the same function, but I cannot find a LOAD there
that would care about the ACQUIRE.

Afaict prb_list is a list head not a list node (calling it just _list is
confusing at best).

You have a single linked list going from the tail to the head, while
adding to the head and removing from the tail. And that sounds like a
FIFO queue:

struct lqueue_head {
struct lqueue_node *head, *tail;
};

struct lqueue_node {
struct lqueue_node *next;
};

void lqueue_push(struct lqueue_head *h, struct lqueue_node *n)
{
struct lqueue_node *prev;

n->next = NULL;
/*
* xchg() implies RELEASE; and thereby ensures @n is
* complete before getting published.
*/
prev = xchg(&h->head, n);
/*
* xchg() implies ACQUIRE; and thereby ensures @tail is
* written after @head, see lqueue_pop()'s smp_rmb().
*/
if (prev)
WRITE_ONCE(prev->next, n);
else
WRITE_ONCE(h->tail, n);
}

struct lqueue_node *lqueue_pop(struct lqueue_head *h)
{
struct lqueue_node *head, *tail, *next;

do {
tail = READ_ONCE(h->tail);
/* If the list is empty, nothing to remove. */
if (!tail)
return NULL;

/*
* If we see @tail, we must then also see @head.
* Pairs with the xchg() in lqueue_push(),
* ensure no false positive on the singleton
* test below.
*/
smp_rmb();
head = READ_ONCE(h->head);

/* If there is but one item; fail to remove. */
if (head == tail)
return NULL;

next = smp_cond_load_relaxed(&tail->next, VAL);

} while (cmpxchg(h->tail, tail, next) != tail);

return tail;
}

Now, you appear to be using desc_ids instead of pointers, but since
you're not using the actual wrap value; I don't see the benefit of using
those IDs over straight pointers.

That is, unless I've overlooked some subtle ABA issue, but then, your
code doesn't seem to mention that, and I think we're good because if we
re-use an entry, it can never get back in the same location, since we
never allow an empty list (might also be fixable, haven't tought too
hard on this).

That said, the above has cmpxchg() vs WRITE_ONCE() and is therefore not
safe on a number of our architectures. We can either not care about
performance and use xchg() for the ->tail store, or use atomic_long_t
and suffer ugly casting.

But the above is, IMO, a more useful and readable abstraction. Let me
continue in another email (probably tomorrow).