Re: [RFC PATCH v2 3/3] restartable sequences: basic self-tests

From: Peter Zijlstra
Date: Wed Apr 06 2016 - 03:43:27 EST


On Tue, Apr 05, 2016 at 08:33:27PM +0000, Mathieu Desnoyers wrote:

> A problematic execution sequence would be
>
> * Exhibit A: ABA (all threads running on same CPU):
>
> Initial state: the list has a single entry "object Z"
>
> Thread A                                Thread B
> - percpu_list_pop()
>   - cpu = rseq_current_cpu();
>   - head = list->heads[cpu];
>     (head is a pointer to object Z)
>   - next = head->next;
>   (preempted)
>                                         (scheduled in)
>                                         - percpu_list_pop()
>                                           - cpu = rseq_current_cpu();
>                                           - head = list->heads[cpu];
>                                             (head is a pointer to object Z)
>                                           - rseq_percpu_cmpxchgcheck succeeds
>                                         - percpu_list_push of a new object Y
>                                         - percpu_list_push of a re-used object Z
>                                           (its next pointer now points to object Y
>                                            rather than end of list)
>                                         (preempted)
> (scheduled in)
>   - rseq_percpu_cmpxchgcheck succeeds,
>     setting a wrong value into the list
>     head: it will store an end of list,
>     thus skipping over object Y.

OK, so I'm still trying to wake up, but I'm not seeing how
rseq_percpu_cmpxchgcheck() would succeed in this case.

If you look at the code, the 'check' part would fail, that is:

> +struct percpu_list_node *percpu_list_pop(struct percpu_list *list)
> +{
> +	int cpu;
> +	struct percpu_list_node *head, *next;
> +
> +	do {
> +		cpu = rseq_current_cpu();
> +		head = list->heads[cpu];
> +		/*
> +		 * Unlike a traditional lock-less linked list; the availability
> +		 * of a cmpxchg-check primitive allows us to implement pop
> +		 * without concerns over ABA-type races.
> +		 */
> +		if (!head) return 0;
> +		next = head->next;
> +	} while (cpu != rseq_percpu_cmpxchgcheck(cpu,
> +		(intptr_t *)&list->heads[cpu], (intptr_t)head, (intptr_t)next,
> +		(intptr_t *)&head->next, (intptr_t)next));

The extra compare is 'head->next == next', and our thread-A will have
@next == NULL (EOL), while the state after thread-B ran would be
@head->next = &Y.

So the check will fail, the cmpxchg will fail, and around we go.
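
To make that concrete, here is a rough single-threaded replay of Exhibit A.
It is only a sketch: cmpxchg_model(), cmpxchgcheck_model() and
stale_commit_succeeds() are made-up names, the model uses typed pointers
instead of the intptr_t casts, and it looks only at the two value compares,
leaving out the per-cpu and restart handling of the real
rseq_percpu_cmpxchgcheck().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
};

/* Hypothetical model: plain compare-and-swap on the list head only. */
static bool cmpxchg_model(struct node **p, struct node *old,
			  struct node *new_val)
{
	if (*p != old)
		return false;
	*p = new_val;
	return true;
}

/* Hypothetical model: same, plus the extra '*check_ptr == check_val' test. */
static bool cmpxchgcheck_model(struct node **p, struct node *old,
			       struct node *new_val,
			       struct node *const *check_ptr,
			       struct node *check_val)
{
	if (*check_ptr != check_val)
		return false;
	return cmpxchg_model(p, old, new_val);
}

/* Replay Exhibit A; return true if thread A's stale commit goes through. */
static bool stale_commit_succeeds(bool use_check)
{
	struct node z = { NULL }, y = { NULL };
	struct node *list_head = &z;	/* initial state: only object Z */

	/* Thread A: loads head and next, then gets preempted. */
	struct node *a_head = list_head;	/* &z */
	struct node *a_next = a_head->next;	/* NULL, end of list */

	/* Thread B: pops Z, pushes new object Y, pushes re-used object Z. */
	list_head = z.next;
	y.next = list_head;
	list_head = &y;
	z.next = list_head;
	list_head = &z;

	/* Thread A: scheduled back in, attempts its commit. */
	if (use_check)
		return cmpxchgcheck_model(&list_head, a_head, a_next,
					  &a_head->next, a_next);
	return cmpxchg_model(&list_head, a_head, a_next);
}

/* Both outcomes side by side. */
static void run_exhibit_a_checks(void)
{
	/* Head compare alone: head is &z again, so Y gets dropped. */
	assert(stale_commit_succeeds(false));
	/* With the check: z.next is &y, not NULL, so the commit fails. */
	assert(!stale_commit_succeeds(true));
}
```

Calling run_exhibit_a_checks() from a main() should pass both asserts under
these assumptions: the head compare on its own is fooled by Z coming back,
the extra compare on head->next is not.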

> +
> +	return head;
> +}

Or am I completely not getting it?