Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention

From: Roman Penyaev
Date: Thu Dec 13 2018 - 05:14:04 EST

Next message: Herbert Xu: "Re: [RFC PATCH] X.509: Parse public key parameters from x509 for akcipher"
Previous message: Stephen Boyd: "Re: [PATCH v12 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver"
In reply to: Roman Penyaev: "[PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention"
Next in thread: Andrea Parri: "Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2018-12-12 18:13, Andrea Parri wrote:

On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:

[...]

+static inline void list_add_tail_lockless(struct list_head *new,
+ struct list_head *head)
+{
+ struct list_head *prev;
+
+ new->next = head;
+
+ /*
+ * Initially ->next of a new element must be updated with the head
+ * (we are inserting to the tail) and only then pointers are atomically
+ * exchanged. XCHG guarantees memory ordering, thus ->next should be
+ * updated before pointers are actually swapped.
+ */
+
+ prev = xchg(&head->prev, new);
+
+ /*
+ * It is safe to modify prev->next and new->prev, because a new element
+ * is added only to the tail and new->next is updated before XCHG.
+ */

IIUC, you're also relying on "some" ordering between the atomic load
of &head->prev above and the store to prev->next below: consider the
following snippet for two concurrent list_add_tail_lockless()'s:

{Initially: List := H -> A -> B}

CPU0 CPU1

list_add_tail_lockless(C, H): list_add_tail_lockless(D, H):

C->next = H D->next = H
prev = xchg(&H->prev, C) // =B prev = xchg(&H->prev, D) // =C
B->next = C C->next = D
C->prev = B D->prev = C

Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
latter reads the value of &H->prev that the former stored to that same
location).

As you noted above, the xchg() guarantees that CPU0's store to C->next
is "ordered before" CPU0's store to &H->prev.

But we also want CPU1's load from &H->prev to be ordered before CPU1's
store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
the address dependency between these two memory accesses).

I do not see what could guarantee "C->next == D" in the end, otherwise.

What am I missing?

Hi Andrea,

xchg always acts as a full memory barrier, i.e. mfence in x86 terms. So the
following statement should be always true, otherwise nothing should work as
the same code pattern is used in many generic places:

CPU0 CPU1

C->next = H
xchg(&ptr, C)
C = xchg(&ptr, D)
C->next = D

This is the only guarantee we need, i.e. make it simplier:

C->next = H
mfence mfence
C->next = D

the gurantee that two stores won't reorder. Pattern is always the same: we
prepare chunk of memory on CPU0 and do pointers xchg, CPU1 sees chunks of
memory with all stores committed by CPU0 (regardless of CPU1 does loads
or stores to this chunk).

I am repeating the same thing which you also noted, but I just want to be
sure that I do not say nonsense. So basically repeating to myself.

Ok, let's commit that. Returning to your question: "I do not see what
could guarantee "C->next == D" in the end"

At the end of what? Lockless insert procedure (insert to tail) relies only
on "head->prev". This is the single "place" where we atomically exchange
list elements and "somehow" chain them. So insert needs only actual
"head->prev", and xchg provides this guarantees to us.

But there is also a user of the list, who needs to iterate over the list
or to delete elements, etc, i.e. this user of the list needs list fully
committed to the memory. This user takes write_lock(). So answering your
question (if I understood it correctly): at the end write_lock() guarantees
that list won't be seen as corrupted and updates to the last element, i.e.
"->next" or "->prev" pointers of the last element are committed and seen
correctly.

--
Roman

Next message: Herbert Xu: "Re: [RFC PATCH] X.509: Parse public key parameters from x509 for akcipher"
Previous message: Stephen Boyd: "Re: [PATCH v12 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver"
In reply to: Roman Penyaev: "[PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention"
Next in thread: Andrea Parri: "Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]