Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu

From: Alexey Kardashevskiy
Date: Thu Nov 05 2015 - 21:17:41 EST


On 11/04/2015 01:39 AM, Steven Rostedt wrote:
On Tue, 3 Nov 2015 17:57:05 +1100
Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:

This defines list_for_each_entry_lockless. This allows safe list
traversing in cases when lockdep() invocation is unwanted like
real mode (MMU is off).

Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
---

This is for VFIO acceleration in POWERKVM for pSeries guests.
There is a KVM instance. There also can be some VFIO (PCI passthru)
devices attached to a KVM guest.

To perform DMA, a pSeries guest registers DMA memory by calling some
hypercalls explicitely at the rate close to one-two hcalls per
a network packet, i.e. very often. When a guest does a hypercall
(which is just an assembly instruction), the host kernel receives it
in the real mode (MMU is off). When real mode fails to handle it,
it enables MMU and tries handling a hcall in virtual mode.

A logical bus ID (LIOBN) is a tagret id for these hypecalls.

Each VFIO device belongs to an IOMMU group. Each group has an address
translation table. It is allowed to have multiple IOMMU groups (i.e.
multiple tables) under the same LIOBN.

So effectively every DMA hcall has to update one or more TCE tables
attached to the same LIOBN. RCU is used to update/traverse this list
safely.

Using RCU as is in virtual mode is fine. Lockdep works, etc.
list_add_rcu() is used to populate the list;
list_del_rcu() + call_rcu() used to remove groups from a list.
These operations can happen in runtim as a result of PCI hotplug/unplug
in guests.

Using RCU as is in real mode is not fine as some RCU checks can lock up
the system and in real mode we won't even have a chance to see any
debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.

Previous version of this used to define list_for_each_entry_rcu_notrace()
but it was proposed to use list_entry_lockless() instead. However
the comment for lockless_dereference() suggests this is a good idea
if "lifetime is managed by something other than RCU" but it is in my case.

So what would be the correct approach here? Thanks.

If the only use case for this so far is in POWERKVM, perhaps it should
be defined specifically (and in arch/powerpc) and not confuse others
about using this.

Or, if you do imagine that this can be used in other scenarios, then a
much deeper comment must be made in the code in the kerneldoc section.
list_for_each_entry_rcu() should really be used in 99.99% of the time
in the kernel. This looks to be an extreme exception. I hate to add a
generic helper for something that will only be used in one location.


No, I cannot imagine this really and I can move it somewhere in arch/powerpc.

Still, is my approach correct? What does the comment for lockless_dereference() actally mean - it won't work together with RCU at all or this is to force people not to use it as "list_for_each_entry_rcu() should really be used in 99.99% of the time"? :)




-- Steve


---
include/linux/rculist.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 17c6b1f..a83a924 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

/**
+ * list_for_each_entry_lockless - iterate over rcu list of given type
+ * @pos: the type * to use as a loop cursor.
+ * @head: the head for your list.
+ * @member: the name of the list_struct within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently
+ */
+#define list_entry_lockless(ptr, type, member) \
+ container_of((typeof(ptr))lockless_dereference(ptr), type, member)
+
+#define list_for_each_entry_lockless(pos, head, member) \
+ for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
+ &pos->member != (head); \
+ pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
+
+/**
* list_for_each_entry_continue_rcu - continue iteration over list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.



--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/