Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists

From: Waiman Long
Date: Thu Jul 14 2016 - 13:13:39 EST

Next message: Sebastian Andrzej Siewior: "Re: [RT PATCH 1/2] timers: wakeup all timer waiters"
Previous message: Oleg Nesterov: "Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes"
In reply to: Waiman Long: "Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists"
Next in thread: Tejun Heo: "Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 07/14/2016 07:50 AM, Tejun Heo wrote:

Hello,

On Wed, Jul 13, 2016 at 10:54:19PM -0400, Waiman Long wrote:

On 07/13/2016 12:08 PM, Tejun Heo wrote:

On Mon, Jul 11, 2016 at 01:32:06PM -0400, Waiman Long wrote:
...

A new header file include/linux/dlock-list.h will be added with the

Heh, I think perpcu_list was the better name but suppose I'm too late.

Given the fact that we may change the level of lock granularity in the
future, it is too late to change it back to percpu_list.

It's more about the consistency. We have other data structures which
are in a similar situation. If we end up doing Nth CPU for dlist,
we'll probably end up doing the same thing for other percpu data
structures.

I see.

I got comment related to the percpu-list name from Christoph Lameter a while ago. His argument was that since deletion can happenned from any CPU, it was not really percpu like the other percpu structures. That prompted me to change the name to the current form. I am fine with either names, but I would like to keep the current name unless there is a great rationale for switching back.

Why do we need two variants of this? We need a state variable to walk
the list anyway. Why not make dlock_list_iterate() safe against
removal and get rid of the second variant? Also, dlock_list_next()
probably is a better name.

The two variants are the equivalent of list_for_each_entry() and
list_for_each_entry_safe(). There are scenarios where
list_for_each_entry_safe() isn't really safe when the next pointer of the
current node is modified, for example. In this particular use case, the safe
version isn't needed as all the list_for_each_entry_safe() call sites were
modified to use list_for_each_entry() instead. I keep the
dlock_list_iterate_safe() for future use cases that may need the safe
version.

I don't think it makes sense to worry about the cases where the next
entry to iterate may be removed by the iterator. What I'm trying to
say is just make the iteration always safe and don't worry about the
distinction. For list_for_each_entry(), it makes the difference of
requiring and not requiring a separtae state variable. Here, we need
it anyway.

A lot of those functions that need to iterate the list will release the lock in the middle, do some stuff, reacquire the lock and move on to the next entry. So it is entirely possible that new entries will be inserted between the current entry and the next one in between the release and re-acquisition of the lock. Using the safe version will skip those newly added entries which is a change in behavior for the current code. That is my main concern for making it deletion safe by default.

+#ifdef CONFIG_DEBUG_SPINLOCK
+#define DLOCK_LIST_WARN_ON(x) WARN_ON(x)
+#else
+#define DLOCK_LIST_WARN_ON(x)
+#endif

I'd just use WARN_ON_ONCE() without the CONFIG guard.

I'd do so if it is used in a real function. However, it is used in the
dlock_list_iterate() macro which will duplicate the WARN_ON call on all the
call sites. That is why I am hesitant to do that.

They're used in inline functions but not macros. Just uninline the
functions?

Yes, by uninlining those functions, I can eliminate that macro.

+/*
+ * Next per-cpu list entry
+ */
+#define dlock_list_next_entry(pos, member) list_next_entry(pos, member.list)

Why does this need to be exposed?

It was used in one of the modified file. I can change it to an inline
function. Or do you mean making it a real function and put it in the .c
file?

If it's used, never mind.

+static __always_inline bool
+__dlock_list_next_cpu(struct dlock_list_head *head,
+ struct dlock_list_state *state)
+{

...

+static inline bool dlock_list_iterate_safe(struct dlock_list_head *head,
+ struct dlock_list_state *state)
+{

Inlining these doesn't make senes to me.

They are inlined for performance reason. I can make them real functions if
you think it is better.

The functions aren't that small and inlining functions which aren't
very small tend to become more expensive pretty quickly with multiple
users. I'd just make them proper functions.

Yes, will do so.

diff --git a/lib/Makefile b/lib/Makefile
index 499fb35..92e8c38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
+/*
+ * The dlock list lock needs its own class to avoid warning and stack
+ * trace when lockdep is enabled.
+ */

Can you please elaborate on this?

Kernel warning was given out and I think something got disabled without that
when the kernel booted up with lockdep enabled. I think it is because the
locks were dynamically allocated. It will not be a problem if they are
statically allocated.

Ah, okay. Can you update the comment to include the above
information. It doesn't have to be long. Just mention that this is
lockdep key for dlock data structure which is dynamically allocated.

I will update the comment.

+void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *head)
+{
+ struct dlock_list_head *myhead;
+
+ /*
+ * Disable preemption to make sure that CPU won't gets changed.
+ */
+ myhead = get_cpu_ptr(head);
+ spin_lock(&myhead->lock);
+ node->lockptr =&myhead->lock;
+ list_add(&node->list,&myhead->list);
+ spin_unlock(&myhead->lock);
+ put_cpu_ptr(head);
+}

I wonder whether it'd be better to use irqsafe operations. lists tend
to be often used from irq contexts.

The current use case only need to use the regular lock functions. You are
right that future use cases may require an irqsafe version of locks. I can
either modify the code now to allow lock type selection at init time, for
example, or defer it as a future enhancement when the need arises. What do
you think?

The bulk of performance gain of dlist would come from being per-cpu
and I don't think it's likely that we'd see any noticeable difference
between irq and preempt safe operations. Given that what's being
implemented is really low level operations, I'd suggest going with
irqsafe from the get-go.

+void dlock_list_del(struct dlock_list_node *node)
+{
+ spinlock_t *lock = READ_ONCE(node->lockptr);
+
+ if (unlikely(!lock)) {
+ WARN(1, "dlock_list_del: node 0x%lx has no associated lock\n",
+ (unsigned long)node);
+ return;
+ }
+
+ spin_lock(lock);
+ if (likely(lock == node->lockptr)) {
+ list_del_init(&node->list);
+ node->lockptr = NULL;
+ } else {
+ /*
+ * This path should never be executed.
+ */

What if it races against someone else removing and adding back?
Shouldn't it retry on those cases?

If it is racing with another thread removing and adding back, retrying the
deletion may not be the right thing to do. It really depends on what kind of
deletion the callers are doing. I will clarify in the comment further change
may be needed in the future if future callers are doing complicated deletion
that requires more elaborate exception handling.

I haven't really thought about it but it could be that in the racing
cases the order doesn't matter and just skipping the operation is
fine. I'm not sure triggering warning is the right answer tho.

Thanks.

I don't think it is normal to have concurrent deletion of the same entry. Most likely it is a bug if this happens. Having the warning message in the kernel log will help to catch those errors.

Cheers,
Longman

Next message: Sebastian Andrzej Siewior: "Re: [RT PATCH 1/2] timers: wakeup all timer waiters"
Previous message: Oleg Nesterov: "Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes"
In reply to: Waiman Long: "Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists"
Next in thread: Tejun Heo: "Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]