Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists

From: Tejun Heo
Date: Thu Jul 14 2016 - 07:50:52 EST


Hello,

On Wed, Jul 13, 2016 at 10:54:19PM -0400, Waiman Long wrote:
> On 07/13/2016 12:08 PM, Tejun Heo wrote:
> > On Mon, Jul 11, 2016 at 01:32:06PM -0400, Waiman Long wrote:
> > ...
> > > A new header file include/linux/dlock-list.h will be added with the
> > Heh, I think perpcu_list was the better name but suppose I'm too late.
>
> Given the fact that we may change the level of lock granularity in the
> future, it is too late to change it back to percpu_list.

It's more about the consistency. We have other data structures which
are in a similar situation. If we end up doing Nth CPU for dlist,
we'll probably end up doing the same thing for other percpu data
structures.

> > Why do we need two variants of this? We need a state variable to walk
> > the list anyway. Why not make dlock_list_iterate() safe against
> > removal and get rid of the second variant? Also, dlock_list_next()
> > probably is a better name.
>
> The two variants are the equivalent of list_for_each_entry() and
> list_for_each_entry_safe(). There are scenarios where
> list_for_each_entry_safe() isn't really safe when the next pointer of the
> current node is modified, for example. In this particular use case, the safe
> version isn't needed as all the list_for_each_entry_safe() call sites were
> modified to use list_for_each_entry() instead. I keep the
> dlock_list_iterate_safe() for future use cases that may need the safe
> version.

I don't think it makes sense to worry about the cases where the next
entry to iterate may be removed by the iterator. What I'm trying to
say is just make the iteration always safe and don't worry about the
distinction. For list_for_each_entry(), it makes the difference of
requiring and not requiring a separtae state variable. Here, we need
it anyway.

> > > +#ifdef CONFIG_DEBUG_SPINLOCK
> > > +#define DLOCK_LIST_WARN_ON(x) WARN_ON(x)
> > > +#else
> > > +#define DLOCK_LIST_WARN_ON(x)
> > > +#endif
> > I'd just use WARN_ON_ONCE() without the CONFIG guard.
>
> I'd do so if it is used in a real function. However, it is used in the
> dlock_list_iterate() macro which will duplicate the WARN_ON call on all the
> call sites. That is why I am hesitant to do that.

They're used in inline functions but not macros. Just uninline the
functions?

> > > +/*
> > > + * Next per-cpu list entry
> > > + */
> > > +#define dlock_list_next_entry(pos, member) list_next_entry(pos, member.list)
> > Why does this need to be exposed?
>
> It was used in one of the modified file. I can change it to an inline
> function. Or do you mean making it a real function and put it in the .c
> file?

If it's used, never mind.

> > > +static __always_inline bool
> > > +__dlock_list_next_cpu(struct dlock_list_head *head,
> > > + struct dlock_list_state *state)
> > > +{
> > ...
> > > +static inline bool dlock_list_iterate_safe(struct dlock_list_head *head,
> > > + struct dlock_list_state *state)
> > > +{
> > Inlining these doesn't make senes to me.
>
> They are inlined for performance reason. I can make them real functions if
> you think it is better.

The functions aren't that small and inlining functions which aren't
very small tend to become more expensive pretty quickly with multiple
users. I'd just make them proper functions.

> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index 499fb35..92e8c38 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > +/*
> > > + * The dlock list lock needs its own class to avoid warning and stack
> > > + * trace when lockdep is enabled.
> > > + */
> > Can you please elaborate on this?
>
> Kernel warning was given out and I think something got disabled without that
> when the kernel booted up with lockdep enabled. I think it is because the
> locks were dynamically allocated. It will not be a problem if they are
> statically allocated.

Ah, okay. Can you update the comment to include the above
information. It doesn't have to be long. Just mention that this is
lockdep key for dlock data structure which is dynamically allocated.

> > > +void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *head)
> > > +{
> > > + struct dlock_list_head *myhead;
> > > +
> > > + /*
> > > + * Disable preemption to make sure that CPU won't gets changed.
> > > + */
> > > + myhead = get_cpu_ptr(head);
> > > + spin_lock(&myhead->lock);
> > > + node->lockptr =&myhead->lock;
> > > + list_add(&node->list,&myhead->list);
> > > + spin_unlock(&myhead->lock);
> > > + put_cpu_ptr(head);
> > > +}
> > I wonder whether it'd be better to use irqsafe operations. lists tend
> > to be often used from irq contexts.
>
> The current use case only need to use the regular lock functions. You are
> right that future use cases may require an irqsafe version of locks. I can
> either modify the code now to allow lock type selection at init time, for
> example, or defer it as a future enhancement when the need arises. What do
> you think?

The bulk of performance gain of dlist would come from being per-cpu
and I don't think it's likely that we'd see any noticeable difference
between irq and preempt safe operations. Given that what's being
implemented is really low level operations, I'd suggest going with
irqsafe from the get-go.

> > > +void dlock_list_del(struct dlock_list_node *node)
> > > +{
> > > + spinlock_t *lock = READ_ONCE(node->lockptr);
> > > +
> > > + if (unlikely(!lock)) {
> > > + WARN(1, "dlock_list_del: node 0x%lx has no associated lock\n",
> > > + (unsigned long)node);
> > > + return;
> > > + }
> > > +
> > > + spin_lock(lock);
> > > + if (likely(lock == node->lockptr)) {
> > > + list_del_init(&node->list);
> > > + node->lockptr = NULL;
> > > + } else {
> > > + /*
> > > + * This path should never be executed.
> > > + */
> > What if it races against someone else removing and adding back?
> > Shouldn't it retry on those cases?
>
> If it is racing with another thread removing and adding back, retrying the
> deletion may not be the right thing to do. It really depends on what kind of
> deletion the callers are doing. I will clarify in the comment further change
> may be needed in the future if future callers are doing complicated deletion
> that requires more elaborate exception handling.

I haven't really thought about it but it could be that in the racing
cases the order doesn't matter and just skipping the operation is
fine. I'm not sure triggering warning is the right answer tho.

Thanks.

--
tejun