Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu

From: Paul E. McKenney
Date: Mon Nov 30 2015 - 15:29:37 EST


On Mon, Nov 30, 2015 at 10:39:15AM +1100, Paul Mackerras wrote:
> On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote:
> [snip]
> > > Still, is my approach correct? What does the comment for
> > > lockless_dereference() actally mean - it won't work together with
> > > RCU at all or this is to force people not to use it as
> > > "list_for_each_entry_rcu() should really be used in 99.99% of the
> > > time"? :)
> >
> > Well, it depends...
> >
> > The key difference between lockless_dereference() and rcu_dereference()
> > is that lockless_dereference() won't complain if used outside of
> > an RCU read-side critical section. When there is no RCU read-side
> > critical section, lockless_dereference() cannot rely on RCU's normal
> > action of keeping the data item around. Therefore, when you are using
> > lockless_dereference(), you have to have some mechanism other than RCU
> > to keep the data item around. The usual approach is for the data item
> > to never be freed, for example, if data is only ever added to the list
> > and never removed. Other approaches include reference counting, hazard
> > pointers, hashed arrays of locks, garbage collectors, transactional
> > memory, and so on.
>
> So, the situation is that we have an RCU-protected list, which in this
> case we are traversing without modifying the list. We are in an
> restricted environment (hypervisor real mode) where we can't be
> preempted, both because interrupts are hard-disabled and because our
> caller has done preempt_disable(). In this restricted environment we
> can access the linear mapping (thus kmalloc'd data) but not the
> vmalloc or ioremap regions.
>
> Thus we are not formally in a RCU read-side critical section, though
> we believe that having preemption disabled gives us equivalent
> protection. Probably what we should do is to add a
> rcu_read_lock/unlock pair in a function higher up the call chain
> so that we are actually in a RCU read-side critical section.
>
> Then the only reason not to use list_for_each_entry_rcu would be that
> we don't trust the checking machinery not to ever access vmalloc'd
> data. In other words, we want a list_for_each_entry_nocheck
> or list_for_each_entry_restricted which omits all the lockdep
> checking. That would need a list_entry_rcu_nocheck which would need a
> thing like rcu_dereference_raw that does no lockdep checking - which
> is where I thought you suggested lockless_dereference.
>
> So, what name do you like for these primitives, and where should they
> go?

I am good with list_for_each_entry_lockless() and with its using
lockless_dereference(). I am also good with this going into
include/linux/rculist.h. That said...

o The name of the macro was last given as list_entry_lockless()
rather than the name list_for_each_entry_lockless() that was
used in the docbook comment. The latter is preferable because
it is consistent with the other list-traversal macros.

o The docbook comment (and I quote) "This list-traversal
primitive may safely run concurrently" is clearly inadequate.
The points it needs to make are:

1. This locklessly traverses a list.

2. It may be used when items are never removed from the list.

3. It may also be used in environments where preemption
is implicitly disabled, but where lockdep cannot safely
be invoked. In this case, after an element is removed,
the synchronize_sched(), synchronize_sched_expedited(),
or call_rcu_sched() functions must be used to wait for
the needed grace period.

o The commit log is also inadequate, though given a good patch,
I can fix this. It needs some of the information that we have
been sending back and forth in this email thread.

It feels to me like we are going around in circles here. Someone needs
to submit an updated patch to move this forward: I cannot in good faith
accept the November 3rd patch. Are you or Alexey going to send this
patch, or should I produce my best guess as to what you guys need?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/