Re: About lock-less data structure patches

From: Huang Ying
Date: Wed Mar 30 2011 - 21:39:30 EST


Hi, Mathieu,

Thank you very much for your review. Do you have time to take a look at
the lock-less memory allocator as follow?

https://lkml.org/lkml/2011/2/21/15

On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@xxxxxxxxxxxx) wrote:
>> * Andrew Morton (akpm@xxxxxxxxxxxxxxxxxxxx) wrote:
>>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>>>
>>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
>>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
>>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi, Andrew and Len,
>>>>>>>
>>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
>>>>>>> lock-less data structure.
>>>>>>>
>>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
>>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
>>>>>>>
>>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
>>>>>>> patches to go through the ACPI git tree. Or they should go through
>>>>>>> other tree, such as -mm tree.
>>>>>>
>>>>>> I just dropped a couple of your patches because include/linux/llist.h
>>>>>> vanished from linux-next. Did Len trip over the power cord?
>>>>>
>>>>> Len has dropped lock-less data structure patches from acpi git tree. He
>>>>> describe the reason in following mails.
>>>>>
>>>>> https://lkml.org/lkml/2011/3/2/501
>>>>> https://lkml.org/lkml/2011/3/23/6
>>>>
>>>> Ok so we still need a lockless reviewer really and I don't count.
>>>
>>> Well I think you count ;) If this is some Intel thing then cluebeat,
>>> cluebeat, cluebeat, overruled.
>>>
>>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
>>>> in reviewing Ying's patches?
>>>
>>> That would be great.
>>
>> Sure, I can have a look. Huang, can you resend those three patches
>> adding me to CC list ? That will help me keep appropriate threading in
>> my review. Adding Paul McKenney would also be appropriate.
>
> I know, I know, I said I would wait for a repost, but now the answer
> burns my fingers. ;-) I'm replying to the patch found in
> https://lkml.org/lkml/2011/2/21/13
>
>
>> --- /dev/null
>> +++ b/include/linux/llist.h
>> @@ -0,0 +1,98 @@
>> +#ifndef LLIST_H
>> +#define LLIST_H
>> +/*
>> + * Lock-less NULL terminated single linked list
>
> Because this single-linked-list works like a stack (with "push"
> operation for llist_add, "pop" operation for llist_del_first), I would
> recommend to rename it accordingly (as a stack rather than "list"). If
> we think about other possible users of this kind of lock-free list, such
> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
> (with enqueue/dequeue operations). So at the very least I would like to
> make sure this API keeps room for lock-free queue implementations that
> won't be confused with this stack API. It would also be important to
> figure out if what we really want is a stack or a queue. Some naming
> ideas follow (maybe they are a bit verbose, comments are welcome).
>
> We should note that this list implements "lock-free" push and pop
> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
> (using xchg) (only really true for architectures with "true" xchg
> operation though, not those using LL/SC). We should think about the real
> use-case requirements put on this lockless stack to decide which variant
> is most appropriate. We can either have, with the implementation you
> propose:
>
> - Lock-free push
> - Pop protected by mutex
> - Wait-free pop all
>
> Or, as an example of an alternative structure (as Paul and I implemented
> in the userspace RCU library):
>
> - Wait-free push (stronger real-time guarantees provided by xchg())
> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
> periods)
>
> (there are others, with e.g. lock-free push, lock-free pop, lock-free
> pop all, but this one requires RCU read lock across the pop/pop/pop all
> operations and that memory reclaim of the nodes is only performed after
> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
> push/pop you noticed without requiring mutexes protecting pop operations.)
>
> So it all boils down to which are the constraints of the push/pop
> callers. Typically, I would expect that the "push" operation has the
> most strict real-time constraints (and is possibly executed the most
> often, thus would also benefit from xchg() which is typically slightly
> faster than cmpxchg()), which would argue in favor of a wait-free
> push/blocking pop. But maybe I'm lacking understanding of what you are
> trying to do with this stack. Do you need to ever pop from a NMI
> handler ?

In my user case, I don't need to pop in a NMI handler, just push. But
we need to pop in a IRQ handler, so we can not use blocking pop. Please
take a look at the user case patches listed later.

> Some ideas for API identifiers:
>
> struct llist_head -> slist_stack_head
> struct llist_node -> slist_stack_node

Why call it a stack and a list? Because it is a stack implemented with
single list? I think it is better to name after usage instead of
implementation.

The next question is whether it should be named as stack or list. I
think from current user's point of view, they think they are using a
list instead of stack. There are 3 users so far as follow.

https://lkml.org/lkml/2011/1/17/14
https://lkml.org/lkml/2011/1/17/15
https://lkml.org/lkml/2011/2/21/16

And if we named this data structure as list, we can still use "queue"
for another data structure. Do you think so?

> * For your lock-free push/pop + wait-free pop_all implementation:
>
> llist_add -> slist_stack_push_lf (lock-free)
> llist_del_first -> _slist_stack_pop (needs mutex protection)
> llist_del_all -> slist_stack_pop_all_wf (wait-free)

Do we really need to distinguish between lock-free and wait-free from
interface? Will we implement both slist_stack_push_lf and
slist_stack_push_wf for one data structure?

mutex is needed between multiple "_slist_stack_pop", but not needed
between slist_stack_push_lf and _slist_stack_pop. I think it is hard to
explain that clearly via function naming.

> * If we choose to go with an alternate wait-free push implementation:
>
> llist_add -> slist_stack_push_wf (wait-free)
> llist_del_first -> slist_stack_pop_blocking (blocking)
> llist_del_all -> slist_stack_pop_all_blocking (blocking)

We need non-blocking pop, so maybe you need implement another data
structure which has these interface. I think there can be multiple
lock-less data structure in kernel.

>> + *
>> + * If there are multiple producers and multiple consumers, llist_add
>> + * can be used in producers and llist_del_all can be used in
>> + * consumers. They can work simultaneously without lock. But
>> + * llist_del_first can not be used here. Because llist_del_first
>> + * depends on list->first->next does not changed if list->first is not
>> + * changed during its operation, but llist_del_first, llist_add,
>> + * llist_add sequence in another consumer may violate that.
>
> You did not seem to define the locking rules when using both
>
> llist_del_all
> and
> llist_del_first
>
> in parallel. I expect that a mutex is needed, because a
>
> llist_del_all, llist_add, llist_add
>
> in parallel with
>
> llist_del_first
>
> could run into the same ABA problem as described above.

OK. I will add that.

>> + *
>> + * If there are multiple producers and one consumer, llist_add can be
>> + * used in producers and llist_del_all or llist_del_first can be used
>> + * in the consumer.
>> + *
>> + * The list entries deleted via llist_del_all can be traversed with
>> + * traversing function such as llist_for_each etc. But the list
>> + * entries can not be traversed safely before deleted from the list.
>
> Given that this is in fact a stack, specifying the traversal order of
> llist_for_each and friends would be appropriate.

Ok. I will add something like "traversing from head to tail" in the
comments.

>> + *
>> + * The basic atomic operation of this list is cmpxchg on long. On
>> + * architectures that don't have NMI-safe cmpxchg implementation, the
>> + * list can NOT be used in NMI handler. So code uses the list in NMI
>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>> + */
>> +
>> +struct llist_head {
>> + struct llist_node *first;
>> +};
>> +
>> +struct llist_node {
>> + struct llist_node *next;
>> +};
>> +
>> +#define LLIST_HEAD_INIT(name) { NULL }
>> +#define LLIST_HEAD(name) struct llist_head name = LLIST_HEAD_INIT(name)
>> +
>> +/**
>> + * init_llist_head - initialize lock-less list head
>> + * @head: the head for your lock-less list
>> + */
>> +static inline void init_llist_head(struct llist_head *list)
>> +{
>> + list->first = NULL;
>> +}
>> +
>> +/**
>> + * llist_entry - get the struct of this entry
>> + * @ptr: the &struct llist_node pointer.
>> + * @type: the type of the struct this is embedded in.
>> + * @member: the name of the llist_node within the struct.
>> + */
>> +#define llist_entry(ptr, type, member) \
>> + container_of(ptr, type, member)
>> +
>> +/**
>> + * llist_for_each - iterate over some deleted entries of a lock-less list
>> + * @pos: the &struct llist_node to use as a loop cursor
>> + * @node: the first entry of deleted list entries
>> + *
>> + * In general, some entries of the lock-less list can be traversed
>> + * safely only after being deleted from list, so start with an entry
>> + * instead of list head.
>> + */
>> +#define llist_for_each(pos, node) \
>> + for (pos = (node); pos; pos = pos->next)
>> +
>> +/**
>> + * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
>> + * @pos: the type * to use as a loop cursor.
>> + * @node: the fist entry of deleted list entries.
>> + * @member: the name of the llist_node with the struct.
>> + *
>> + * In general, some entries of the lock-less list can be traversed
>> + * safely only after being removed from list, so start with an entry
>> + * instead of list head.
>> + */
>> +#define llist_for_each_entry(pos, node, member) \
>> + for (pos = llist_entry((node), typeof(*pos), member); \
>> + &pos->member != NULL; \
>> + pos = llist_entry(pos->member.next, typeof(*pos), member))
>> +
>> +/**
>> + * llist_empty - tests whether a lock-less list is empty
>
> How is this llist_empty test expected to be used in combination with
> other API members ? e.g. llist_del_first, llist_del_all, llist_add ? I
> suspect that without mutex to ensure that there are no concurrent
> changes, llist_empty return value can easily be non-current.

We don't need llist_empty to be accurate. Just a quick way to test
whether list/stack is empty without deleting something from list/stack.

Best Regards,
Huang Ying

> Thanks,
>
> Mathieu
>
>> + * @head: the list to test
>> + */
>> +static inline int llist_empty(const struct llist_head *head)
>> +{
>> + return head->first == NULL;
>> +}
>> +
>> +void llist_add(struct llist_node *new, struct llist_head *head);
>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
>> + struct llist_head *head);
>> +struct llist_node *llist_del_first(struct llist_head *head);
>> +struct llist_node *llist_del_all(struct llist_head *head);
>> +#endif /* LLIST_H */
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -219,4 +219,7 @@ config LRU_CACHE
>> config AVERAGE
>> bool
>>
>> +config LLIST
>> + bool
>> +
>> endmenu
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomi
>>
>> obj-$(CONFIG_AVERAGE) += average.o
>>
>> +obj-$(CONFIG_LLIST) += llist.o
>> +
>> hostprogs-y := gen_crc32table
>> clean-files := crc32table.h
>>
>> --- /dev/null
>> +++ b/lib/llist.c
>> @@ -0,0 +1,119 @@
>> +/*
>> + * Lock-less NULL terminated single linked list
>> + *
>> + * The basic atomic operation of this list is cmpxchg on long. On
>> + * architectures that don't have NMI-safe cmpxchg implementation, the
>> + * list can NOT be used in NMI handler. So code uses the list in NMI
>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>> + *
>> + * Copyright 2010 Intel Corp.
>> + * Author: Huang Ying <ying.huang@xxxxxxxxx>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License version
>> + * 2 as published by the Free Software Foundation;
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/llist.h>
>> +
>> +#include <asm/system.h>
>> +
>> +/**
>> + * llist_add - add a new entry
>> + * @new: new entry to be added
>> + * @head: the head for your lock-less list
>> + */
>> +void llist_add(struct llist_node *new, struct llist_head *head)
>> +{
>> + struct llist_node *entry;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> + BUG_ON(in_nmi());
>> +#endif
>> +
>> + do {
>> + entry = head->first;
>> + new->next = entry;
>> + } while (cmpxchg(&head->first, entry, new) != entry);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_add);
>> +
>> +/**
>> + * llist_add_batch - add several linked entries in batch
>> + * @new_first: first entry in batch to be added
>> + * @new_last: last entry in batch to be added
>> + * @head: the head for your lock-less list
>> + */
>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
>> + struct llist_head *head)
>> +{
>> + struct llist_node *entry;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> + BUG_ON(in_nmi());
>> +#endif
>> +
>> + do {
>> + entry = head->first;
>> + new_last->next = entry;
>> + } while (cmpxchg(&head->first, entry, new_first) != entry);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_add_batch);
>> +
>> +/**
>> + * llist_del_first - delete the first entry of lock-less list
>> + * @head: the head for your lock-less list
>> + *
>> + * If list is empty, return NULL, otherwise, return the first entry deleted.
>> + *
>> + * Only one llist_del_first user can be used simultaneously with
>> + * multiple llist_add users without lock. Because otherwise
>> + * llist_del_first, llist_add, llist_add sequence in another user may
>> + * change @head->first->next, but keep @head->first. If multiple
>> + * consumers are needed, please use llist_del_all.
>> + */
>> +struct llist_node *llist_del_first(struct llist_head *head)
>> +{
>> + struct llist_node *entry;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> + BUG_ON(in_nmi());
>> +#endif
>> +
>> + do {
>> + entry = head->first;
>> + if (entry == NULL)
>> + return NULL;
>> + } while (cmpxchg(&head->first, entry, entry->next) != entry);
>> +
>> + return entry;
>> +}
>> +EXPORT_SYMBOL_GPL(llist_del_first);
>> +
>> +/**
>> + * llist_del_all - delete all entries from lock-less list
>> + * @head: the head of lock-less list to delete all entries
>> + *
>> + * If list is empty, return NULL, otherwise, delete all entries and
>> + * return the pointer to the first entry.
>> + */
>> +struct llist_node *llist_del_all(struct llist_head *head)
>> +{
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> + BUG_ON(in_nmi());
>> +#endif
>> +
>> + return xchg(&head->first, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_del_all);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/