Re: [PATCH 31/36] security,rcu: convertcall_rcu(user_update_rcu_disposal) to kfree_rcu()

From: Paul E. McKenney
Date: Thu Mar 31 2011 - 15:04:59 EST


On Thu, Mar 31, 2011 at 07:50:40PM +0100, David Howells wrote:
> Lai Jiangshan <laijs@xxxxxxxxxxxxxx> wrote:
>
> > The rcu callback user_update_rcu_disposal() just calls a kfree(),
> > so we use kfree_rcu() instead of the call_rcu(user_update_rcu_disposal).
> >
> > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
>
> Nice idea, but how do you handle the container_of()?

Hello, David,

This patch relies on an earlier patch from Lai shown below.

The short answer is that kfree_rcu() is a cpp macro that takes a pointer
to the struct rcu_head and the name of the rcu_head field within the
enclosing structure. This macro then does the required offset_of().

Thanx, Paul

------------------------------------------------------------------------

commit 316b740440ea0a54615a3516df536fbf9a4c11b8
Author: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
Date: Fri Mar 18 11:15:47 2011 +0800

rcu: introduce kfree_rcu()

Many rcu callbacks functions just call kfree() on the base structure.
These functions are trivial, but their size adds up, and furthermore
when they are used in a kernel module, that module must invoke the
high-latency rcu_barrier() function at module-unload time.

The kfree_rcu() function introduced by this commit addresses this issue.
Rather than encoding a function address in the embedded rcu_head
structure, kfree_rcu() instead encodes the offset of the rcu_head
structure within the base structure. Because the functions are not
allowed in the low-order 4096 bytes of kernel virtual memory, offsets
up to 4095 bytes can be accommodated. If the offset is larger than
4095 bytes, a compile-time error will be generated in __kfree_rcu().
If this error is triggered, you can either fall back to use of call_rcu()
or rearrange the structure to position the rcu_head structure into the
first 4096 bytes.

Note that the allowable offset might decrease in the future, for example,
to allow something like kmem_cache_free_rcu().

The new kfree_rcu() function can replace code as follows:

call_rcu(&p->rcu, simple_kfree_callback);

where "simple_kfree_callback()" might be defined as follows:

void simple_kfree_callback(struct rcu_head *p)
{
struct foo *q = container_of(p, struct foo, rcu);

kfree(q);
}

with the following:

kfree_rcu(&p->rcu, rcu);

Note that the "rcu" is the name of a field in the structure being
freed. The reason for using this rather than passing in a pointer
to the base structure is that the above approach allows better type
checking.

This commit is based on earlier work by Lai Jiangshan and Manfred Spraul:

Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
Manfred's patch: http://lkml.org/lkml/2009/1/2/115

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
Signed-off-by: Manfred Spraul <manfred@xxxxxxxxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index af56148..70c932f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -777,4 +777,60 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
}
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */

+static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
+{
+ return offset < 4096;
+}
+
+static __always_inline
+void __kfree_rcu(struct rcu_head *head, unsigned long offset)
+{
+ typedef void (*rcu_callback)(struct rcu_head *);
+
+ BUILD_BUG_ON(!__builtin_constant_p(offset));
+
+ /* See the kfree_rcu() header comment. */
+ BUILD_BUG_ON(!__is_kfree_rcu_offset(offset));
+
+ call_rcu(head, (rcu_callback)offset);
+}
+
+extern void kfree(const void *);
+
+static inline void __rcu_reclaim(struct rcu_head *head)
+{
+ unsigned long offset = (unsigned long)head->func;
+
+ if (__is_kfree_rcu_offset(offset))
+ kfree((void *)head - offset);
+ else
+ head->func(head);
+}
+
+/**
+ * kfree_rcu() - kfree an object after a grace period.
+ * @ptr: pointer to kfree
+ * @rcu_head: the name of the struct rcu_head within the type of @ptr.
+ *
+ * Many rcu callbacks functions just call kfree() on the base structure.
+ * These functions are trivial, but their size adds up, and furthermore
+ * when they are used in a kernel module, that module must invoke the
+ * high-latency rcu_barrier() function at module-unload time.
+ *
+ * The kfree_rcu() function handles this issue. Rather than encoding a
+ * function address in the embedded rcu_head structure, kfree_rcu() instead
+ * encodes the offset of the rcu_head structure within the base structure.
+ * Because the functions are not allowed in the low-order 4096 bytes of
+ * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
+ * If the offset is larger than 4095 bytes, a compile-time error will
+ * be generated in __kfree_rcu(). If this error is triggered, you can
+ * either fall back to use of call_rcu() or rearrange the structure to
+ * position the rcu_head structure into the first 4096 bytes.
+ *
+ * Note that the allowable offset might decrease in the future, for example,
+ * to allow something like kmem_cache_free_rcu().
+ */
+#define kfree_rcu(ptr, rcu_head) \
+ __kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
+
#endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 0c343b9..4d60fbc 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
prefetch(next);
debug_rcu_head_unqueue(list);
local_bh_disable();
- list->func(list);
+ __rcu_reclaim(list);
local_bh_enable();
list = next;
RCU_TRACE(cb_count++);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 89e7dbb..4e1d925 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1161,7 +1161,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
next = list->next;
prefetch(next);
debug_rcu_head_unqueue(list);
- list->func(list);
+ __rcu_reclaim(list);
list = next;
if (++count >= rdp->blimit)
break;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/