Re: [Patch] ipc/mqueue: add rbtree node caching support

From: Andrew Morton
Date: Wed May 16 2012 - 18:12:19 EST


On Wed, 16 May 2012 15:28:56 -0400
Doug Ledford <dledford@xxxxxxxxxx> wrote:

> When I wrote the first patch that added the rbtree support for message
> queue insertion, it drastically sped up the case where the queue was
> very full compared to the original code. However, it slowed down the
> case where the queue was empty (though not drastically).
>
> This patch caches the last freed rbtree node struct so we can quickly
> reuse it when we get a new message. This is the common path for any
> queue that very frequently goes from 0 to 1 then back to 0 messages
> in queue.
>
> Andrew Morton didn't like that we were doing a GFP_ATOMIC allocation
> in msg_insert, so this patch attempts to speculatively allocate a
> new node struct outside of the spin lock when we know we need it, but
> will still fall back to a GFP_ATOMIC allocation if it has to.
>
> Once I added the caching, the various necessary ret = ...; spin_unlock
> gyrations in mq_timedsend were getting pretty ugly, so this also slightly
> refactors that function to streamline the flow of the code and the
> function exit.
>
> Finally, while working on getting performance back I made sure that all
> of the node structs were always fully initialized when they were first
> used, rendering the use of kzalloc unnecessary and a waste of CPU cycles.
>
> The net result of all of this is:
>
> 1) We will avoid a GFP_ATOMIC allocation when possible, but fall back on
> it when necessary.
> 2) We will speculatively allocate a node struct using GFP_KERNEL if our
> cache is empty (and save the struct to our cache if it's still empty after
> we have obtained the spin lock).
> 3) The performance of the common queue empty case has significantly
> improved and is now much more in line with the older performance for
> this case.
>
> The performance changes are:
>
>                   Old mqueue    new mqueue    new mqueue + caching
> queue empty
> send/recv         305/288ns     349/318ns     310/322ns
>
> I don't think we'll ever be able to get the recv performance back, but
> that's because the old recv performance was a direct consequence of the
> old method's abysmal send performance. The recv path simply must do more
> so that the send path does not incur such a penalty under higher queue
> depths.
>
> As it turns out, the new caching code also sped up the various queue-full
> cases relative to my last patch. That could be because of the difference
> between the syscall path in 3.3.4-rc5 and 3.3.4-rc6, or because of the
> change in code flow in the mq_timedsend routine. Regardless, I'll take
> it. It wasn't huge, and I *would* say it was within the margin of error,
> but after many repeated runs what I'm seeing is that the old numbers
> trend slightly higher (about 10 to 20ns depending on which test is
> running).
>

LGTM. Here are some coding-style fixes:

--- a/ipc/mqueue.c~ipc-mqueue-add-rbtree-node-caching-support-fix
+++ a/ipc/mqueue.c
@@ -141,16 +141,18 @@ static int msg_insert(struct msg_msg *ms
 	} else if (new_leaf) {
 		leaf = *new_leaf;
 		*new_leaf = NULL;
-	} else
+	} else {
 		leaf = kmalloc(sizeof(*leaf), GFP_ATOMIC);
+	}
 	if (!leaf)
 		return -ENOMEM;
 	if (!info->node_cache) {
 		rb_init_node(&leaf->rb_node);
 		INIT_LIST_HEAD(&leaf->msg_list);
 		info->qsize += sizeof(*leaf);
-	} else
+	} else {
 		info->node_cache = NULL;
+	}
 	leaf->priority = msg->m_type;
 	rb_link_node(&leaf->rb_node, parent, p);
 	rb_insert_color(&leaf->rb_node, &info->msg_tree);
@@ -196,8 +198,9 @@ try_again:
 		if (info->node_cache) {
 			info->qsize -= sizeof(*leaf);
 			kfree(leaf);
-		} else
+		} else {
 			info->node_cache = leaf;
+		}
 		goto try_again;
 	} else {
 		msg = list_first_entry(&leaf->msg_list,
@@ -208,8 +211,9 @@ try_again:
 			if (info->node_cache) {
 				info->qsize -= sizeof(*leaf);
 				kfree(leaf);
-			} else
+			} else {
 				info->node_cache = leaf;
+			}
 		}
 	}
 	info->attr.mq_curmsgs--;
@@ -1033,9 +1037,11 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqd
 	msg_ptr->m_ts = msg_len;
 	msg_ptr->m_type = msg_prio;
 
-	/* msg_insert really wants us to have a valid, spare node struct so
-	 * it doesn't have to kmalloc a GFP_ATOMIC allocation, but it will
-	 * fall back to that if necessary. */
+	/*
+	 * msg_insert really wants us to have a valid, spare node struct so it
+	 * doesn't have to kmalloc a GFP_ATOMIC allocation, but it will fall
+	 * back to that if necessary.
+	 */
 	if (!info->node_cache)
 		new_leaf = kmalloc(sizeof(*new_leaf), GFP_KERNEL);
 
@@ -1058,8 +1064,10 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqd
 			wait.msg = (void *) msg_ptr;
 			wait.state = STATE_NONE;
 			ret = wq_sleep(info, SEND, timeout, &wait);
-			/* wq_sleep must be called with info->lock held, and
-			 * returns with the lock released */
+			/*
+			 * wq_sleep must be called with info->lock held, and
+			 * returns with the lock released
+			 */
 			goto out_free;
 		}
 	} else {
@@ -1136,9 +1144,11 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t,
 		goto out_fput;
 	}
 
-	/* msg_insert really wants us to have a valid, spare node struct so
-	 * it doesn't have to kmalloc a GFP_ATOMIC allocation, but it will
-	 * fall back to that if necessary. */
+	/*
+	 * msg_insert really wants us to have a valid, spare node struct so it
+	 * doesn't have to kmalloc a GFP_ATOMIC allocation, but it will fall
+	 * back to that if necessary.
+	 */
 	if (!info->node_cache)
 		new_leaf = kmalloc(sizeof(*new_leaf), GFP_KERNEL);


> @@ -132,15 +134,24 @@ static int msg_insert(struct msg_msg *msg, struct mqueue_inode_info *info)
>  		else
>  			p = &(*p)->rb_right;
>  	}
> -	leaf = kzalloc(sizeof(*leaf), GFP_ATOMIC);
> +	if (info->node_cache) {
> +		leaf = info->node_cache;
> +	} else if (new_leaf) {
> +		leaf = *new_leaf;
> +		*new_leaf = NULL;
> +	} else
> +		leaf = kmalloc(sizeof(*leaf), GFP_ATOMIC);
>  	if (!leaf)
>  		return -ENOMEM;
> -	rb_init_node(&leaf->rb_node);
> -	INIT_LIST_HEAD(&leaf->msg_list);
> +	if (!info->node_cache) {
> +		rb_init_node(&leaf->rb_node);
> +		INIT_LIST_HEAD(&leaf->msg_list);
> +		info->qsize += sizeof(*leaf);
> +	} else
> +		info->node_cache = NULL;

So you have

: 	if (info->node_cache) {
: 		leaf = info->node_cache;
: 	} else if (new_leaf) {
: 		leaf = *new_leaf;
: 		*new_leaf = NULL;
: 	} else {
: 		leaf = kmalloc(sizeof(*leaf), GFP_ATOMIC);
: 	}
: 	if (!leaf)
: 		return -ENOMEM;
: 	if (!info->node_cache) {
: 		rb_init_node(&leaf->rb_node);
: 		INIT_LIST_HEAD(&leaf->msg_list);
: 		info->qsize += sizeof(*leaf);
: 	} else {
: 		info->node_cache = NULL;
: 	}

But I think it's a bit simpler to do

: 	if (info->node_cache) {
: 		leaf = info->node_cache;
: 		info->node_cache = NULL;
: 	} else {
: 		if (new_leaf) {
: 			leaf = *new_leaf;
: 			*new_leaf = NULL;
: 		} else {
: 			leaf = kmalloc(sizeof(*leaf), GFP_ATOMIC);
: 			if (!leaf)
: 				return -ENOMEM;
: 		}
: 		rb_init_node(&leaf->rb_node);
: 		INIT_LIST_HEAD(&leaf->msg_list);
: 		info->qsize += sizeof(*leaf);
: 	}

and my version generates 25 bytes less text.

But why did we need to initialise *leaf's fields if the caller passed
us something in *new_leaf? sys_mq_timedsend() already initialised the
fields for us?

