Re: lock-up with module: Optimize __module_address() using a latched RB-tree

From: Mathieu Desnoyers
Date: Tue Jul 07 2015 - 12:33:37 EST


----- On Jul 7, 2015, at 3:29 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Tue, Jul 07, 2015 at 02:59:06PM +0930, Arthur Marsh wrote:
>> I had a single, non-reproducible case of the same lock-up happening on my
>> other machine running the Linus git head kernel in 64-bit mode.
>
> Hmm, disturbing.. I've had my machines run this stuff for weeks and not
> had anything like this :/
>
> Do you have a serial cable between those machines? serial console output
> will allow capturing more complete traces than these pictures can and
> might also aid in capturing some extra debug info.
>
> In any case, I'll go try and build some debug code.

Arthur: can you double-check if you load any module with --force ?
This could cause a module header layout mismatch, which can be an
issue with the changes done by the identified commit: the module
header layout changes there.

Also, I'm attaching a small patch which serializes both updates and
reads of the module rbree. Can you try it out ? If the problem
still shows with the spinlocks in place, that would mean the issue
is *not* a race between latched rbtree updates and traversals.

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
From 0d046f205fa49c477bbf81b72cd038fb9f7e40d6 Mon Sep 17 00:00:00 2001
From: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Date: Tue, 7 Jul 2015 12:24:37 -0400
Subject: [PATCH] TESTING: add spinlock to module.c rb latch tree

Not-Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
---
kernel/module.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/kernel/module.c b/kernel/module.c
index 3e0e197..eb99751 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -181,6 +181,9 @@ static struct mod_tree_root {
#define module_addr_min mod_tree.addr_min
#define module_addr_max mod_tree.addr_max

+/* Fully serialize readers and updates to rb latch tree. */
+static DEFINE_SPINLOCK(test_rb_latch_lock);
+
static noinline void __mod_tree_insert(struct mod_tree_node *node)
{
latch_tree_insert(&node->node, &mod_tree.root, &mod_tree_ops);
@@ -197,31 +200,50 @@ static void __mod_tree_remove(struct mod_tree_node *node)
*/
static void mod_tree_insert(struct module *mod)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&test_rb_latch_lock, flags);
mod->mtn_core.mod = mod;
mod->mtn_init.mod = mod;

__mod_tree_insert(&mod->mtn_core);
if (mod->init_size)
__mod_tree_insert(&mod->mtn_init);
+ spin_unlock_irqrestore(&test_rb_latch_lock, flags);
}

static void mod_tree_remove_init(struct module *mod)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&test_rb_latch_lock, flags);
if (mod->init_size)
__mod_tree_remove(&mod->mtn_init);
+ spin_unlock_irqrestore(&test_rb_latch_lock, flags);
}

static void mod_tree_remove(struct module *mod)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&test_rb_latch_lock, flags);
__mod_tree_remove(&mod->mtn_core);
mod_tree_remove_init(mod);
+ spin_unlock_irqrestore(&test_rb_latch_lock, flags);
}

static struct module *mod_find(unsigned long addr)
{
struct latch_tree_node *ltn;
+ unsigned long flags;

+ if (in_nmi()) {
+ printk(KERN_ERR "mod_find called from NMI\n");
+ return NULL;
+ }
+ spin_lock_irqsave(&test_rb_latch_lock, flags);
ltn = latch_tree_find((void *)addr, &mod_tree.root, &mod_tree_ops);
+ spin_unlock_irqrestore(&test_rb_latch_lock, flags);
if (!ltn)
return NULL;

--
1.9.1