Re: lock-up with module: Optimize __module_address() using a latched RB-tree

From: Arthur Marsh
Date: Tue Jul 07 2015 - 16:16:07 EST

Mathieu Desnoyers wrote on 08/07/15 02:03:
----- On Jul 7, 2015, at 3:29 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

On Tue, Jul 07, 2015 at 02:59:06PM +0930, Arthur Marsh wrote:
I had a single, non-reproducible case of the same lock-up happening on my
other machine running the Linus git head kernel in 64-bit mode.

Hmm, disturbing.. I've had my machines run this stuff for weeks and not
had anything like this :/

Do you have a serial cable between those machines? Serial console output
will allow capturing more complete traces than these pictures can and
might also aid in capturing some extra debug info.

In any case, I'll go try and build some debug code.

Arthur: can you double-check if you load any module with --force?
This could cause a module header layout mismatch, which can be an
issue with the changes done by the identified commit: the module
header layout changes there.

Also, I'm attaching a small patch which serializes both updates and
reads of the module rbtree. Can you try it out? If the problem
still shows with the spinlocks in place, that would mean the issue
is *not* a race between latched rbtree updates and traversals.

Thanks!

Mathieu


I'm not aware of any modules being loaded with --force.

I've applied the patch, thanks!

The resultant kernel locked up as follows:

http://www.users.on.net/~arthur.marsh/20150708469.jpg
http://www.users.on.net/~arthur.marsh/20150708470.jpg

Sorry that the first image isn't as clear as the second: that screen only appears for a few seconds.

Hopefully these will provide some clue as to what is happening.

Arthur.