Re: 4.4: INFO: rcu_sched self-detected stall on CPU

From: Boris Ostrovsky
Date: Wed Mar 30 2016 - 09:45:04 EST

On 03/29/2016 02:04 PM, Steven Haigh wrote:
Greg, please see below - this is probably more for you...

On 03/29/2016 04:56 AM, Steven Haigh wrote:
Interestingly enough, this just happened again - but on a different
virtual machine. I'm starting to wonder if this may have something to do
with the uptime of the machine - as the system that this seems to happen
to is always different.

Destroying it and monitoring it again has so far come up blank.

I've thrown the latest lot of kernel messages here:

So I just did a bit of digging via the almighty Google.

I started hunting for these lines, as they happen just before the stall:
BUG: Bad rss-counter state mm:ffff88007b7db480 idx:2 val:-1
BUG: Bad rss-counter state mm:ffff880079c638c0 idx:0 val:-1
BUG: Bad rss-counter state mm:ffff880079c638c0 idx:2 val:-1

I stumbled across this post on the lkml:

The patch attached seems to reference the following change in
unmap_mapping_range in mm/memory.c:
-	struct zap_details details;
+	struct zap_details details = { };

When I browse the GIT tree for 4.4.6:

I see at line 2411:
struct zap_details details;

Is this something that has been missed being merged into the 4.4 tree?
I'll admit my kernel knowledge is not enough to understand what the code
actually does - but the similarities here seem uncanny.

The patch you are referring to fixes a bug in a feature that is not in mainline yet ("mm, oom: introduce oom reaper"), so it is not something that was missed in the 4.4 tree.
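
To illustrate what the "= { }" initializer in that patch buys, here is a minimal user-space sketch. It is not kernel code: struct demo_zap_details, its field names (new_flag in particular) and the two functions are hypothetical stand-ins made up for this example. The assumption is that a caller assigns only the fields it knows about, so the initializer only matters once a struct gains a field that the caller never sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zap_details; the field names are
 * illustrative and not copied from any kernel tree. */
struct demo_zap_details {
	void *check_mapping;		/* assigned by the caller below */
	unsigned long first_index;	/* assigned by the caller below */
	unsigned long last_index;	/* assigned by the caller below */
	bool new_flag;			/* hypothetical field added by a later feature */
};

/* Mimics a caller that fills in only the fields it knows about. */
struct demo_zap_details demo_prepare(void *mapping,
				     unsigned long first,
				     unsigned long last)
{
	/*
	 * Zero-initialise first, so any field this function never assigns
	 * (new_flag here) starts out as 0/false instead of stack garbage.
	 * The quoted patch spells this "= { }", a GNU C form; "{ 0 }" is
	 * the portable equivalent used in this sketch.
	 */
	struct demo_zap_details details = { 0 };

	details.check_mapping = mapping;
	details.first_index = first;
	details.last_index = last;
	return details;
}

/* Small self-check: the unassigned field must come back as false. */
void demo_check(void)
{
	int dummy;
	struct demo_zap_details d = demo_prepare(&dummy, 3, 7);

	assert(d.check_mapping == &dummy);
	assert(d.first_index == 3);
	assert(d.last_index == 7);
	assert(!d.new_flag);
}
```

If every field of the struct is assigned before use, the plain "struct zap_details details;" declaration is equivalent, which is why the initializer is only needed together with a change that adds a field.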