Re: frequent lockups in 3.18rc4

From: Ingo Molnar
Date: Fri Dec 12 2014 - 01:54:26 EST



* Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:

> Right, and it reproduces in 3.10 as well, so it's not really a
> new thing.
>
> What's odd is that I don't remember seeing this bug so long in
> the past, I'll try bisecting trinity rather than the kernel -
> it's the only other thing that changed.

So I think DaveJ mentioned that Trinity recently changed its
test task count and is now loading the system more aggressively.
Such a change might have made a dormant, resource-limit related
bug or load-dependent race more likely to trigger.

I think at this point it would also be useful to debug the hang
itself directly: using triggered printks and kgdb and drilling
into all the data structures to figure out why the system isn't
progressing.

If the bug triggers in a VM (which your testing uses) the failed
kernel state ought to be a lot more accessible than on bare metal.

And if it's the same bug as DaveJ's, the fact that it triggers
in a VM also makes the hardware bug theory a lot less likely.

Thanks,

Ingo