Re: Crashes with 874bbfe600a6 in 3.18.25

From: Daniel Bilik
Date: Thu Feb 04 2016 - 11:53:11 EST


On Thu, 4 Feb 2016 12:20:44 +0100
Jan Kara <jack@xxxxxxx> wrote:

> Thanks for backport Thomas and to Mike for persistence :). I've asked my
> friend seeing crashes with 3.18.25 to try whether this patch fixes the
> issues. It may take some time so stay tuned...

The patch was tested and it really fixes the crash we were experiencing on
3.18.25 with commit 874bbfe+. But it seems to introduce a (rather scary)
regression: the tested host shows abnormal CPU usage in both kernel and
userland under the same load and traffic pattern. One picture is worth a
thousand words, so I've taken snapshots of our graphs, see here:
http://neosystem.cz/test/linux-3.18.25/
The host had been running 3.18.25 with commit 874bbfe+ (1e7af29+ on
3.18-stable) reverted; with this commit included, it crashed within
minutes. Around 13:30 we booted 3.18.25 with commit 874bbfe+ included and
with the patch from Thomas. Around 15:40 we booted the host with the
previous kernel again, to confirm that this abnormal behaviour was really
caused by the test kernel.
Also interesting: in addition to the high CPU usage, the system reports an
abnormally high number of zombie processes.

HTH.

--
Daniel Bilik
neosystem.cz