Stuck in iowait regression in 3.18.1

From: Jonas Bofjäll
Date: Tue Dec 23 2014 - 10:38:12 EST


After a couple of days of uptime, I see some very strange I/O problems on Linux 3.18.1. The problem does not exist on 3.17.7 with an identical configuration.

I originally had these problems running under VMware, using pvscsi, but I have since started seeing the same problems under KVM.

The process most likely to trip these problems is xz. The process gets stuck in iowait (state D in ps) after having read or written a few hundred kB, almost every time it is run. The same thing has happened to a few other processes too (including sort and uniq), but the overwhelming majority work just fine.
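A task shown as D in ps is in uninterruptible sleep, where signals are generally not acted on until the wait completes, which fits the delayed kill described below. As a minimal sketch (not part of the original report), the Python below lists such tasks by reading the state field of /proc/<pid>/stat, together with /proc/<pid>/wchan, the kernel function the task is waiting in. The function names `parse_stat_state` and `d_state_processes` are hypothetical, and wchan may read as 0 or be unavailable depending on kernel configuration and permissions. As root, /proc/<pid>/stack or SysRq-w (`echo w > /proc/sysrq-trigger`) gives a fuller picture of where a stuck xz is blocked.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state is taken from just after the LAST ')'.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 2]  # format: "pid (comm) S ..."
    return comm, state


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str, str]]:
    """List (pid, comm, wchan) for tasks in state D (uninterruptible sleep)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(pid_dir, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state != "D":
            continue
        try:
            with open(os.path.join(pid_dir, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"  # wchan missing or not readable for this task
        found.append((int(entry), comm, wchan))
    return sorted(found)
```

Calling `d_state_processes()` periodically (for example from cron) over the days it takes to reproduce could record which wait channel xz ends up in, without having to catch it by hand.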

The process _does_ get CPU time eventually. A kill signal is honored within about an hour. Mostly. (Also, while working on reproducing the problem, I once lost visibility of other terminals' processes under /proc. Could that be related?)

I would like to bisect the 3.18 changes to see which one introduced this, but reproducing it takes several days. Any input or ideas on how to hunt this down further?
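To put a rough number on the bisection cost: with linear history, git bisect needs about ceil(log2 N) test rounds to isolate one culprit among N candidate commits, so a multi-day test per round adds up quickly. The hypothetical helpers below compute that worst-case bound; they ignore merge commits and skipped builds, which make real kernel bisects deviate somewhat.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Worst-case test rounds to isolate one bad commit among
    `candidate_commits`, assuming each round halves the range."""
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integers, avoiding float rounding.
    return (candidate_commits - 1).bit_length()


def bisect_days(candidate_commits: int, days_per_test: float) -> float:
    """Worst-case calendar time if every round costs `days_per_test` days."""
    return bisect_steps(candidate_commits) * days_per_test
```

With a hypothetical 10,000 candidate commits, `bisect_steps` returns 14, so at 3 days per test the bisect could take roughly 42 days in the worst case.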