Re: howto combat highly pathologic latencies on a server?
From: Christoph Hellwig
Date: Wed Mar 10 2010 - 13:15:56 EST
On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> While this system usually operates fine, it suffers from delays that are
> displayed in latencytop as: "Writing page to disk: 8425,5 ms":
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
>
> From other observations, this issue "feels" like it is induced by single
> synchronisation points in the block layer, e.g. if I create heavy IO load on
> one RAID array, say resizing a VMware disk image, it can take up to a
> minute to log in by ssh, although the ssh login does not touch this area at
> all (different RAID arrays). Note that the latencytop snapshots above were
> taken during normal operation, not under this kind of load.
I had very similar issues on various systems (mostly using xfs, but some
with ext3, too) running kernels before ~2.6.30 with the cfq I/O
scheduler. Switching to noop fixed that for me, as did upgrading to a
recent kernel where cfq behaves better again.
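
For anyone wanting to try the scheduler switch, here is a minimal sketch of
checking and changing the elevator through sysfs. It assumes the usual
/sys/block/<dev>/queue/scheduler file, where the active scheduler is shown
in square brackets. The helper names and the sysfs_root parameter are made
up for illustration (the latter only so the code can be tried against a
scratch directory), and writing to the real file needs root.

```python
import os


def parse_schedulers(text):
    """Parse a line such as 'noop anticipatory deadline [cfq]'.

    Returns (available, active); active is None if nothing is bracketed.
    """
    available, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def scheduler_path(device, sysfs_root="/sys"):
    # sysfs_root is a hypothetical knob for testing; the kernel path is fixed.
    return os.path.join(sysfs_root, "block", device, "queue", "scheduler")


def get_scheduler(device, sysfs_root="/sys"):
    """Read the available and active schedulers for one block device."""
    with open(scheduler_path(device, sysfs_root)) as f:
        return parse_schedulers(f.read())


def set_scheduler(device, name, sysfs_root="/sys"):
    """Select a scheduler by name, refusing names the kernel does not list."""
    available, _ = get_scheduler(device, sysfs_root)
    if name not in available:
        raise ValueError("%s not offered for %s: %s" % (name, device, available))
    with open(scheduler_path(device, sysfs_root), "w") as f:
        f.write(name)


# Example (as root, device name is an assumption):
#   print(get_scheduler("sda"))
#   set_scheduler("sda", "noop")
```

The change is per device and does not survive a reboot, so it has to be
repeated for each disk behind the arrays. The set of names offered depends
on the kernel (newer multi-queue kernels list "none" rather than "noop"),
which is why the sketch checks the available list before writing.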