Re: Abysmal disk performance, how to debug?

From: Willy Tarreau
Date: Sat Jan 20 2007 - 15:29:14 EST


Hi Tim,

On Sat, Jan 20, 2007 at 09:10:22PM +0100, Tim Schmielau wrote:
> On Sat, 20 Jan 2007, Ismail Dönmez wrote:
>
> > 20 Oca 2007 Cts 19:45 tarihinde ??unlar?? yazm????t??n??z:
> > [...]
> > > > vaio cartman # hdparm -tT /dev/hda
> > > >
> > > > /dev/hda:
> > > > Timing cached reads: 1576 MB in 2.00 seconds = 788.18 MB/sec
> > > > Timing buffered disk reads: 74 MB in 3.01 seconds = 24.55 MB/sec
> > > >
> > > >
> > > > [~]> time dd if=/dev/zero of=/tmp/1GB bs=1M count=1024
> > > > 1024+0 records in
> > > > 1024+0 records out
> > > > 1073741824 bytes (1,1 GB) copied, 77,2809 s, 13,9 MB/s
> > > >
> > > > real 1m17.482s
> > > > user 0m0.003s
> > > > sys 0m2.350s
> > >
> > > That's not bad at all ! I suspect that if your system becomes unresponsive,
> > > it's because real writes start when the cache is full. And if you fill
> > > 512 MB of RAM with data that you then need to flush on disk at 14 MB/s, it
> > > can take about 40 seconds during which it might be difficult to do
> > > anything.
> > >
> > > Try lowering the cache flush starting point to about 10 MB if you want
> > > (2% of 512 MB) :
> > >
> > > # echo 2 >/proc/sys/vm/dirty_ratio
> > > # echo 2 >/proc/sys/vm/dirty_background_ratio
> >
> > After that I get,
> >
> > [~]> time dd if=/dev/zero of=/tmp/1GB bs=1M count=1024
> > 1024+0 records in
> > 1024+0 records out
> > 1073741824 bytes (1,1 GB) copied, 41,7005 s, 25,7 MB/s
> >
> > real 0m41.926s
> > user 0m0.007s
> > sys 0m2.500s
> >
> >
> > not bad! thanks :)
>
> Note that these dd "benchmarks" are completely bogus, because the data
> doesn't actually get written to disk in that time. For some enlightening
> data, try
>
> time dd if=/dev/zero of=/tmp/1GB bs=1M count=1024; time sync
>
> The dd returns as soon as all data could be buffered in RAM. Only sync
> will show how long it takes to actually write out the data to disk.

While I 100% agree with you on this, I'd like to note that I don't agree
with the following statement :

> also explains why you see better results is writeout starts earlier.

The results should be better when writeout starts later since most of
the transfer will have been performed at RAM speed. That's what happens
with the user above with 2 GB RAM. But in case of the VAIO with 512 MB,
there's really something strange IMHO. I suspect it has two RAM areas,
one fast and one slow (probably one two large non-cacheable area for a
shared video or such a crap, which can be avoided when reducing the
cache size).

Best regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/