NCQ/TCQ performance review (was: SATA RAID5 speed drop of 100 MB/s)

From: Al Boldi
Date: Sun Jun 24 2007 - 10:49:25 EST


Michael Tokarev wrote:
> Jeff Garzik wrote:
> > IN THEORY, RAID performance should /increase/ due to additional queued
> > commands available to be sent to the drive. NCQ == command queueing ==
> > sending multiple commands to the drive, rather than one-at-a-time like
> > normal.
> >
> > But hdparm isn't the best test for that theory, since it does not
> > simulate the transactions like real-world MD device usage does.
> >
> > We have seen buggy NCQ firmwares where performance decreases, so it is
> > possible that NCQ just isn't good on your drives.
>
> By the way, I did some testing of various drives, and NCQ/TCQ indeed
> shows some difference -- with multiple I/O processes (like "server"
> workload), IF NCQ/TCQ is implemented properly, especially in the
> drive.
>
> For example, this is a good one:
>
> Single Seagate 74Gb SCSI drive (10KRPM)
>
> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
> 4k 1 66.4 0.5 0.6 0.5 0.6/ 0.6 0.4/ 0.2
> 2 0.6 0.6 0.5/ 0.1
> 4 0.7 0.6 0.6/ 0.2
> 16k 1 84.8 2.0 2.5 1.9 2.5/ 2.5 1.6/ 0.6
> 2 2.3 2.1 2.0/ 0.6
> 4 2.7 2.5 2.3/ 0.6
> 64k 1 84.8 7.4 9.3 7.2 9.4/ 9.3 5.8/ 2.2
> 2 8.6 7.9 7.3/ 2.1
> 4 9.9 9.1 8.1/ 2.2
> 128k 1 84.8 13.6 16.7 12.9 16.9/16.6 10.6/ 3.9
> 2 15.6 14.4 13.5/ 3.2
> 4 17.9 16.4 15.7/ 2.7
> 512k 1 84.9 34.0 41.9 33.3 29.0/27.1 22.4/13.2
> 2 36.9 34.5 30.7/ 8.1
> 4 40.5 38.1 33.2/ 8.3
> 1024k 1 83.1 36.0 55.8 34.6 28.2/27.6 20.3/19.4
> 2 45.2 44.1 36.4/ 9.9
> 4 48.1 47.6 40.7/ 7.1
>
> The tests are direct-I/O over whole drive (/dev/sdX), with
> either 1, 2, or 4 threads doing sequential or random reads
> or writes in blocks of a given size. For the R/W tests,
> we've 2, 4 or 8 threads running in total (1, 2 or 4 readers
> and the same amount of writers). Numbers are MB/sec, as
> totals (summary) for all threads.
>
> Especially interesting is the very last column - random R/W
> in parallel. In almost all cases, more threads gives larger
> total speed (I *guess* it's due to internal optimisations in
> the drive -- with more threads the drive has more chances to
> reorder commands to minimize seek time etc).
>
> The only thing I don't understand is why with larger I/O block
> size we see write speed drop with multiple threads.

It seems that the drive favors reads over writes.

> And in contrast to the above, here's another test run, now
> with Seagate SATA ST3250620AS ("desktop" class) 250GB
> 7200RPM drive:
>
> BlkSz Trd linRd rndRd linWr rndWr linR/W rndR/W
> 4k 1 47.5 0.3 0.5 0.3 0.3/ 0.3 0.1/ 0.1
> 2 0.3 0.3 0.2/ 0.1
> 4 0.3 0.3 0.2/ 0.2
> 16k 1 78.4 1.1 1.8 1.1 0.9/ 0.9 0.6/ 0.6
> 2 1.2 1.1 0.6/ 0.6
> 4 1.3 1.2 0.6/ 0.6
> 64k 1 78.4 4.3 6.7 4.0 3.5/ 3.5 2.1/ 2.2
> 2 4.5 4.1 2.2/ 2.3
> 4 4.7 4.2 2.3/ 2.4
> 128k 1 78.4 8.0 12.6 7.2 6.2/ 6.2 3.9/ 3.8
> 2 8.2 7.3 4.1/ 4.0
> 4 8.7 7.7 4.3/ 4.3
> 512k 1 78.5 23.1 34.0 20.3 17.1/17.1 11.3/10.7
> 2 23.5 20.6 11.3/11.4
> 4 24.7 21.3 11.6/11.8
> 1024k 1 78.4 34.1 33.5 24.6 19.6/19.5 16.0/12.7
> 2 33.3 24.6 15.4/13.8
> 4 34.3 25.0 14.7/15.0
>
> Here, the (total) I/O speed does not depend on the number
> of threads. From which I conclude that the drive does
> not reorder/optimize commands internally, even if NCQ is
> enabled (queue depth is 32).
>
> (And two notes. First of all, for some, those tables may
> look.. strange, showing too low speed. Note the block
> size, and note I'm doing *direct* *random* I/O, without
> buffering in the kernel. Yes, even the most advanced
> modern drives are very slow in this workload, due to
> seek times and rotation latency -- the disk is maxing
> out at the theoretical requests/secound -- take average
> seek time plus rotation latency (usually given in the
> drive specs), and divide one secound to the calculated
> value -- you'll see about 200..250 - that's requests/sec.
> And the numbers - like 0.3Mb/sec write - are very close
> to those 200..250. In any way, this is not a typical
> workload - file server for example is not like this.
> But it's more or less resembles database workload.
>
> And second, so far I haven't seen a case where a drive
> with NCQ/TCQ enabled works worse than without. I don't
> want to say there aren't such drives/controllers, but
> it just happen that I haven't seen any.)

Maybe you can post the benchmark source so people can contribute their
results.


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/