Re: high system cpu load during intense disk i/o

From: Dimitrios Apostolou
Date: Mon Aug 06 2007 - 14:18:24 EST


Rafał Bilski wrote:
>> Hello and thanks for your reply.
> Hello again,
>> The cron job that runs every 10 min on my system is mpop (a fetchmail-like program) and another that runs every 5 min is mrtg. Both normally finish within 1-2 seconds.
>> The fact that these simple cron jobs never finish is certainly because of the high system CPU load. If you look at the two_discs_bad.txt which I attached to my original message, you'll see that *vmlinux*, and specifically the *scheduler*, take up most of the time.
>> And the fact that this happens only when running two I/O processes, while with only one everything is absolutely snappy (not at all slow, see one_disc.txt), makes me sure that this is a kernel bug. I'd be happy to help but I need some guidance to pinpoint the problem.
> In Your oprofile output I find "acpi_pm_read" particularly interesting. Unlike the other VIA chipsets I know, Yours doesn't use VLink to connect the northbridge to the southbridge. Instead a PCI bus connects the two. As You probably know, the maximum PCI throughput is 133MiB/s. In theory. In practice probably less.
> The ACPI registers are located on the southbridge. This probably means that the processor needs access to the PCI bus in order to read the ACPI timer register.
> Now some math. The 20GiB disk can probably send data at a 20MiB/s rate, and each 200GiB disk at about 40MiB/s. So 20+2*40=100MiB/s. I think this could explain why a simple inl() call takes so much time and why Your system isn't very responsive.
>> Thanks, Dimitris
> Let me know if You find my theory amazing or amusing.

Hello Rafal,

I find your theory very nice, but unfortunately I don't think it applies here. As you can see from the vmstat outputs, the write throughput is about 15MB/s for both disks, and when reading I get about 30MB/s, again from both disks. The other disk, the small one, is mostly idle, apart from writing small amounts of data now and then. Since the problem occurs when writing, I think 15MB/s is simply too little to saturate the PCI bus.
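For reference, as far as I understand it, acpi_pm_read() is essentially a single inl() of the PM timer I/O port on the southbridge, so every clock read is one transaction which, if your theory holds, has to cross the PCI bus. Below is a rough userspace sketch of that read; the port number is only a placeholder, the real one is (I believe) printed at boot as something like "ACPI: PM-Timer IO Port: 0x...".

/* Rough userspace sketch of what acpi_pm_read() amounts to: a
 * single 32-bit port read of the ACPI PM timer.  Run as root.
 * PMTMR_PORT is a placeholder; use the port your kernel reports. */
#include <stdio.h>
#include <sys/io.h>                 /* iopl(), inl() */

#define PMTMR_PORT 0x0808           /* placeholder, board-specific */

int main(void)
{
        if (iopl(3)) {              /* raise I/O privilege level */
                perror("iopl");
                return 1;
        }

        /* Each inl() is one I/O-port read that has to reach the
         * southbridge, i.e. (per the theory above) cross the PCI bus. */
        unsigned int t1 = inl(PMTMR_PORT);
        unsigned int t2 = inl(PMTMR_PORT);

        /* The PM timer ticks at 3.579545 MHz and wraps at 24 bits. */
        printf("pm timer: %u -> %u (delta %u ticks)\n",
               t1, t2, (t2 - t1) & 0xffffff);
        return 0;
}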

However, I find it quite possible that the throughput limit has been reached because of software (driver) problems. I have done various tests (mostly "hdparm -tT") with exactly the same PC and disks since about kernel 2.6.8 (maybe even earlier). I remember with certainty that in the early days the read throughput was about 50MB/s for each of the big disks, and combined with RAID 0 I got ~75MB/s. Those figures have been dropping gradually with each new kernel release, and the situation today, with 2.6.22, is that hdparm reports a maximum throughput of 20MB/s for each disk, and for RAID 0 too!
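For completeness, this is roughly the kind of raw test I mean, independent of hdparm: time a large sequential read straight off the block device. The device path and sizes below are just examples, and O_DIRECT is used so the page cache doesn't skew repeated runs.

/* Minimal sequential-read timing, roughly what "hdparm -t" measures.
 * Build: gcc -O2 -o readtest readtest.c     Run as root: ./readtest /dev/hda */
#define _GNU_SOURCE                 /* for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK   (1 << 20)           /* 1 MiB per read() */
#define TOTAL   (256LL << 20)       /* read 256 MiB in total */

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/hda";  /* placeholder */
        int fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) { perror(dev); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, CHUNK)) { perror("posix_memalign"); return 1; }

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);

        long long done = 0;
        while (done < TOTAL) {
                ssize_t n = read(fd, buf, CHUNK);
                if (n < 0) { perror("read"); break; }
                if (n == 0) break;          /* end of device */
                done += n;
        }
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%lld MiB in %.2f s = %.1f MB/s\n",
               done >> 20, secs, done / secs / 1e6);
        return 0;
}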

I have been ignoring these performance regressions because there were no stability problems until now. So could it be that I'm hitting the 20MB/s driver limit and some requests take too long to be served?
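If it helps to check the "requests take too long" idea, the per-disk counters in /proc/diskstats already record the time spent doing I/O; a rough sketch like the one below (the device name is just a placeholder) turns two samples into a write rate and an average time per write request.

/* Rough per-request service time from /proc/diskstats.  The fields
 * after the device name are: reads, reads_merged, sectors_read,
 * ms_reading, writes, writes_merged, sectors_written, ms_writing, ...
 * Samples the counters twice and prints write-side averages. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DEV      "hda"              /* placeholder device name */
#define INTERVAL 5                  /* seconds between samples */

struct snap { unsigned long long wr, wr_sec, wr_ms; };

static int sample(struct snap *s)
{
        FILE *f = fopen("/proc/diskstats", "r");
        char line[256], name[32];
        unsigned long long v[10];
        if (!f)
                return -1;
        while (fgets(line, sizeof line, f)) {
                if (sscanf(line, " %llu %llu %31s %llu %llu %llu %llu %llu %llu %llu %llu",
                           &v[0], &v[1], name, &v[2], &v[3], &v[4], &v[5],
                           &v[6], &v[7], &v[8], &v[9]) == 11 &&
                    !strcmp(name, DEV)) {
                        s->wr = v[6];       /* writes completed */
                        s->wr_sec = v[8];   /* 512-byte sectors written */
                        s->wr_ms = v[9];    /* ms spent writing */
                        fclose(f);
                        return 0;
                }
        }
        fclose(f);
        return -1;
}

int main(void)
{
        struct snap a, b;
        if (sample(&a) || (sleep(INTERVAL), sample(&b))) {
                fprintf(stderr, "could not read /proc/diskstats or find %s\n", DEV);
                return 1;
        }
        unsigned long long wr = b.wr - a.wr;
        printf("%s: %.1f MB/s written, %.1f ms average per write request\n",
               DEV,
               (b.wr_sec - a.wr_sec) * 512.0 / INTERVAL / 1e6,
               wr ? (double)(b.wr_ms - a.wr_ms) / wr : 0.0);
        return 0;
}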


Dimitris