Re: 2.6.0-test8/test9 io scheduler needs tuning?

From: Martin Josefsson
Date: Tue Oct 28 2003 - 12:43:49 EST


On Tue, 2003-10-28 at 05:30, Nick Piggin wrote:

> The bi / bo fields in vmstat aren't exactly what the disk is doing, rather
> the requests the queue is taking. The interesting thing is how your final
> measurement compares with 2.4.

I've attached a small script I made to investigate some weird I/O
problems I'm seeing with both AS and deadline. Maybe the script can be
of some help, who knows.

It produces vmstat-like output, one line per second.
It takes the data from /proc/diskstats (the output columns are in the
same order as in that file, which is described in Documentation/iostats.txt).
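
To make the column order concrete, here is a minimal sketch that labels the 11 per-device counters using the same names as the script's header. It assumes the 2.6-era /proc/diskstats layout (major, minor, device name, then 11 fields); the function name label_diskstats is made up for illustration.

```shell
#!/bin/bash
# Label the 11 per-device counters of a diskstats-format line.
# Assumption: the line layout is "major minor name" followed by 11 fields.
label_diskstats() {
    # $1 = device name; input is read from stdin in /proc/diskstats format
    awk -v dev="$1" '
        $3 == dev {
            split("reads rmerge rsect rms writes wmerge wsect wms io ms ms2", name, " ")
            for (n = 1; n <= 11; n++)
                printf "%s=%s\n", name[n], $(n + 3)
        }'
}

# Usage on a live system (the device name is only an example):
#   label_diskstats hde < /proc/diskstats
```

Matching on the third field exactly avoids picking up partitions or other devices whose names merely contain the requested one.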

Example output (from one of my tests that showed the problem):

$ diskstat.sh hde
row  reads  rmerge  rsect  rms  writes  wmerge  wsect     wms   io    ms     ms2
1        0       0      0    0       0       0      0       0  163     0       0
2        3       0     24   46     751     530  10032  241626  135  1463  223510
3        1       0     64   15     590     402   8056  147398  149  1019  148466
4        1       0    160   16     599     593   9592  144204  156  1016  147806
5        1       0      8   21     522     485   7896  134072  133  1016  152418
6        1       0      8   14     541     265   6504  163212  139  1014  149101
7        3       0    264   63     495     153   5248  127999  147  1016  145554
.....

It's a sequential write with only one writer: lots of I/O requests, each
for a very small number of sectors, and the drive isn't very active.
The file is received via the network and written to disk.
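
As a rough check of "a very small number of sectors", the average write size can be worked out from the writes and wsect columns of one row. This is only a sketch: it assumes 512-byte sectors, and avg_write_size is a hypothetical helper name.

```shell
#!/bin/bash
# Average size of a completed write request from one row of deltas.
# Assumption: the sector counts are in 512-byte units.
avg_write_size() {
    # $1 = writes completed, $2 = sectors written
    awk -v w="$1" -v s="$2" 'BEGIN {
        if (w == 0) { print "no writes"; exit }
        printf "%.1f sectors/write (%.1f KiB)\n", s / w, s / w * 512 / 1024
    }'
}

# Row 2 of the example output: 751 writes, 10032 sectors written
avg_write_size 751 10032
```

For row 2 this comes out at roughly 13 sectors per completed write, i.e. under 7 KiB each.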

Corresponding 'vmstat 1' output (not at the same time as the output
above):

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  5  20400   2256  42956 297704    0    0    12  2300 4452  6010  4 25  0 71
 3  4  20544   2524  42536 298276    0  144    16  3476 3605  4366  3 21  0 76
 0 10  20544   2104  41876 299380    0    0    16  2996 4094  5107  4 22  0 74
 1 10  20548   2456  40656 300252    0    4    40  3228 4389  5725  4 27  0 69
 0  5  20832   2176  40108 301252    0  288    16  2780 4530  6889  5 26  0 69

Normally I see bursts of >30MB/s at intervals of a few seconds.
In this example there's some swapping, but I haven't seen any indication
that swapping is the cause, since it happens when there's no swapping
going on as well.
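
For comparison with the >30MB/s bursts, the bo column above can be converted to a throughput figure. This sketch assumes vmstat reports bo in 1024-byte blocks per second, which may not hold for every procps or kernel version; bo_to_mib is a hypothetical helper name.

```shell
#!/bin/bash
# Convert a vmstat "bo" value (blocks/s) to MiB/s.
# Assumption: 1 block = 1024 bytes; other versions may use a different unit.
bo_to_mib() {
    awk -v bo="$1" 'BEGIN { printf "%.1f\n", bo * 1024 / 1048576 }'
}

# The bo values from the vmstat sample above
for bo in 2300 3476 2996 3228 2780; do
    echo "bo=$bo -> $(bo_to_mib "$bo") MiB/s"
done
```

Under that assumption, the sample corresponds to only a few MiB/s, an order of magnitude below the usual bursts.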

When this happens (not all the time) the machine freezes for 1-2 seconds
every 3-15 seconds. X freezes, and if I ssh into the machine, that session
freezes as well at these intervals.

I've tried profiling when this happens but it just shows all time in
default_idle.

Maybe my problem is related, or maybe not...

--
/Martin
#!/bin/bash

# Print per-second deltas of the /proc/diskstats counters for one device.
# Usage: diskstat.sh <device>   (e.g. diskstat.sh hde)

first=1   # set until the first sample has been stored
row=0

while true; do
    grep "$1 " /proc/diskstats | awk '{print $4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14}'
    sleep 1
done | while read a b c d e f g h i j k
do
    # Repeat the header every 30 rows
    if [ $((row % 30)) -eq 0 ]; then
        #echo -e "row\treads\trmerge\trsect\trms\t\twrites\twmerge\twsect\twms\t\tio\tms\tms2"
        echo -e "row\treads\trmerge\trsect\trms\twrites\twmerge\twsect\twms\tio\tms\tms2"
    fi
    row=$((row + 1))

    num=0
    for loop in a b c d e f g h i j k
    do
        if [ $first -eq 1 ]; then
            # No previous sample yet, so report a delta of 0
            diff[$num]=0
        else
            # ${!loop} expands to the value of the variable named by $loop
            diff[$num]=$((${!loop} - ${old[$num]}))
        fi
        old[$num]=${!loop}
        num=$((num + 1))
    done
    first=0

    # Field i (I/Os currently in progress) is not a cumulative counter,
    # so it is printed as-is instead of as a delta
    echo -e "$row\t${diff[0]}\t${diff[1]}\t${diff[2]}\t${diff[3]}\t${diff[4]}\t${diff[5]}\t${diff[6]}\t${diff[7]}\t$i\t${diff[9]}\t${diff[10]}"
done
