I am seeing system hangs with the 2.4.17 SMP kernel when running mke2fs across 12
drives in parallel. However, the hangs only occur when the I/O rate reported by
vmstat is high:
bash# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 815800   3656 102612   0   0   163  1086  129   961   6  29  65
 0  0  0      0 813096   3656 102992   0   0   363     0  265   928  44  26  31
 0  0  0      0 813172   3656 102996   0   0     4     0  180    99   0   1  99
 5  2  1      0 809476   3656 103044   0   0    44     0  206   274  18  11  71
 0 12  0      0 802040   3656 103072   0   0    18     2  296   745  47  33  21
 0  8  0      0 800696   3660 103152   0   0    34   244  349   501  10   5  85
 0  8  0      0 800696   3660 103152   0   0     0     0  187   106   1   0  99
 2  5  1      0 795864   3660 103200   0   0    13    45  278   717  14  29  57
 1  0  0      0 795592   3660 103248   0   0   107    46  502   663   7   7  86
 1  0  0      0 795592   3660 103248   0   0     0     0  184    95   0   1  99
 1  0  0      0 795596   3660 103244   0   0     4     1  191   139   0   1  99
 6  5  3      0 756932   3660 140192   0   0   194 36721  464   232   7  87   6
11  0  1      0 723276   3660 173784   0   0     0 33718  681  1560   0  39  61
At that point the system hangs. The system consists of a 4-port and an 8-port
3ware controller on an Intel 21154 bridge, with 12 Maxtor drives. When the
I/O rate is lower:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 811888   3828 103476   0   0     0     0  172    91   0   1  99
 0  0  0      0 811888   3828 103476   0   0     0     0  184    93   0   1  99
 0  0  0      0 811888   3828 103476   0   0     0     0  171   153   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0   170  171    85   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0     0  174    95   1   0  99
 0  0  0      0 811852   3828 103512   0   0     0     0  173   147   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0     0  176    98   0   1  99
 0  0  0      0 811852   3828 103512   0   0     0     1  175    96   0   1  99
 1  0  0      0 811852   3828 103512   0   0     0     0  173   110   1   3  96
 1  0  0      0 811704   3828 103516   0   0     0     0  179   145   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  194   121   1   1  98
 0  0  0      0 811688   3828 103516   0   0     0    19  186   119   1   1  98
 0  0  0      0 811688   3828 103516   0   0     0     0  174   149   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  172    86   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  175    96   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     1  171   149   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    91   0   1  99
 0  0  1      0 811688   3828 103516   0   0     0     0  173    86   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  179   154   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     1  174    91   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  179    98   1   0  99
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 811688   3828 103516   0   0     0     0  174   100   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  184   137   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     1  171    89   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    95   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  176   150   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  175   121   3   0  97
 0  0  0      0 811688   3828 103516   0   0     0    15  171   116   3   0  97
 0  0  0      0 811688   3828 103516   0   0     0     0  179   149   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    88   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    88   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0    15  172   149   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    88   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  174    98   0   1  99
 1  0  0      0 811688   3828 103516   0   0     0     0  178   127   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     1  179   153   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    88   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  173    88   0   1  99
 0  0  0      0 811688   3828 103516   0   0     0     0  175   158   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     1  175    96   1   0  99
 0  0  0      0 811688   3828 103516   0   0     0     0  171    90   0   1  99
 0  0  0      0 811924   3828 103516   0   0     0     0  173   204   1   5  94
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 811924   3828 103516   0   0     0     0  174    90   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0    24  182    93   1   0  99
 1  0  0      0 811924   3828 103516   0   0     0     0  173   142   0   1  99
 0  0  0      0 811924   3828 103516   0   0     0     0  171    93   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     0  175    94   0   1  99
 1  0  0      0 811924   3828 103516   0   0     0     2  173    92   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     0  175   151   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     0  173    87   1   0  99
 0  0  1      0 811924   3828 103516   0   0     0     0  173    89   1   0  99
 0  0  0      0 811924   3828 103516   0   0     0     1  171   142   0   1  99
bash# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 5  0  1      0 815896   3656 102228   0   0   156  1076  129   917   6  34  60
 2  0  1      0 812860   3656 102992   0   0   737     0  252  1015  36  40  25
 0  0  0      0 813104   3656 102992   0   0     0     0  187   129   2   1  97
 0  0  0      0 813180   3656 102996   0   0     4     0  168    93   0   1  99
 6  8  1      0 802392   3656 103220   0   0   222     0  251   757  62  37   1
 0 10  0      0 804588   3684 103252   0   0   133    40  214   871  50  43   8
 2  8  1      0 804324   3712 103272   0   0   162    40  222   817  43  40  18
 9  4  0      0 805400   3712 103288   0   0     4     0  196   276  13   8  79
 9  0  1      0 804724   3812 103348   0   0   644    96  297  1255  41  59   0
14  0  1      0 804144   3816 103344   0   0     0     0  167   888  56  44   0
10  0  1      0 804448   3820 103340   0   0     0     0  171   873  57  43   0
 0  1  0      0 812288   3828 103436   0   0    97     0  222  1051  53  42   5
 0  0  0      0 811868   3828 103476   0   0    84     0  395   429   0   3  97
 1  0  0      0 811868   3828 103476   0   0     0     0  167    91   0   1  99
 1  0  0      0 811868   3828 103476   0   0     0     0  167   141   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  177   100   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  171    96   0   1  99
 1  0  0      0 811868   3828 103476   0   0     0     0  171   132   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  170   100   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  167    90   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  171    93   1   0  99
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 811868   3828 103476   0   0     0     0  170   148   1   0  99
 0  0  0      0 811868   3828 103476   0   0     0     0  176    83   0   1  99
 0  0  0      0 811868   3828 103476   0   0     0     0  167    86   0   1  99
 0  0  0      0 811868   3828 103476   0   0     0     0  169   146   0   1  99
 0  0  0      0 811832   3828 103512   0   0     0   173  168    84   1   0  99
 0  0  0      0 811832   3828 103512   0   0     0     0  178   104   1   0  99
 0  0  0      0 811832   3828 103512   0   0     0     0  167   141   0   1  99
 0  0  0      0 811832   3828 103512   0   0     0     0  173    92   1   2  97
 0  0  0      0 811832   3828 103512   0   0     0     1  169    89   0   2  98
 1  0  0      0 811832   3828 103512   0   0     0     0  173   138   2   3  95
 1  0  0      0 811684   3828 103516   0   0     0     0  183   132   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  178   103   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0    19  180   108   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  169   148   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  169    88   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    94   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     1  170   147   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  168    94   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    95   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  173   175   3   0  97
 0  0  0      0 811668   3828 103516   0   0     0    15  169    89   1   0  99
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 811668   3828 103516   0   0     0     0  168    87   1   0  99
 1  0  0      0 811668   3828 103516   0   0     0     0  178   156   2   1  97
 0  0  0      0 811668   3828 103516   0   0     0     0  175   122   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0    15  167    86   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    92   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  177   151   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  169    93   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     1  169    86   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  167   146   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  172    95   0   1  99
 0  0  0      0 811668   3828 103516   0   0     0     0  167    84   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     1  178   189   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    89   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     0  171    89   0   1  99
 1  0  0      0 811668   3828 103516   0   0     0     0  171   124   1   0  99
 0  0  0      0 811668   3828 103516   0   0     0     1  183   127   0   1  99
bash# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/ram1               125011     85304     33154  73% /
coreserver:/var/cores
                      74858752    475144  70580928   1% /var/.automount/cores
/dev/sdj1              7827172        24   7429540   1% /disks/disk10.1
/dev/sdi1              7827172        24   7429540   1% /disks/disk9.1
/dev/sdl1              7827172        24   7429540   1% /disks/disk12.1
/dev/sdk1              7827172        24   7429540   1% /disks/disk11.1
/dev/sdh1              7827172        24   7429540   1% /disks/disk8.1
/dev/sdf1              7827172        24   7429540   1% /disks/disk6.1
/dev/sde1              7827172        24   7429540   1% /disks/disk5.1
/dev/sdb1              7827172        24   7429540   1% /disks/disk2.1
/dev/sdg1              7827172        24   7429540   1% /disks/disk7.1
/dev/sda1              7827172        24   7429540   1% /disks/disk1.1
/dev/sdd1              7827172        24   7429540   1% /disks/disk4.1
/dev/sdc1              7827172        24   7429540   1% /disks/disk3.1
bash#
bash# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 755440   3924 105192   0   0    83  8108  353   655   5  25  69
 0  0  1      0 755440   3924 105192   0   0     0     0  167    70   0   0 100
 0  0  0      0 755440   3924 105192   0   0     0    24  159    82   0   1  99
 0  0  0      0 755052   3924 105192   0   0     0     0  162   133   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  156    68   0   1  99
 0  0  0      0 755052   3924 105192   0   0     0     0  155    59   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     1  157   124   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  174    82   0   1  99
 0  0  0      0 755052   3924 105192   0   0     0     0  161    73   0   0 100
 1  0  0      0 755052   3924 105192   0   0     0     0  159   101   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     1  155    92   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  155    57   0   0 100
 0  0  0      0 755052   3924 105192   0   0     0     0  155    67   1   0  99
 0  0  0      0 755052   3924 105192   0   0     0     0  155   112   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     6  157    67   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  155    62   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  157   128   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  160    66   0   1  99
 0  0  0      0 754440   3924 105192   0   0     0     1  157    72   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  155   117   0   0 100
 0  0  0      0 754440   3924 105192   0   0     0     0  155    71   0   0 100
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 754440   3924 105192   0   0     0     0  157    66   0   0 100
 1  0  0      0 754440   3924 105192   0   0     0     1  166   114   0   1  99
 0  0  0      0 754440   3924 105192   0   0     0     0  158    93   0   0 100
there are no hangs. On startup, I am running mke2fs in parallel across all the
drives. The 4-port 3ware controller shows its LEDs ON. I have tried
replacing the controllers, but that does not help either ...
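
For anyone trying to reproduce this, the workload and the vmstat check
could be sketched roughly as below. This is only a sketch under stated
assumptions, not the exact commands used here: the function names
parallel_mke2fs and high_bo, the DRY_RUN switch, the -q flag and the
30000 threshold are all illustrative, and the awk filter assumes the
16-column vmstat layout shown above, where bo is field 11.

```shell
#!/bin/bash
# Sketch only: function names, DRY_RUN and the default threshold are
# hypothetical and chosen for illustration.

# Start one mke2fs per partition in the background, then wait for all
# of them. mke2fs DESTROYS any data on the target partition.
# With DRY_RUN=1 the commands are printed instead of executed.
parallel_mke2fs() {
    local dev
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "mke2fs -q $dev"
        else
            mke2fs -q "$dev" &
        fi
    done
    # Returns once every background mke2fs has exited.
    wait
}

# Read vmstat output on stdin and print only the sample rows whose
# "bo" value is above the given limit (default 30000).
# Assumes bo is field 11, as in the 16-column layout in this report;
# other vmstat versions order their columns differently.
high_bo() {
    awk -v limit="${1:-30000}" '$1 ~ /^[0-9]+$/ && $11 + 0 > limit + 0'
}
```

With the functions loaded, "DRY_RUN=1 parallel_mke2fs /dev/sd[a-l]1"
prints the twelve commands without touching the disks, and
"vmstat 1 | high_bo 30000" prints only samples like the last two in
the first listing, just before the hang.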
Thanks
Manish