Is there something wrong here?

Simon Kirby (sim@netnation.com)
Fri, 15 Jan 1999 09:33:19 -0800 (PST)


There has always been something that has felt wrong for me with regards to
how Linux blocks processes when something is being written to disk...I'm
going to attempt to describe it with an example I just saw of it on our
mail server.

We have a mail server here with many thousand POP accounts, so we get
quite a few logins per second. It's currently running on a Dual P2
machine with 2.2.0pre7a2, but I've noticed this forever (UP and SMP).
So, as people login, the pop daemon is going to load up their mailbox from
the mail spool (currently on one device but planned to move to RAID).
The box has 512MB ram, so it's pretty good at caching things, but
sometimes weird things like this happen just as bdflush runs:

(vmstat 1):

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 432 5188 155952 211232 0 0 0 0 255 290 9 3 88
0 0 0 432 5448 156012 211232 0 0 0 0 288 236 8 4 88
2 0 0 432 4756 156092 211232 0 0 0 0 276 270 12 6 83
1 0 0 432 2832 156212 211224 0 0 0 0 602 567 34 7 59
0 4 3 432 1644 156232 211212 0 0 0 412 773 921 20 8 72
0 0 0 432 4552 156428 211208 0 0 0 88 543 553 13 8 79
0 0 0 432 4356 156624 211208 0 0 0 0 403 422 6 3 90
0 0 0 432 4068 156832 211212 0 0 0 0 423 469 5 6 89
1 0 0 432 3656 156944 211220 0 0 17 0 493 603 13 8 78
0 0 0 432 3996 157008 211224 0 0 0 0 282 266 8 6 86
2 0 0 432 4304 157104 211216 0 0 0 0 256 220 1 4 95
1 0 0 432 4520 157220 211216 0 0 0 0 307 305 6 4 90
0 1 0 432 3824 157384 211244 0 0 10 0 401 446 10 5 85
<- bdflush is called by update() here, I think, or somebody ran sync ->
0 1 0 432 3408 157560 211240 0 0 28 1391 735 963 15 17 67
0 5 0 432 1356 155940 212592 0 0 1 998 431 1274 24 11 65
0 6 0 432 1216 155820 212588 0 0 0 1064 436 596 9 4 87
0 6 0 432 1340 155692 212588 0 0 0 2029 362 494 1 3 96
0 10 0 432 1496 154936 212600 0 0 0 1023 547 794 12 6 83
0 15 0 432 1228 153904 212492 0 0 1 953 469 617 23 9 68
0 19 0 432 1424 153236 212476 0 0 0 1404 424 654 5 3 92
1 14 0 432 1512 153048 212436 0 0 4 1613 445 636 2 5 93
0 0 0 432 6184 154648 210864 0 0 5 530 521 604 7 14 80
0 0 0 432 6556 154676 210864 0 0 0 0 287 342 10 7 84
0 0 0 432 6536 154696 210856 0 0 1 0 388 447 13 10 77
0 0 0 432 6804 154724 210856 0 0 0 0 230 204 10 5 85
0 0 0 432 6708 154728 210848 0 0 0 0 182 141 1 3 96
0 0 0 432 6692 154744 210848 0 0 0 0 156 80 0 3 97
0 0 0 432 6612 154744 210848 0 0 0 0 180 108 1 4 95
4 0 0 432 6144 154748 210848 0 0 0 0 207 182 14 6 80

Note that as soon as it starts flushing, a ton of processes start to get
blocked...and it seems to look almost like they're ones that in the past
have had no problems getting their mailbox from dcache. Maybe not,
though. Or maybe it only starts to block after one is waiting for I/O
even though the next few can grab from dcache or something...

Okay, here I waited for things to be idle for a bit, made a few largish
files, and typed "sync" while vmstat 1 is running in another console:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 432 64492 160088 162560 0 0 0 0 223 149 2 5 92
1 0 0 432 64472 160088 162372 0 0 1 0 197 167 5 2 93
<- ran sync here ->
0 1 0 432 63908 160100 162392 0 0 21 4159 347 433 15 7 78
0 1 0 432 63884 160100 162392 0 0 0 4557 336 429 3 6 91
0 2 0 432 64428 160116 162408 0 0 13 5963 399 509 11 10 79
2 2 0 432 59512 160132 162416 0 0 4 3776 411 527 33 10 57
3 2 0 432 58272 160296 162452 0 0 172 2810 521 929 27 13 60
0 4 0 432 56116 160344 162452 0 0 35 2106 433 740 31 8 61
1 4 0 432 57820 160348 162452 0 0 0 2621 434 644 28 12 60
0 6 0 432 54536 160356 162784 0 0 8 5428 461 647 34 9 57
2 8 0 432 56636 160356 162784 0 0 0 3302 316 385 2 4 94
0 10 0 432 56288 160356 162784 0 0 0 3099 410 566 2 5 93
2 13 0 432 55712 160356 162764 0 0 1 3954 390 525 9 4 87
0 19 0 432 54420 160360 162764 0 0 3 2435 378 507 20 4 76
0 20 0 432 54296 160360 162764 0 0 0 3845 385 663 17 9 74
1 20 0 432 54128 160360 162764 0 0 1 2279 308 383 3 3 94
4 2 0 432 55640 160372 162752 0 0 2 4887 628 949 16 14 70
2 2 0 432 56584 160448 162756 0 0 50 99 491 519 27 16 57
4 3 0 432 57928 160532 162760 0 0 69 237 501 597 11 15 74
0 0 0 432 59244 160540 162760 0 0 0 0 218 210 9 17 74
0 0 0 432 59440 160540 162760 0 0 1 0 182 131 3 4 93
1 0 0 432 59508 160540 162760 0 0 0 0 231 199 2 4 94
1 0 0 432 60008 160584 162740 0 0 1 0 217 203 11 2 87
0 0 0 432 60596 160584 162740 0 0 1 0 264 337 4 6 89

Anyway, wrapping everything with an LD_PRELOAD library that nopifies
sync() and fsync() helps, but bdflush() called by update still seems to
take a big dump to the disk instead of spreading it out a bit, so it still
ends up doing massive process blocks every few minutes.

Has anybody else witnessed this and thought it was strange? It might just
be that it's throwing out some page cache that it needs for all of the
logins that come in and it needs to reread it while the disk is busy
writing out stuff, but it doesn't seem right to me. *shrug*

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/