Re: Is there something wrong here?

Linus Torvalds (torvalds@transmeta.com)
16 Jan 1999 08:35:36 GMT


In article <Pine.LNX.4.05.9901150916210.30354-100000@peace.netnation.com>,
Simon Kirby <sim@netnation.com> wrote:
>There has always been something that has felt wrong for me with regard to
>how Linux blocks processes when something is being written to disk...I'm
>going to attempt to describe it with an example I just saw of it on our
>mail server.

Good. This sounds like a classic.

> 2  0 0 432 4304 157104 211216 0 0  0    0 256  220  1  4 95
> 1  0 0 432 4520 157220 211216 0 0  0    0 307  305  6  4 90
> 0  1 0 432 3824 157384 211244 0 0 10    0 401  446 10  5 85
><- bdflush is called by update() here, I think, or somebody ran sync ->
> 0  1 0 432 3408 157560 211240 0 0 28 1391 735  963 15 17 67
> 0  5 0 432 1356 155940 212592 0 0  1  998 431 1274 24 11 65
> 0  6 0 432 1216 155820 212588 0 0  0 1064 436  596  9  4 87
> 0  6 0 432 1340 155692 212588 0 0  0 2029 362  494  1  3 96
> 0 10 0 432 1496 154936 212600 0 0  0 1023 547  794 12  6 83
> 0 15 0 432 1228 153904 212492 0 0  1  953 469  617 23  9 68
> 0 19 0 432 1424 153236 212476 0 0  0 1404 424  654  5  3 92
> 1 14 0 432 1512 153048 212436 0 0  4 1613 445  636  2  5 93

Ok, what I'd really like is this: when you see behaviour like this,
could you please do a "ps axl" while the bad behaviour is going on
(i.e. while you have multiple processes in disk wait)?

I'd like to see what the wchan is for the processes in question - are
they waiting on the page cache, on buffers, or on inodes?
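
For anyone collecting this, here is a rough sketch of how captured
"ps axl" output could be boiled down to a count of D-state processes per
wchan. The function name and the sample text are made up for
illustration, and the sketch assumes that the header row names a WCHAN
column and a STAT (or STA) column and that no field before them is
blank, which may not hold for every ps version.

```python
from collections import Counter


def wchan_counts(ps_axl_text):
    """Count processes in uninterruptible sleep (state D) per WCHAN value.

    Hypothetical helper: assumes whitespace-separated columns and a header
    row naming a WCHAN column and a STAT (or STA) column. Raises ValueError
    if the header has neither.
    """
    lines = [ln for ln in ps_axl_text.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    wchan_col = header.index("WCHAN")
    stat_col = header.index("STAT") if "STAT" in header else header.index("STA")
    counts = Counter()
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(wchan_col, stat_col):
            continue  # short or malformed row, skip it
        if fields[stat_col].startswith("D"):
            counts[fields[wchan_col]] += 1
    return counts


# Made-up sample in the general shape of "ps axl" output (not real data;
# the wchan names are placeholders, not kernel symbols).
SAMPLE = """\
F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY   TIME COMMAND
0     0   101     1   0   0  1000  400 wait_a D    ?     0:00 procA
0     0   102     1   0   0  1000  400 wait_a D    ?     0:00 procB
0     0   103     1   0   0  1000  400 wait_b D    ?     0:00 procC
0     0   104     1   0   0  1000  400 wait_c S    ?     0:00 procD
"""

if __name__ == "__main__":
    # Print the most common wait channels first.
    for wchan, n in wchan_counts(SAMPLE).most_common():
        print(n, wchan)
```

If most of the blocked processes pile up on one or two wchan values,
that would be the place to start looking.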

Basically, I _suspect_ that what is going on is somebody waiting on an IO
entity just because the IO entity is locked - never mind the fact that
it is perfectly up-to-date, and it is only locked for IO because it is
busy getting written out.
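
To make the suspected mistake concrete, here is a toy model. It is not
kernel code: IOEntity, its two flags and both function names are
invented for illustration, and it only models a reader deciding whether
to sleep on the lock.

```python
from dataclasses import dataclass


@dataclass
class IOEntity:
    """Toy stand-in for a buffer, page or inode (not a kernel structure)."""
    uptodate: bool  # the in-memory contents are valid
    locked: bool    # IO is in flight, e.g. the entity is being written out


def reader_blocks_naive(entity):
    # The suspected mistake: sleep whenever the entity is locked,
    # without checking whether the data is already valid.
    return entity.locked


def reader_blocks_careful(entity):
    # A reader only has to wait for the lock when the data is not valid yet.
    return entity.locked and not entity.uptodate


# Locked only because it is being written out: the contents are valid.
being_written = IOEntity(uptodate=True, locked=True)
# Locked because it is still being read in: the contents are not valid yet.
being_read_in = IOEntity(uptodate=False, locked=True)

if __name__ == "__main__":
    print(reader_blocks_naive(being_written), reader_blocks_careful(being_written))
    print(reader_blocks_naive(being_read_in), reader_blocks_careful(being_read_in))
```

In this model the two checks differ only for an entity that is
up-to-date and locked for writeout, which is exactly the case where the
reader ends up in disk wait for no good reason.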

I've made the same mistake several times, and I suspect others have too,
and before I start looking into where it is I'd _really_ like to have it
narrowed down a bit.

I agree that we shouldn't see more processes in the D state just because
somebody else is writing out buffers or inodes somewhere. But at the
same time I wouldn't be at all surprised if we had some silly locking
going on (the superblock, for example, could easily serialize accesses
unnecessarily - we saw another case where superblock locking
didn't just result in bad performance, but in actual deadlock
situations).

Linus
