2.6.10-rc1-mm4: -1 EAGAIN after allocation failure (was: Re: Kernel 2.6.9 Multiple Page Allocation Failures)

From: Stefan Schmidt
Date: Tue Nov 09 2004 - 20:28:54 EST


Alright, I've got a funny thing here that I suspect to be an(other?) VM issue:

We are running a third-party closed-source application which handles many TCP
sessions and reads and writes their data to/from several disks/partitions.
With 2.6.10-rc1-mm4 it is the first time we notice that, right after the kernel
throws a swapper: page allocation failure (just like the ones you already
know), the interrupt rate, connection count and traffic decrease.

Here is part of a vmstat 10:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 11312 19404 268 1896896 0 0 1091 17578 25463 1225 7 38 37 18
0 0 11312 26372 180 1892836 0 0 1182 21433 25576 1216 7 38 31 24
1 2 11308 23784 608 1890168 0 0 1252 20667 25532 1243 7 40 24 29
0 2 11308 23304 428 1890552 0 0 1174 20363 25948 1332 7 40 32 21
1 1 11304 18496 444 1893328 0 0 1630 20506 25840 1322 7 38 30 26
1 1 11304 8712 232 1905508 0 0 1528 19662 26245 1305 7 40 25 28
1 0 11304 18952 180 1894000 0 0 1595 19680 26275 1215 7 38 27 28
1 0 11304 22404 132 1896632 0 0 369 17724 24072 1045 7 37 49 7
1 0 11304 23956 492 1899876 0 0 504 19609 20829 1151 9 34 49 7
1 0 11304 25380 460 1908340 0 0 424 17983 16964 927 9 28 55 8
1 0 11304 18244 464 1922140 0 0 309 14431 13417 836 10 27 60 3
0 0 11304 17720 472 1928388 0 0 224 11868 9933 607 11 23 63 3
1 0 11304 25720 476 1924440 0 0 133 7663 6780 504 10 20 68 2
1 0 11304 24156 488 1928168 0 0 107 6244 5011 315 8 18 73 1
0 0 11304 19544 712 1934268 0 0 76 3191 4464 299 8 18 73 1
0 0 11304 19248 728 1936564 0 0 23 1802 4002 249 7 17 76 0
1 0 11304 27092 736 1929892 0 0 16 1336 3655 284 6 16 78 0
0 0 11304 26472 752 1931984 0 0 19 1508 3408 248 5 16 78 1
0 0 11304 19000 768 1940944 0 0 20 1398 3195 226 5 14 81 0
1 0 11304 21460 776 1938896 0 0 14 1084 3057 241 5 14 82 0
0 0 11304 26268 848 1934608 0 0 12 927 2906 218 5 13 82 0
1 1 11304 22076 900 1939860 0 0 18 679 2897 215 5 11 84 1
0 0 11304 25880 952 1936748 0 0 17 653 2713 251 4 13 82 1
0 0 11304 20436 976 1942368 0 0 8 1117 2703 229 5 11 83 1
...

strace shows:
01:38:50.316041 gettimeofday({1100047130, 316054}, NULL) = 0
01:38:50.316188 poll([{fd=5671, events=POLLIN}, {fd=2727, events=POLLIN}, {fd=6663, events=POLLIN}, {fd=197, events=POLLIN}, {fd=3978, events=POLLIN}, {fd=779, events=POLLIN}, ...{line continues like this}...
...
01:38:50.328056 accept(5, 0xbffd4ab8, [16]) = -1 EAGAIN (Resource temporarily unavailable) ...{an awful lot of these}...
...
01:38:50.329585 futex(0xaf1a698, FUTEX_WAIT, 92828, {0, 9964000}) = -1 ETIMEDOUT (Connection timed out) ...{some of these}...
...
Application says:
"n.n.n.n:p Client closed connection in body read"

To me it looks like all those open sockets suddenly become 'temporarily
unavailable' to the application, and so the connections time out.
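Just to illustrate what I mean: the application's accept path presumably looks
roughly like the sketch below (it is closed source, so this is only an
assumption about what it does, not its actual code):

/* Rough sketch of a non-blocking accept path: poll reports the listening
 * socket readable, then the application accepts until EAGAIN. */
#include <errno.h>
#include <sys/socket.h>

static void accept_pending(int listen_fd)
{
	struct sockaddr_storage ss;
	socklen_t len;
	int fd;

	for (;;) {
		len = sizeof(ss);
		fd = accept(listen_fd, (struct sockaddr *)&ss, &len);
		if (fd >= 0) {
			/* hand the new connection over to the poll loop ... */
			continue;
		}
		if (errno == EAGAIN)
			break;	/* normally: nothing left to accept */
		break;		/* real error: log it and give up */
	}
}

If accept() keeps returning EAGAIN even though poll() reported the fd as
readable, a loop like this simply gives up each time, and the client side
eventually times out and closes the connection, which would match the
"Client closed connection in body read" messages above.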
I have not (yet?) seen this behaviour on 2.6.9, 2.6.9-mm1, 2.6.10-rc1-bk12 or
2.6.10-rc1-mm3.
I am able to reproduce the behaviour if, under the same load, iptraf or
tethereal is started. (At first I thought it might be because of promiscuous mode.)

This seems to be what _might_ have triggered it, although it was logged about
5 minutes earlier than the traffic decay:

printk: 36 messages suppressed.
swapper: page allocation failure. order:0, mode:0x20
[__alloc_pages+525/912] __alloc_pages+0x20d/0x390
[__get_free_pages+24/48] __get_free_pages+0x18/0x30
[kmem_getpages+24/192] kmem_getpages+0x18/0xc0
[cache_grow+157/304] cache_grow+0x9d/0x130
[cache_alloc_refill+380/576] cache_alloc_refill+0x17c/0x240
[__kmalloc+122/144] __kmalloc+0x7a/0x90
[alloc_skb+50/208] alloc_skb+0x32/0xd0
[tg3_alloc_rx_skb+112/304] tg3_alloc_rx_skb+0x70/0x130
[tg3_rx+518/944] tg3_rx+0x206/0x3b0
[tg3_poll+139/336] tg3_poll+0x8b/0x150
[net_rx_action+125/288] net_rx_action+0x7d/0x120
[__do_softirq+184/208] __do_softirq+0xb8/0xd0
[do_softirq+45/48] do_softirq+0x2d/0x30
[do_IRQ+30/48] do_IRQ+0x1e/0x30
[common_interrupt+26/32] common_interrupt+0x1a/0x20
[default_idle+0/64] default_idle+0x0/0x40
[default_idle+44/64] default_idle+0x2c/0x40
[cpu_idle+51/64] cpu_idle+0x33/0x40
[start_kernel+331/368] start_kernel+0x14b/0x170
[unknown_bootoption+0/432] unknown_bootoption+0x0/0x1b0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16

Free pages: 4616kB (1600kB HighMem)
Active:504159 inactive:454759 dirty:20020 writeback:115 unstable:0 free:1154 slab:50758 mapped:489095 pagetables:1222
DMA free:56kB min:144kB low:288kB high:432kB active:1936kB inactive:4932kB present:16384kB pages_scanned:32 all_unreclaimable? no
protections[]: 0 0 0
Normal free:2960kB min:8044kB low:16088kB high:24132kB active:492320kB inactive:166992kB present:901120kB pages_scanned:62 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:1522380kB inactive:1647112kB present:3178432kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56kB
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2960kB
HighMem: 6*4kB 3*8kB 41*16kB 0*32kB 6*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
Swap cache: add 154147, delete 151810, find 29532/39794, race 0+0
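For reference, mode:0x20 is GFP_ATOMIC, i.e. the skb refill in the tg3 rx
softirq path (net_rx_action -> tg3_poll -> tg3_rx) cannot sleep. Roughly, the
failing pattern looks like this (a simplified sketch, not the actual tg3 code;
buf_size stands in for the driver's rx buffer size):

#include <linux/skbuff.h>
#include <linux/gfp.h>

static int refill_rx_slot(unsigned int buf_size)
{
	struct sk_buff *skb;

	/* runs in softirq context, so the allocation may not sleep
	 * and has to use GFP_ATOMIC */
	skb = alloc_skb(buf_size, GFP_ATOMIC);
	if (!skb)
		return -ENOMEM;	/* slot stays empty, frame is dropped */

	/* ... map the buffer and hand it to the NIC ... */
	return 0;
}

Since GFP_ATOMIC cannot wait for reclaim, and the Normal zone is already well
below its min watermark here (free:2960kB vs min:8044kB), the order-0
allocation fails and the warning above is printed.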

Stefan