Long uptimes / 2.4.19-pre7 / procps-2.0.14

From: willy tarreau
Date: Mon Aug 25 2003 - 08:36:47 EST


Hello!

One of my customers has a machine that passed the 497-day uptime barrier
a week ago; it is now at nearly 507 days. I know there are security issues,
but given where the machine sits and how it is used it is not much exposed,
and I am planning an upgrade anyway. Before upgrading, I wanted to check for
errors or unexpected behaviour caused by such a long uptime, and I did find a
few bugs in procps. The installed version was 2.0.7, which I upgraded to 2.0.14.
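
For reference, the 497-day figure is what falls out if the kernel's tick
counter is a 32-bit value incremented HZ times per second. The sketch below
only shows that arithmetic. HZ=100 is an assumption (the usual i386 default on
2.4 kernels; this build's setting is not confirmed here), and the function
names are made up for the illustration.

```python
# Sketch: when does a 32-bit tick counter wrap back to zero?
# Assumption: HZ = 100 ticks per second (usual i386 default on 2.4 kernels).
HZ = 100
COUNTER_BITS = 32


def wrap_seconds(hz=HZ, bits=COUNTER_BITS):
    """Seconds of uptime after which a `bits`-wide tick counter wraps."""
    return (1 << bits) / float(hz)


def wrap_days(hz=HZ, bits=COUNTER_BITS):
    """Same threshold expressed in days."""
    return wrap_seconds(hz, bits) / 86400.0


if __name__ == "__main__":
    print("32-bit counter at HZ=%d wraps after %.2f days" % (HZ, wrap_days()))
```

With these assumptions the threshold is about 497.1 days, which matches the
barrier mentioned above.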

Version 2.0.7 died with floating point exceptions in top and vmstat. After the
upgrade to 2.0.14, top behaves correctly, but vmstat reports strange CPU usage:

# vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0     32 132612  43700  59212   0   0     6   384  471   470  26  74   0
 0  0  0     32 132616  43700  59212   0   0     0     0  264    85   0   0   0
 0  0  0     32 132616  43700  59212   0   0     0     0  480   122   0 100   0
 1  0  0     32 132616  43700  59212   0   0     0     0  465   133  50  50   0
 0  0  0     32 132616  43700  59212   0   0     0     8  380   113   0   0   0
 0  0  0     32 132616  43700  59212   0   0     0     0  345   102   0   0   0
 0  0  0     32 132616  43700  59212   0   0     0     0  646   193   0 100   0
 0  0  0     32 132616  43700  59212   0   0     0     0  316   107 100   0   0
 0  0  0     32 132616  43700  59212   0   0     0     0  437   154   0 100   0
 0  0  0     32 132616  43700  59212   0   0     0    16  172    50   0   0   0

The machine is really not loaded. It is used as a small HTTP proxy/load balancer
that receives between 10 and 30 hits/s. Typical CPU usage is about 0.1% user
and 1% system, which is also what top reports:

# top
08:34:14 up 506 days, 23:22, 1 user, load average: 0.00, 0.01, 0.00
31 processes: 30 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user 0.4% system 0.0% nice 0.0% iowait 99.6% idle
Mem: 255812k av, 123164k used, 132648k free, 0k shrd, 43700k buff
65108k active, 45876k inactive
Swap: 263160k av, 32k used, 263128k free 59108k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 3377 proxy     15   0  1488 1488   520 S     0.3  0.5 122:52   0 haproxy
    1 root       8   0   508  508   440 S     0.0  0.1   0:04   0 init-sysv

The kernel is a home-made 2.4.18-wt4, built mainly from 2.4.19-pre7 plus the
-aa VM, Andre Hedrick's IDE patches (now in mainline), and the preempt patches.

I'm just wondering whether it is vmstat reporting wrong values, or the kernel,
or both. I can still wait several days before upgrading (or even rebooting), so
if anyone has fixes to suggest, I am willing to try them, provided that they
only affect user space in a reasonably safe way (i.e. no module loading, no
live kernel patching, no glibc upgrade). Likewise, if some of you have ideas of
areas to look at or values to collect just for future reference, let me know.
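
One user-space-only way to tell the two apart would be to sample the raw
counters in /proc/stat directly and compute the percentages independently of
vmstat. The sketch below is only an illustration of that idea, not what procps
does and not a confirmed diagnosis. It assumes the 2.4 layout of the aggregate
line ("cpu user nice system idle") and 32-bit counters; parse_cpu_line(),
cpu_percent() and sample() are hypothetical helper names.

```python
# Sketch: CPU percentages from two samples of the aggregate "cpu" line of
# /proc/stat, with deltas taken modulo 2**32.
# Assumptions: fields are "cpu user nice system idle" (2.4 layout) and each
# counter is 32 bits wide. Helper names are hypothetical.
import time

WRAP = 1 << 32


def parse_cpu_line(line):
    """Return [user, nice, system, idle] from a 'cpu ...' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not the aggregate cpu line: %r" % line)
    return [int(x) for x in fields[1:5]]


def cpu_percent(before, after):
    """Percentages between two samples, or None if no tick elapsed."""
    # (new - old) % 2**32 stays correct if a counter wrapped once in between.
    deltas = [(a - b) % WRAP for b, a in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return None  # avoid dividing by zero
    user, nice, system, idle = deltas
    return {
        "us": 100.0 * (user + nice) / total,
        "sy": 100.0 * system / total,
        "id": 100.0 * idle / total,
    }


def sample(interval=1.0, path="/proc/stat"):
    """Read the cpu line twice, `interval` seconds apart, and compare."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.readline())
    first = read()
    time.sleep(interval)
    return cpu_percent(first, read())
```

Calling sample() a few times while vmstat runs in another terminal would show
whether the raw counters themselves look sane. If these numbers look
reasonable while vmstat's do not, that points at vmstat rather than the kernel.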

Here is some more information:
# uptime
7:40am up 506 days, 15:41, 1 user, load average: 0.07, 0.02, 0.00
# cat /proc/uptime
43774924.97 43727774.11
# uname -a
Linux proxy 2.4.18-wt4 #1 Mon Apr 1 14:35:08 CEST 2002 i686 unknown
# glibc version = 2.2.4
# distro = formilux 0.1.7
# ps aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  1252  508 ?        S     2002   0:04 init [4]
root         3  0.0  0.0     0    0 ?        SW    2002   0:00 [keventd]
root         4  0.0  0.0     0    0 ?        SWN   2002   0:24 [ksoftirqd_CPU0]
root         5  0.0  0.0     0    0 ?        SW    2002   0:00 [kswapd]
root         6  0.0  0.0     0    0 ?        SW    2002   0:00 [bdflush]
root         7  0.0  0.0     0    0 ?        SW    2002   0:00 [kupdated]
root         8  0.0  0.0     0    0 ?        SW<   2002   0:00 [mdrecoveryd]
root         9  0.0  0.0     0    0 ?        SW    2002   0:24 [kjournald]
root        24  0.0  0.0     0    0 ?        SW    2002   0:00 [kjournald]
root        25  0.0  0.0     0    0 ?        SW    2002   1:18 [kjournald]
root        43  0.0  0.2  1292  572 ?        S     2002   0:00 crond
root       143  0.0  0.2  1304  628 ?        S     2002  29:42 syslogd
root       145  0.0  0.5  2092 1344 ?        S     2002   0:00 klogd
root       166  0.0  0.1  1280  472 ?        S     2002   0:00 gpm
root       186  0.0  0.1  1232  444 tty3     S     2002   0:00 /sbin/agetty
root       187  0.0  0.1  1232  444 tty4     S     2002   0:00 /sbin/agetty
root       188  0.0  0.1  1232  444 tty5     S     2002   0:00 /sbin/agetty
root       193  0.0  0.1  1232  444 tty6     S     2002   0:00 /sbin/agetty
root       194  0.0  0.1  1232  444 tty7     S     2002   0:00 /sbin/agetty
root      1460  0.0  0.1  1232  444 tty2     S     2002   0:00 /sbin/agetty
root      2966  0.0  0.6  1648 1640 ?        SL    2002   0:00 /usr/sbin/ntpd
proxy     3377  0.1  0.5  2228 1488 ?        S    Jun03 122:46 /usr/sbin/haproxy
root     19487  0.0  0.1  1232  444 tty1     S    Aug13   0:00 /sbin/agetty
root     12881  0.0  0.1  1232  444 ttyS0    S    Aug15   0:00 /sbin/agetty
root      5120  0.0  0.4  2324 1216 pts/0    R    Aug15   0:00 -zsh
root      5125  0.0  0.2  2468  740 pts/0    R    Aug15   0:00 ps aux
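
As a cross-check of the /proc/uptime figures above, the two numbers can be
converted by hand. The sketch below assumes the usual meaning of the fields
(uptime in seconds, then idle time in seconds), HZ=100 and a 32-bit tick
counter; none of these is confirmed for this particular build, and the helper
names are made up for the illustration.

```python
# Sketch: interpret the two values printed by /proc/uptime.
# Assumptions: field 1 = uptime in seconds, field 2 = idle seconds,
# HZ = 100, and the tick counter is 32 bits wide.
HZ = 100
WRAP_TICKS = 1 << 32


def split_uptime(seconds):
    """Return (days, hours, minutes) for an uptime given in seconds."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    return days, hours, rest // 60


def seconds_since_wrap(seconds, hz=HZ):
    """Seconds elapsed since a 32-bit tick counter last wrapped."""
    ticks = int(round(seconds * hz))
    return (ticks % WRAP_TICKS) / float(hz)


def idle_fraction(uptime_s, idle_s):
    """Share of the uptime spent idle, between 0 and 1."""
    return idle_s / uptime_s


if __name__ == "__main__":
    uptime_s, idle_s = 43774924.97, 43727774.11  # values from /proc/uptime
    print("uptime: %d days, %d:%02d" % split_uptime(uptime_s))
    print("since wrap: %.2f days" % (seconds_since_wrap(uptime_s) / 86400.0))
    print("idle: %.2f%%" % (100.0 * idle_fraction(uptime_s, idle_s)))
```

For the values above this works out to 506 days and 15:42, about 9.55 days
since a 32-bit counter would have wrapped, and roughly 99.89% idle over the
whole uptime.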

Interestingly, there was one warning (and only one) in the system logs about
the clock:
Jul 1 01:59:59 proxy kernel: Clock: inserting leap second 23:59:60 UTC

BTW, the hardware is a Compaq TaskSmart / P3-933 / SCSI / ServerWorks chipset /
eepro100 module.

Cheers,
Willy

