Re: patch for 2.1.54 fs/inode.c -- please test! (oops)

Michael L. Galbraith (mikeg@weiden.de)
Fri, 12 Sep 1997 07:29:20 +0200 (MET DST)


On Tue, 9 Sep 1997, Bill Hawes wrote:

> I've attached a patch that provides consistent inode cleanup and
> improves inode memory management, and would like to ask for people to
> give it a test. It seems to be quite effective at balancing memory
> committed to inodes (and indirectly to the dcache.)
>
> I've left a few debugging prinks in the patch, but haven't enabled the
> conditionals. Let me know how it works out.
>

Hello Bill,

I got a chance to do some testing yesterday. Interactive performance under
load is _much_ nicer with your patch. Without the patch, updatedb with an
ls -laR / and a find /|wc -l brought the machine to it's knees even with
80M ram enabled. Processes which were starving for time without the patch
(under heavy load) are now running _very_ sloothly. Nice patch sir.

The rest of this may not be related to your patch at all.

But.. three very long buts (oopses). I didn't see these in 55 without your
patch though they might have popped up with more testing. I'm ccing Ted
because of all the vt references in the oopses.

While testing, an oops started to pop onto the screen just as I was SysRqing
around looking at mem usage and such. The oops which started to come out
was replaced by a deadly oops and complete freeze (ktracer no help). See
oops #1.. copied from frozen screen. The last thing to make it to syslog
was a scsi data overrun/reset (aha-2940). Perhaps SysRqing in the middle
of a scsi reset doesn't do good things.

I recompiled with gcc-2.7.2.2 to make sure it wasn't egcs-970907(+pgcc) which
caused the oops. Started testing as before, and received oops #2. This looks
like the oops which started to happen when oops #1 froze the machine.

I then recompiled with egcs again without ktracer enabled, and received oops #3.
It's the same as oops #2 without the additional ktrace info.. don't think it
was experimental compiler induced.

-Mike

Sep 11 07:24:26 mikeg kernel: got one
Sep 11 07:25:27 mikeg last message repeated 283 times
... more deleted
Sep 11 07:36:22 mikeg last message repeated 88 times
Sep 11 07:36:25 mikeg kernel: (scsi0:4:0) Data overrun of 243715 bytes detected in Data-In phase, tag 9; forcing a retry.

Unable to handle kernel paging request at virtual address ebf88c08
current->tss.cr3 = 04fba000, `r3 = 04fba000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c012817d>]
EFLAGS: 00010203
eax: 8bc02ae0 ebx: 00000b3d ecx: 00000023 edx: ebf00bf0
esi: 00000000 edi: 00000bc3 ebp: c009be8c esp: c009be68
ds: 0018 es: 0018 ss: 0018
Process init (pid: 1, process nr: 1, stackpage=c009b000)
Stack: c0235fcc 00000312 00000807 00000000 00000000 00000000 00000000 00000001
00000000 c009beb0 c010fd64 c0233e04 0000000f 0000006d c0236000 00000807
0000037d 00005000 c009bec8 c018d4c7 c01cfb75 00000032 00000000 00000001
Call Trace: [<c010fd64>(0)] [<c018d4c7>(36)] [<c018c0af>(24)] [<c018d1e4>(60)]
[<c010b218>(16)] [<c0109f00>(44)] [<c0126464>(92)] [<c012649c>(20)]
[<c0109e08>(12)] [<ffffffff>] [<c0110f8b>] [<c0187fda>] [<c018ebfe>]
[<c01846d1>] [<c018805f>] [<c01bff03>] [<c01bff03>] [<c01bfebe>]
[<c01c00b3>] [<c01138bf>] [<c0110fed>] [<c0187fda>] [<c018ebfe>]
[<c01846d1>] [<c018805f>] [<c01bff03>] [<c01bff03>] [<c01bfebe>]
[<c01c00b3>] [<c01138bf>] [<c0110f8b>] [<c0187fda>] [<c018ebfe>]
[<c01846d1>] [<c018805f>] [<c0187fda>] [<c018ebfe>] [<c0185eea>]
[<c018fc01>] [<c018eb42>]
Code: 8b 42 18 a8 04 74 03 ff 45 fc 8b 42 18 a8 40 74 03 ff 45 f8
Using `/boot/2.1.55/System.map' to map addresses to symbols.

>>EIP: c012817d <show_buffers+79/f0>
Trace: c010fd64 <show_mem+e8/f8>
Trace: c018d4c7 <handle_sysrq+12b/198>
Trace: c018c0af <handle_scancode+153/2b0>
Trace: c018d1e4 <keyboard_interrupt+48/64>
Trace: c010b218 <do_IRQ+84/108>
Trace: c0109f00 <ret_from_intr>
Trace: c0126464 <fsync_dev+1c/40>
Trace: c012649c <sys_sync+14/1c>
Trace: c0109e08 <system_call+48/50>
Trace: ffffffff
Trace: c0110f8b <wake_up_interruptible+13/98>
Trace: c0187fda <poke_blanked_console+e/80>
Trace: c018ebfe <set_cursor+12/c0>
Trace: c01846d1 <vc_cons_allocated+d/2c>
Trace: c018805f <vt_console_print+13/1b4>
Trace: c01bff03 <number+13/1b0>
Trace: c01bff03 <number+13/1b0>
Trace: c01bfebe <skip_atoi+e/40>
Trace: c01c00b3 <vsprintf+13/340>
Trace: c01138bf <printk+13/174>
Trace: c0110fed <wake_up_interruptible+75/98>
Trace: c0187fda <poke_blanked_console+e/80>
Trace: c018ebfe <set_cursor+12/c0>
Trace: c01846d1 <vc_cons_allocated+d/2c>
Trace: c018805f <vt_console_print+13/1b4>
Trace: c01bff03 <number+13/1b0>
Trace: c01bff03 <number+13/1b0>
Trace: c01bfebe <skip_atoi+e/40>
Trace: c01c00b3 <vsprintf+13/340>
Trace: c01138bf <printk+13/174>
Trace: c0110f8b <wake_up_interruptible+13/98>
Trace: c0187fda <poke_blanked_console+e/80>
Trace: c018ebfe <set_cursor+12/c0>
Trace: c01846d1 <vc_cons_allocated+d/2c>
Trace: c018805f <vt_console_print+13/1b4>
Trace: c0187fda <poke_blanked_console+e/80>
Trace: c018ebfe <set_cursor+12/c0>
Trace: c0185eea <complement_pos+e/b0>
Trace: c018fc01 <clear_selection+d/54>
Trace: c018eb42 <__set_origin+12/60>

Code: c012817d <show_buffers+79/f0>
Code: c012817d <show_buffers+79/f0> 8b 42 18 movl 0x18(%edx),%eax
Code: c0128180 <show_buffers+7c/f0> a8 04 testb $0x4,%al
Code: c0128182 <show_buffers+7e/f0> 74 03 je c0128187 <show_buffers+83/f0>
Code: c0128184 <show_buffers+80/f0> ff 45 fc incl 0xfffffffc(%ebp)
Code: c012818d <show_buffers+89/f0> 8b 42 18 movl 0x18(%edx),%eax
Code: c0128190 <show_buffers+8c/f0> a8 40 testb $0x40,%al
Code: c0128192 <show_buffers+8e/f0> 74 03 je c0128191 <show_buffers+8d/f0>
Code: c0128194 <show_buffers+90/f0> ff 45 f8 incl 0xfffffff8(%ebp)
Code: c012819d <show_buffers+99/f0>

[gcc-2.7.2.2]
Unable to handle kernel NULL pointer dereference at virtual address 00000bfc
current->tss.cr3 = 02705000, `r3 = 02705000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0143e2d>]
EFLAGS: 00010217
eax: 00000000 ebx: 00000bfc ecx: bfffea1b edx: 000002ff
esi: bfffea1b edi: 00000000 ebp: c4073f10 esp: c4073f0c
ds: 0018 es: 0018 ss: 0018
Process top (pid: 1538, process nr: 59, stackpage=c4073000)
Stack: 00000000 c4073f30 c0143eba c3bb2000 bfffea1b 00000216 c332d000 c332d000
ffffffea c4073f50 c0143fcb c3bb2000 bfffea1b bfffeaf4 c332d000 00000777
c4f499e0 c4073f60 c0144cc3 00000777 c332d000 c4073f98 c0144d8c c332d000
Call Trace: [<c0143eba>(0)] [<c0143fcb>(32)] [<c0144cc3>(32)] [<c0144d8c>(16)]
[<c0125871>(56)] [<c0109e18>(36)] [<ffffffff>] [<c0110fb0>]
[<c0188011>] [<c018ec0d>] [<c01844ed>] [<c0188093>] [<c01bfc2f>]
[<c01bfc2f>] [<c01bfbee>] [<c01bfdc3>] [<c01138df>] [<c0110fb0>]
[<c0188011>] [<c018ec0d>] [<c0185d6e>] [<c018fbb5>] [<c018eb61>]
[<c0184d11>] [<c0184d83>] [<c018515f>] [<c01844ed>] [<c0188093>]
[<c01bfc2f>] [<c01bfc2f>] [<c01bfbee>] [<c01bfdc3>] [<c01138df>]
[<c0110fb0>] [<c0188011>] [<c018ec0d>] [<c01844ed>]
Code: 8b 13 85 d2 74 4e 89 d0 25 fb 0f 00 00 83 f8 63 75 31 81 e2
Using `boot/2.1.55/System.map' to map addresses to symbols.

>>EIP: c0143e2d <get_phys_addr+35/94>
Trace: c0143eba <get_array+2e/70>
Trace: c0143fcb <get_arg+63/6c>
Trace: c0144cc3 <get_process_array+6f/98>
Trace: c0144d8c <array_read+a0/264>
Trace: c0125871 <sys_read+b5/fc>
Trace: c0109e18 <system_call+48/50>
Trace: ffffffff
Trace: c0110fb0 <wake_up_interruptible+10/90>
Trace: c0188011 <poke_blanked_console+d/7c>
Trace: c018ec0d <set_cursor+11/a8>
Trace: c01844ed <vc_cons_allocated+d/28>
Trace: c0188093 <vt_console_print+13/1b4>
Trace: c01bfc2f <number+13/194>
Trace: c01bfc2f <number+13/194>
Trace: c01bfbee <skip_atoi+e/3c>
Trace: c01bfdc3 <vsprintf+13/400>
Trace: c01138df <printk+13/17c>
Trace: c0110fb0 <wake_up_interruptible+10/90>
Trace: c0188011 <poke_blanked_console+d/7c>
Trace: c018ec0d <set_cursor+11/a8>
Trace: c0185d6e <complement_pos+e/b0>
Trace: c018fbb5 <clear_selection+d/54>
Trace: c018eb61 <__set_origin+11/5c>
Trace: c0184d11 <set_origin+d/6c>
Trace: c0184d83 <scrup+13/280>
Trace: c018515f <lf+13/78>
Trace: c01844ed <vc_cons_allocated+d/28>
Trace: c0188093 <vt_console_print+13/1b4>
Trace: c01bfc2f <number+13/194>
Trace: c01bfc2f <number+13/194>
Trace: c01bfbee <skip_atoi+e/3c>
Trace: c01bfdc3 <vsprintf+13/400>
Trace: c01138df <printk+13/17c>
Trace: c0110fb0 <wake_up_interruptible+10/90>
Trace: c0188011 <poke_blanked_console+d/7c>
Trace: c018ec0d <set_cursor+11/a8>
Trace: c01844ed <vc_cons_allocated+d/28>

Code: c0143e2d <get_phys_addr+35/94>
Code: c0143e2d <get_phys_addr+35/94> 8b 13 movl (%ebx),%edx
Code: c0143e2f <get_phys_addr+37/94> 85 d2 testl %edx,%edx
Code: c0143e31 <get_phys_addr+39/94> 74 4e je c0143e81 <get_phys_addr+89/94>
Code: c0143e33 <get_phys_addr+3b/94> 89 d0 movl %edx,%eax
Code: c0143e35 <get_phys_addr+3d/94> 25 fb 0f 00 00 andl $0xffb,%eax
Code: c0143e40 <get_phys_addr+48/94> 83 f8 63 cmpl $0x63,%eax
Code: c0143e43 <get_phys_addr+4b/94> 75 31 jne c0143e70 <get_phys_addr+78/94>
Code: c0143e45 <get_phys_addr+4d/94> 81 e2 00 90 90 andl $0x90909000,%edx
Code: c0143e50 <get_phys_addr+58/94> 90

[grep '=[ym]' .config]

CONFIG_EXPERIMENTAL=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KERNELD=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_OPTIMIZE=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_BINFMT_JAVA=y
CONFIG_M586=y
CONFIG_M586_COPY=y
CONFIG_VIDEO_SELECT=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_STRIPED=y
CONFIG_NETLINK=y
CONFIG_RTNETLINK=y
CONFIG_FIREWALL=y
CONFIG_NET_SECURITY=y
CONFIG_NET_ALIAS=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_FIREWALL=y
CONFIG_IP_FIREWALL_NETLINK=y
CONFIG_IP_FIREWALL_VERBOSE=y
CONFIG_IP_MASQUERADE=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
CONFIG_IP_ACCT=y
CONFIG_NET_IPIP=m
CONFIG_IP_MROUTE=y
CONFIG_IP_ALIAS=m
CONFIG_ARPD=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_RARP=m
CONFIG_PATH_MTU_DISCOVERY=y
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y
CONFIG_IPV6=m
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_TAGGED_QUEUEING=y
CONFIG_OVERRIDE_CMDS=y
CONFIG_AIC7XXX_PAGE_ENABLE=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_EQUALIZER=m
CONFIG_NET_ETHERNET=y
CONFIG_NET_ISA=y
CONFIG_NE2000=m
CONFIG_PPP=m
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLIP_SMART=y
CONFIG_SLIP_MODE_SLIP6=y
CONFIG_SHAPER=m
CONFIG_ISDN=m
CONFIG_ISDN_PPP=y
CONFIG_ISDN_PPP_VJ=y
CONFIG_ISDN_MPP=y
CONFIG_ISDN_AUDIO=y
CONFIG_ISDN_DRV_HISAX=m
CONFIG_HISAX_EURO=y
CONFIG_HISAX_16_3=y
CONFIG_MINIX_FS=m
CONFIG_EXT2_FS=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_PROC_FS=y
CONFIG_NFS_FS=m
CONFIG_NFSD=m
CONFIG_SUNRPC=m
CONFIG_LOCKD=m
CONFIG_ISO9660_FS=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_MOUSE=y
CONFIG_PSMOUSE=m
CONFIG_RTC=y
CONFIG_JOYSTICK=m
CONFIG_SOUND=y
CONFIG_SB=y
CONFIG_YM3812=y
CONFIG_PROFILE=y
CONFIG_KERNEL_DEBUGGING=y
CONFIG_DEBUG_KSTACK=y
CONFIG_DEBUG_SOFTLOCKUP=y
CONFIG_TRACER=y
CONFIG_MAGIC_SYSRQ=y

[egcs-970907+pgcc.. no ktrace]
Unable to handle kernel NULL pointer dereference at virtual address 00000bfc
current->tss.cr3 = 047ab000, `r3 = 047ab000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c013f3ad>]
EFLAGS: 00010246
eax: 00000000 ebx: 00000bfc ecx: bfffe835 edx: 000002ff
esi: 00000000 edi: 00000000 ebp: bfffe95d esp: c491befc
ds: 0018 es: 0018 ss: 0018
Process top (pid: 238, process nr: 12, stackpage=c491b000)
Sep 11 14:21:57 mikeg kernel: Stack: 00000000 bfffe835 c013f42f c43d0000 bfffe835 00000202 c01da970 00000000
000007ff c013f537 c43d0000 bfffe835 bfffe95d c1128000 00000540 c0140237
00000540 c1128000 c01402ff c1128000 00000540 0000000a c479e1e8 c27b7420
Call Trace: [<c013f42f>] [<c013f537>] [<c0140237>] [<c01402ff>] [<c01219a5>] [<c0121faf>] [<c010921a>]
Code: 8b 13 85 d2 75 05 31 c0 eb 53 90 89 d0 25 fb 0f 00 00 83 f8
Using `/boot/2.1.55/System.map' to map addresses to symbols.

>>EIP: c013f3ad <get_phys_addr+2d/90>
Trace: c013f42f <get_array+1f/5c>
Trace: c013f537 <get_arg+63/68>
Trace: c0140237 <get_process_array+3b/60>
Trace: c01402ff <array_read+a3/29c>
Trace: c01219a5 <do_open+49/12c>
Trace: c0121faf <sys_read+a7/f0>
Trace: c010921a <system_call+3a/40>

Code: c013f3ad <get_phys_addr+2d/90>
Code: c013f3ad <get_phys_addr+2d/90> 8b 13 movl (%ebx),%edx
Code: c013f3af <get_phys_addr+2f/90> 85 d2 testl %edx,%edx
Code: c013f3b1 <get_phys_addr+31/90> 75 05 jne c013f3b8 <get_phys_addr+38/90>
Code: c013f3b3 <get_phys_addr+33/90> 31 c0 xorl %eax,%eax
Code: c013f3b5 <get_phys_addr+35/90> eb 53 jmp c013f40a <get_phys_addr+8a/90>
Code: c013f3bd <get_phys_addr+3d/90> 90 nop
Code: c013f3be <get_phys_addr+3e/90> 89 d0 movl %edx,%eax
Code: c013f3c0 <get_phys_addr+40/90> 25 fb 0f 00 00 andl $0xffb,%eax
Code: c013f3c5 <get_phys_addr+45/90> 83 f8 00 cmpl $0x0,%eax
Code: c013f3ce <get_phys_addr+4e/90> 90 nop
Code: c013f3cf <get_phys_addr+4f/90> 90 nop
Code: c013f3d0 <get_phys_addr+50/90> 90 nop

Aside: Notice how little swap is being used here

3:16pm up 1:10h, 6 users, load average: 45.46, 38.74, 27.43
198 processes: 129 sleeping, 69 running, 0 zombie, 0 stopped
CPU states: 63.2% user, 18.7% system, 0.0% nice, 19.9% idle
Mem: 78580K av, 77440K used, 1140K free, 51412K shrd, 636K buff
Swap: 261496K av, 3232K used, 258264K free 5712K cached

USER PID %CPU %MEM NI VSZ RSS SHRD TT STAT TIME COMMAND
root 22115 3.7 2.4 0 2968 1924 980 1 R 0:00 /usr/lib/gcc-lib
root 22207 3.6 2.3 0 2924 1880 980 1 R 0:00 /usr/lib/gcc-lib
root 21870 3.5 3.2 0 3588 2568 1004 3 R 0:00 /usr/lib/gcc-lib
root 22200 3.2 2.4 0 2940 1896 980 1 R 0:00 /usr/lib/gcc-lib
root 21803 3.1 3.7 0 3976 2956 1004 3 R 0:00 /usr/lib/gcc-lib
root 21926 3.1 3.2 0 3588 2568 1004 3 R 0:00 /usr/lib/gcc-lib
root 21929 3.1 3.3 0 3672 2644 996 1 R 0:00 /usr/lib/gcc-lib
root 22060 3.1 2.8 0 3296 2264 992 1 R 0:00 /usr/lib/gcc-lib
root 22070 3.1 2.7 0 3216 2184 992 1 R 0:00 /usr/lib/gcc-lib
root 21748 3.0 3.6 0 3876 2856 1004 3 R 0:00 /usr/lib/gcc-lib
root 21775 3.0 3.6 0 3880 2864 1008 3 R 0:00 /usr/lib/gcc-lib
root 21993 3.0 2.8 0 3304 2272 992 1 R 0:00 /usr/lib/gcc-lib
root 1370 2.9 0.5 0 1520 452 228 4 R 4:28 top -d 1
root 14790 2.9 0.1 0 824 144 136 2 R 1:35 rsh -l root mike
root 21620 2.9 0.2 0 840 208 156 6 R 0:37 Bonnie
root 21677 2.9 4.1 0 4276 3252 1004 3 R 0:01 /usr/lib/gcc-lib
root 21832 2.9 3.6 0 3924 2904 1004 3 R 0:00 /usr/lib/gcc-lib
root 21864 2.9 3.2 0 3552 2528 1000 3 R 0:00 /usr/lib/gcc-lib