Re: NFS file locking broken (take 2)

Steven N. Hirsch (shirsch@adelphia.net)
Sun, 23 Aug 1998 15:42:13 -0400 (EDT)


Alan asked:

> Let me check one thing here. Are you using lockf/posix locks or flock.
> flock is generally host local.
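Alan's distinction can be seen even on a local Linux filesystem. The flock(2) man page states that, since Linux 2.0, locks placed with flock() and locks placed with fcntl() do not interact, which is why it matters which call a locking test uses. The following is a minimal, Linux-specific sketch. The function name flock_vs_fcntl and the /tmp scratch path are hypothetical and not part of the original test. The parent holds an exclusive flock(), and a forked child finds its own flock() refused while its fcntl() write lock is granted.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: take an exclusive flock() on `path`, then let a forked
 * child open the file itself and try each kind of lock without blocking.
 * On success returns 0 and sets:
 *   *flock_blocked - child's flock(LOCK_EX | LOCK_NB) was refused
 *   *fcntl_granted - child's fcntl(F_SETLK, F_WRLCK) was granted
 * Returns -1 on any setup error.
 *
 * Usage (on a local Linux filesystem both flags are expected to be 1):
 *   int fb, fg;
 *   assert(flock_vs_fcntl("/tmp/flock_vs_fcntl.tmp", &fb, &fg) == 0);
 */
int flock_vs_fcntl(const char *path, int *flock_blocked, int *fcntl_granted)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* A fresh open() gives the child its own open file description. */
        int cfd = open(path, O_RDWR);
        int bits = 0;
        if (cfd == -1)
            _exit(8);
        if (flock(cfd, LOCK_EX | LOCK_NB) == -1 && errno == EWOULDBLOCK)
            bits |= 1;              /* conflicts with the parent's flock() */

        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */
        if (fcntl(cfd, F_SETLK, &fl) == 0)
            bits |= 2;              /* POSIX lock is not blocked by flock() */
        _exit(bits);
    }

    int status;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) != 8) {
        *flock_blocked = (WEXITSTATUS(status) & 1) != 0;
        *fcntl_granted = (WEXITSTATUS(status) & 2) != 0;
        rc = 0;
    }
    close(fd);                      /* also releases the parent's flock() */
    unlink(path);
    return rc;
}
```

The child deliberately calls open() again: flock() locks belong to the open file description, which a forked child would otherwise share with its parent, so the inherited descriptor would not show a conflict.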

I went back to square one and rebuilt kernels on both machines using
2.1.117 with ac1 + hj's patches from the 8/20 knfsd utils package.

I used the attached C program to make sure that fcntl() (POSIX) locks,
not flock(), were being exercised. After verifying that the test script
worked properly on local storage, I tried running it on an NFS-mounted
directory.
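The fcntl() hand-off that locker.c exercises can be sketched as a self-contained local check. While one process holds a write lock, another process's F_SETLK is refused with EACCES or EAGAIN, F_GETLK reports the holder's PID, and F_SETLKW is granted as soon as the holder unlocks. This is a sketch assuming a POSIX system and local storage. handoff_check and the scratch path are hypothetical names, and lock_reg mirrors the APUE helper in the attachment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same shape as lock_reg() in the attached locker.c (from Stevens' APUE). */
static int lock_reg(int fd, int cmd, int type, off_t offset, int whence, off_t len)
{
    struct flock lock;
    memset(&lock, 0, sizeof lock);
    lock.l_type = type;         /* F_RDLCK, F_WRLCK, F_UNLCK */
    lock.l_start = offset;      /* byte offset, relative to l_whence */
    lock.l_whence = whence;     /* SEEK_SET, SEEK_CUR, SEEK_END */
    lock.l_len = len;           /* #bytes (0 means to EOF) */
    return fcntl(fd, cmd, &lock);
}

/*
 * Hypothetical local check. The parent write-locks `path`; a forked child
 * then records in its exit status:
 *   bit 1: F_SETLK is refused (EACCES or EAGAIN),
 *   bit 2: F_GETLK names the parent as the lock holder,
 *   bit 4: F_SETLKW is granted once the parent unlocks.
 * Returns 0 if all three hold, -1 otherwise.
 *
 * Usage: assert(handoff_check("/tmp/handoff_check.tmp") == 0);
 */
int handoff_check(const char *path)
{
    int pfd[2];                 /* child -> parent: "done probing" */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (lock_reg(fd, F_SETLK, F_WRLCK, 0, SEEK_SET, 0) == -1 || pipe(pfd) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* POSIX record locks are not inherited, so the child is a competitor. */
        int bits = 0;
        struct flock probe;
        close(pfd[0]);

        if (lock_reg(fd, F_SETLK, F_WRLCK, 0, SEEK_SET, 0) == -1 &&
            (errno == EACCES || errno == EAGAIN))
            bits |= 1;

        memset(&probe, 0, sizeof probe);
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &probe) == 0 && probe.l_type == F_WRLCK &&
            probe.l_pid == getppid())
            bits |= 2;

        /* Tell the parent to release, then block until the lock is ours. */
        if (write(pfd[1], "x", 1) == 1 &&
            lock_reg(fd, F_SETLKW, F_WRLCK, 0, SEEK_SET, 0) == 0)
            bits |= 4;
        _exit(bits);
    }

    char c;
    int status;
    close(pfd[1]);
    int got = read(pfd[0], &c, 1) == 1;             /* EOF if the child died */
    lock_reg(fd, F_SETLK, F_UNLCK, 0, SEEK_SET, 0); /* hand the lock over */
    close(pfd[0]);
    int reaped = waitpid(pid, &status, 0) == pid;
    close(fd);
    unlink(path);
    return (got && reaped && WIFEXITED(status) && WEXITSTATUS(status) == 7) ? 0 : -1;
}
```

A pipe, rather than the sleep() calls used in runtest.sh and locker.c, orders the two processes here, so the check does not depend on timing.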

Here's what transpires (100% repeatable):

--------------------------------------------------------------------------------

On client:

$ ./runtest.sh

232: Have write lock, appending to file
232: Releasing lock..
232: Have write lock, appending to file
232: Releasing lock..
232: Have write lock, appending to file
230: Can't get write lock, I'll sleep on it..
232: Releasing lock..

(and.. dead)

The test file contents:

232: Sun Aug 23 15:20:41 1998
232: Sun Aug 23 15:20:47 1998
232: Sun Aug 23 15:20:53 1998

ps on client shows:

FLAGS UID PID PPID PRI NI SIZE  RSS WCHAN       STA TTY TIME COMMAND
  100   0   1    0   0  0  820  332 pause       S   ?   0:08 init [2]
   40   0   2    1   0  0    0    0 bdflush     SW  ?   0:00 (kflushd)
  840   0   3    1   0  0    0    0 kswapd      SW  ?   0:00 (kswapd)
  140   0  43    1   0  0  800  260 sigsuspend  S   ?   0:00 /sbin/upd

  140   0 114    1   0  0  844  408 do_select   S   ?   0:00 rpc.kstat
   40   0 126    1   0  0  948  512 do_select   S   ?   0:00 rpc.kmoun
  140   0 135    1   0  0    0    0 svc_recv    SW  ?   0:00 (nfsd)
   40   0 136    1   0  0    0    0 svc_recv    SW  ?   0:00 (nfsd)
   40   0 159    1   0  0  856  404 pipe_read   S   ?   0:00 automount

   40   0 137  135   0  0    0    0 svc_recv    SW  ?   0:00 (lockd)
   40   0 138  137   0  0    0    0 rpciod      SW  ?   0:00 (rpciod)

    0 600 229  217   0  0 1144  576 wait4       S   ?   0:00 sh ./runt
  100 600 230  229   0  0  804  240 rpc_execute D   ?   0:00 ./locker
    0 600 232  229   0  0  808  288 rpc_execute D   ?   0:00 ./locker
    0 600 239  238  15  0 1164  700 wait4       S   ?   0:00 -bash

The client log starts filling with:

Aug 23 15:22:38 cy kernel: lockd: server 192.168.244.50 not responding, still trying
Aug 23 15:22:39 cy kernel: lockd: server 192.168.244.50 not responding, still trying

Issuing a kill -9 to both locker instances eventually succeeds in killing them.

---------------------------------------------------------------------------------------

On server:

svc: unknown procedure (15)
kfree: Bad obj c0211925
Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c0121a4b>]
EFLAGS: 00013286
eax: 0000001b ebx: c0266330 ecx: c0b82000 edx: c0213f54
esi: c0211925 edi: c0988b80 ebp: c0988330 esp: c0b09ec0
ds: 0018 es: 0018 ss: 0018
Process lockd (pid: 131, process nr: 22, stackpage=c0b09000)
Stack: c0931000 c0931040 c0988b80 c0988330 c2686a90 01173bb7 c014d00d c0211925
c014eba7 c0931008 00000000 00000000 c0931000 c014ed35 c0931000 00000000
c0988300 c0988410 c0085000 c0217e30 00000017 c014f804 c0085000 c0b09f34
Call Trace: [<c014d00d>] [<c014eba7>] [<c014ed35>] [<c014f804>] [<c014f9d6>] [<c014f940>] [<c0184e88>]
[<c0186a2e>] [<c014e5bd>] [<c0184a8f>] [<c014e3a0>]
Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 5b 5e 5f 5d 83 c4 08

Using `/net/cy/lib/modules/2.1.117/System.map' to map addresses to symbols.

>>EIP: c0121a4b <free_page_and_swap_cache+57/58>
Trace: c014d00d <autofs_hash_lookup+35/68>
Trace: c014eba7 <devpts_root_lookup+4f/ac>
Trace: c014ed35 <devpts_parse_options+b9/2a8>
Trace: c014f804 <sock_read+7c/98>
Trace: c014f9d6 <sock_fasync+a/c8>
Trace: c014f940 <sock_readv_writev+88/90>
Trace: c0184e88 <do_probe+4/164>
Trace: c0186a2e <proc_ide_read_identify+52/98>
Trace: c014e5bd <autofs_follow_link+1/24>
Trace: c0184a8f <try_to_identify+4b/440>
Trace: c014e3a0 <autofs_root_ioctl+bc/2a4>
Code: c0121a4b <free_page_and_swap_cache+57/58> movl $0x0,0x0
Code: c0121a55 <lookup_swap_cache+9/ac> addl $0x8,%esp
Code: c0121a58 <lookup_swap_cache+c/ac> popl %ebx
Code: c0121a59 <lookup_swap_cache+d/ac> popl %esi
Code: c0121a5a <lookup_swap_cache+e/ac> popl %edi
Code: c0121a5b <lookup_swap_cache+f/ac> popl %ebp
Code: c0121a5c <lookup_swap_cache+10/ac> addl $0x8,%esp
Code: c0121a5f <lookup_swap_cache+13/ac>

ps on server:

FLAGS UID PID PPID PRI NI SIZE  RSS WCHAN       STA TTY TIME COMMAND

   40   0   2    1   0  0    0    0 bdflush     SW  ?   0:00 (kflushd)
  840   0   3    1   0  0    0    0 kswapd      SW  ?   0:00 (kswapd)
  140   0  46    1   0  0  800  260 sigsuspend  S   ?   0:00 /sbin/upd

  140   0 107    1   0  0  844  416 do_select   S   ?   0:00 rpc.kstat
  140   0 119    1   0  0  920  516 do_select   S   ?   0:00 rpc.kmoun
  140   0 128    1   0  0    0    0 svc_recv    SW  ?   0:01 (nfsd)
   40   0 129    1   0  0    0    0 svc_recv    SW  ?   0:01 (nfsd)
   40   0 132    1   0  0    0    0 rpciod      SW  ?   0:00 (rpciod)
   40   0 140    1   0  0  856  404 pipe_read   S   ?   0:00 automount

  144   0 131  128   0  0    0    0 do_exit     Z   ?   0:00 (lockd <z(ombie)>)

The server requires a reboot to bring things back to normal.

Hope this report is detailed enough to be of assistance.

Steve

--------------------------------------------------------------------------------
Attachment: runtest.sh

#!/bin/sh

> test.file

./locker &
sleep 1
echo ""
./locker

--------------------------------------------------------------------------------
Attachment: locker.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>     /* strcat(), strlen() */
#include <unistd.h>     /* getpid(), write(), sleep(), close() */
#include <assert.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>

/* From Stevens APUE book. */
int
lock_reg(int fd, int cmd, int type, off_t offset, int whence, off_t len)
{
    struct flock lock;

    lock.l_type = type;         /* F_RDLCK, F_WRLCK, F_UNLCK */
    lock.l_start = offset;      /* byte offset, relative to l_whence */
    lock.l_whence = whence;     /* SEEK_SET, SEEK_CUR, SEEK_END */
    lock.l_len = len;           /* #bytes (0 means to EOF) */

    return( fcntl(fd, cmd, &lock) );
}


int
main( void )
{
  char buf[128];
  time_t t;
  int fd;
  pid_t me;
  int tries = 10;

  me = getpid();

  /* runtest.sh creates test.file; open it for appending. */
  fd = open( "test.file", O_APPEND|O_RDWR, S_IRUSR|S_IWUSR );
  assert(fd != -1);

  while (tries) {
    /* Try a non-blocking write lock on the whole file first... */
    if ( lock_reg(fd, F_SETLK, F_WRLCK, 0, SEEK_SET, 0 ) == -1 ) {
      printf("%d: Can't get write lock, I'll sleep on it..\n", me);
      /* ...and fall back to waiting for it with F_SETLKW. */
      if ( lock_reg(fd, F_SETLKW, F_WRLCK, 0, SEEK_SET, 0 ) == -1 ) {
        perror("locktest");
        exit(1);
      }
    }

    printf("%d: Have write lock, appending to file\n", me);

    /* Append "<pid>: <ctime() timestamp>" while holding the lock. */
    sprintf(buf, "%d: ", me);
    time(&t);
    strcat(buf, ctime(&t));
    write(fd, buf, strlen(buf));

    sleep(5);

    printf("%d: Releasing lock..\n", me);

    if ( lock_reg(fd, F_SETLK, F_UNLCK, 0, SEEK_SET, 0 ) == -1 ) {
      perror("locktest");
      exit(1);
    }
    sleep(1);
    tries--;
  }

  close(fd);
  return 0;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html