Re: knfsd and ext2? Huh?

From: Alexei I. Adamovich (lexa@adam.botik.ru)
Date: Wed Jun 14 2000 - 14:22:56 EST


Trond Myklebust <trond.myklebust@fys.uio.no> (TM) wrote on 12 Jun 2000 18:18:34 +0200:
TM> >>>>> " " == Alexei I Adamovich <lexa@adam.botik.ru> writes:
TM>
TM> > I'm not sure if prune_dcache stuff is in 2.4.0-test1, but I
TM> > definitely saw some of Neil's stuff in (BTW: I like it very much, it
TM> > opens some way to filesystem-specific nfsd support).
TM>
TM> prune_dcache is in albeit in a slightly cleaner form (Al put a more
TM> generic solution into the VFS).
TM>
TM> What is not in 2.4.0-test1 is the inode i_count SMP race stuff. Please
TM> try out the ac-15 patch and see if that helps. Alternatively, please
TM> check if the following patch (which should also fix the inode race but
TM> only for the NFS client).

In short: nothing helps, neither ac-15 nor ac-16. The same goes for your
patch (tm?), applied both to pure 2.4.0-test1 and to ac16. Also, -ac16
makes it possible to obtain some diagnostics, but 2.4.0-test1-tm seems
to be more stable (fewer errors detected by Stress.sh).

With "ac-16" (an example of an error other than a checksum error):
> W9 W15 R6 W8 W18 cp: preserving permissions for /testfs/stress/17/fb/00-INDEX: Input/output error

syslogd output (part of it):
> ...
> Jun 13 20:28:28 adam kernel: expected (0x306/0x6d6ed), got (0x81a4/0x1)
> Jun 13 20:28:37 adam kernel: NFS: server cheating in read reply: count 196614 > recvd 9192
> Jun 13 20:28:37 adam kernel: nfs_refresh_inode: inode number mismatch
> Jun 13 20:28:37 adam kernel: expected (0x306/0x3f29), got (0x306/0x947a4)
> Jun 13 20:28:37 adam kernel: nfs_refresh_inode: inode number mismatch
> Jun 13 20:28:37 adam kernel: expected (0x306/0x947a4), got (0x306/0x3f29)
> Jun 13 20:43:31 adam kernel: NFS: server cheating in read reply: count 196614 > recvd 5096
> Jun 13 20:43:31 adam kernel: nfs_refresh_inode: inode number mismatch
> ...
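
For reading the "inode number mismatch" lines: the pairs look like
(fsid/fileid), and for a knfsd export of an ext2 partition the fsid is
presumably the device number. Assuming the old 16-bit device encoding
(major in the high byte, minor in the low byte), a minimal Python sketch
of the decoding follows. The helper names split_dev and parse_mismatch
are made up for illustration and are not kernel functions.

```python
import re

# Matches one "(0x<hex>/0x<hex>)" pair as printed in the syslog lines.
PAIR = re.compile(r"\(0x([0-9a-f]+)/0x([0-9a-f]+)\)")

def split_dev(dev):
    """Split an old-style 16-bit device number into (major, minor)."""
    return (dev >> 8) & 0xFF, dev & 0xFF

def parse_mismatch(line):
    """Return the (first, second) integer pairs found in an 'expected ..., got ...' line."""
    return [(int(a, 16), int(b, 16)) for a, b in PAIR.findall(line)]

line = "expected (0x306/0x6d6ed), got (0x81a4/0x1)"
expected, got = parse_mismatch(line)
print(split_dev(expected[0]))   # (3, 6): major 3, minor 6
print(oct(got[0]))              # 0o100644
```

Under that assumption 0x306 is major 3, minor 6, which is consistent
with hda6. The value 0x81a4 is 0100644 in octal, which looks more like a
file mode (regular file, rw-r--r--) than a device number; whether that
means the attributes were read from the wrong offset is only a guess.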

As a reminder, the setup of the experiment:
  hda6 is ext2-mounted to /mnt;
  /mnt/pub is nfs-exported to 127.0.0.1;
  127.0.0.1:/mnt/pub is nfs-mounted to /testfs.
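
The setup above can be written down as a dry run in Python. This is only
a sketch: the device node /dev/hda6, the (rw) export option and the
exact mount invocations are assumptions, since the actual commands and
options used are not given here. Nothing is executed; the function only
builds the strings.

```python
# Dry run only: builds the command lines, does not execute anything.
DEVICE = "/dev/hda6"        # assumed device node for hda6
LOCAL_MOUNT = "/mnt"
EXPORT_DIR = "/mnt/pub"
CLIENT = "127.0.0.1"
NFS_MOUNT = "/testfs"

def loopback_nfs_setup():
    """Return the assumed /etc/exports entry and the two mount commands."""
    exports_line = f"{EXPORT_DIR} {CLIENT}(rw)"   # export option is an assumption
    commands = [
        f"mount -t ext2 {DEVICE} {LOCAL_MOUNT}",
        f"mount -t nfs {CLIENT}:{EXPORT_DIR} {NFS_MOUNT}",
    ]
    return exports_line, commands

exports_line, commands = loopback_nfs_setup()
print(exports_line)
for cmd in commands:
    print(cmd)
```

Between the two mounts the export has to be activated on the server
side; that step is left out because it depends on the distribution's
NFS scripts.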

After terminating Stress.sh I was not able to umount /testfs; the
umount process hung.
> adam:~ # fuser -v /testfs
> adam:~ # fuser -v /mnt
>
> USER PID ACCESS COMMAND
> /mnt root kernel mount /mnt

> adam:~ # ps
> PID TTY TIME CMD
> 152 ttyS0 00:00:00 gpm
> ...
> 3533 tty4 00:00:00 umount
> ...
> 3590 tty3 00:00:00 ps

> adam:~ # cat /proc/3533/stat
> 3533 (umount) D 274 3533 269 1028 3533 256 21 0 117 0 0 11 0 0 2 0 0 0
> 220270 1085440 121 2147483647 134512640 134538540 3221223600
> 3221223280 1074517997 524290 0 0 0 3223235028 0 0 17 1
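
The leading fields of that stat line can be decoded with a few lines of
Python. This is a sketch based on the field order documented in proc(5)
(state, ppid, pgrp, session, tty_nr after the command name); parse_stat
is an illustrative name, and only the first wrapped line of the output
is used.

```python
def parse_stat(text):
    """Parse the start of /proc/<pid>/stat; comm is taken from between the parentheses."""
    head, _, rest = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = rest.split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "tty_nr": int(fields[4]),
    }

stat = "3533 (umount) D 274 3533 269 1028 3533 256 21 0 117 0 0 11 0 0 2 0 0 0"
info = parse_stat(stat)
# For a value this small, the major number is the high byte and the minor the low byte.
tty = (info["tty_nr"] >> 8, info["tty_nr"] & 0xFF)
print(info["state"], info["ppid"], tty)   # D 274 (4, 4)
```

State D is uninterruptible sleep, and tty_nr 1028 decodes to major 4,
minor 4, which agrees with the tty4 shown by ps.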

> cat /proc/3533/status
> Name: umount
> State: D (disk sleep)
> Pid: 3533
> PPid: 274
> TracerPid: 0
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> FDSize: 1024
> Groups: 0 1 14 15 16 17 65534
> VmSize: 1060 kB
> VmLck: 0 kB
> VmRSS: 484 kB
> VmData: 48 kB
> VmStk: 8 kB
> VmExe: 28 kB
> VmLib: 948 kB
> SigPnd: 0000000000080002
> SigBlk: 0000000000000000
> SigIgn: 8000000000000000
> SigCgt: 0000000000000000
> CapInh: 0000000000000000
> CapPrm: 00000000fffffeff
> CapEff: 00000000fffffeff
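
The SigPnd mask in that output can be decoded as follows: bit n-1 of the
mask stands for signal n. A minimal Python sketch (pending_signals is an
illustrative helper name):

```python
def pending_signals(mask_hex):
    """Return the signal numbers set in a /proc status mask (bit n-1 = signal n)."""
    mask = int(mask_hex, 16)
    return [n + 1 for n in range(mask.bit_length()) if mask >> n & 1]

pending = pending_signals("0000000000080002")
print(pending)   # [2, 20]
```

On i386 Linux, signal 2 is SIGINT and signal 20 is SIGTSTP. That would
be consistent with Ctrl-C and Ctrl-Z having been tried on the hung
umount, with neither delivered while the process sits in the D state,
though that reading of the mask is an interpretation.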

> adam:~ # mount -o remount,ro /testfs/
> mount: /testfs not mounted already, or bad option

> adam:~ # mount -f -o remount,ro /testfs/

and later, while trying to shut down:

> Shutting down kernel based NFS server/sbin/init.d/rc2.d/K23nfsserver:
> line 111: 3662 Segmentation fault /usr/sbin/kexportfs -a

> Unable to handle kernel paging request at virtual address 5a5a5a82
> printing eip:
> c016daa0
> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[exp_do_unexport+84/180]
> EFLAGS: 00010212
> eax: 00000010 ebx: c13ea400 ecx: 00000010 edx: 5a5a5a5a
> esi: c5e06800 edi: c6c8dd20 ebp: c79dc000 esp: c7523f3c
> ds: 0018 es: 0018 ss: 0018
> Process kexportfs (pid: 3662, stackpage=c7523000)
> Stack: 00000306 ffffffea c79dc004 c016dbdf c5e06800 ffffffea c7522000 00000814
> c01684d9 c79dc004 c7522000 00000000 00000000 bfffec6c 00000000 00000000
> 00000000 c7523fa4 00000000 c014026f c463e260 c7523fa4 c013ce56 c7523fa4
> Call Trace: [exp_unexport+131/148] [sys_nfsservctl+381/900] [path_release+47/156]
> [sys_newstat+134/208] [sys_close+12/16] [system_call+52/56]
> Code: 0f b7 42 28 66 39 86 1c 04 00 00 75 0b 8b 42 20 39 86 20 04
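
The faulting address can be checked against the register dump. The first
bytes of the Code: line, 0f b7 42 28, decode as a 16-bit load with zero
extension from [edx + 0x28]. A short Python sketch of the arithmetic:

```python
edx = 0x5A5A5A5A                     # edx value from the oops
code = bytes.fromhex("0fb74228")     # first instruction in the Code: line

# 0f b7 /r is movzx r32, r/m16; ModRM 0x42 means mod=01, reg=eax, rm=edx,
# followed by an 8-bit displacement.
modrm, disp8 = code[2], code[3]
assert modrm >> 6 == 0b01 and modrm & 0b111 == 0b010

fault_addr = edx + disp8
print(hex(fault_addr))               # 0x5a5a5a82
```

This matches the reported address 5a5a5a82, so exp_do_unexport
dereferenced a pointer whose value was 0x5a5a5a5a. The repeated 0x5a
byte resembles a memory-poisoning pattern, which would point at freed or
uninitialized memory; that interpretation is an assumption, not
something the trace itself proves.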

2.4.0-test1-tm exposed fewer errors with Stress.sh, but still had them.
With -ac16-tm I saw the same picture as with "pure" -ac16, described above.

I have no idea what the source of these errors is.

TM> There are also a couple of other minor SMP race fixes that should go
TM> in to the final 2.4.0. I haven't sent them to Alan, but perhaps I
TM> should...

Should we try them?

Regards,

   Alexei I.Adamovich
----------------------------------------------------------------
Res. Centre for Multiprocessor Systems, | e-mail:
PSI RAS, Pereslavl-Zalessky 152140 Russia | lexa@adam.botik.ru
----------------------------------------------------------------



