Re: [NFS] [PATCH] NFS: fix client hang due to race condition

From: Nick Wilson
Date: Thu Jul 07 2005 - 12:30:32 EST


On Wed, Jul 06, 2005 at 07:11:25PM -0700, Lever, Charles wrote:
> > The flags field in struct nfs_inode is protected by the BKL. The
> > following two code paths (there may be more, but my test program only
> > hits these two) modify the flags without obtaining the lock:
> >
> > nfs_end_data_update
> > nfs_release
> > nfs_file_release
> > __fput
> > fput
> > filp_close
> > sys_close
> > syscall_call
> >
> > nfs_revalidate_mapping
> > nfs_file_write
> > do_sync_write
> > vfs_write
> > sys_write
> > syscall_call
> >
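To make the failure mode concrete, here is a rough user-space model of the lost update. It is my own sketch, not kernel code and not the patch below. A plain C `flags |= X` or `flags &= ~Y` on a shared word is a load, a modify and a store. If two CPUs interleave those steps, one CPU's store can silently undo the other's change. FLAG_A, FLAG_B, run() and flags_lock are made-up names: the two bits stand in for two NFS_INO_* flags, and the pthread mutex stands in for the BKL. The separate relaxed load and store reproduce the non-atomic read-modify-write without introducing a C-level data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical flag bits standing in for two NFS_INO_* flags. */
#define FLAG_A (1UL << 0)
#define FLAG_B (1UL << 1)

static atomic_ulong flags;  /* shared "nfs_inode flags" word */
/* Stands in for the BKL that is supposed to protect the flags word. */
static pthread_mutex_t flags_lock = PTHREAD_MUTEX_INITIALIZER;
static int use_lock;        /* set before the threads start */

static void maybe_lock(void)   { if (use_lock) pthread_mutex_lock(&flags_lock); }
static void maybe_unlock(void) { if (use_lock) pthread_mutex_unlock(&flags_lock); }

/* Non-atomic read-modify-write: each access is atomic, but the
 * load-modify-store sequence is not, so a concurrent store made in
 * between is overwritten. This models a plain "flags |= bit". */
static void rmw_set(unsigned long bit)
{
    unsigned long v = atomic_load_explicit(&flags, memory_order_relaxed);
    atomic_store_explicit(&flags, v | bit, memory_order_relaxed);
}

static void rmw_clear(unsigned long bit)
{
    unsigned long v = atomic_load_explicit(&flags, memory_order_relaxed);
    atomic_store_explicit(&flags, v & ~bit, memory_order_relaxed);
}

struct worker { unsigned long bit; long iters; long lost; };

static void *worker_fn(void *arg)
{
    struct worker *w = arg;
    for (long i = 0; i < w->iters; i++) {
        maybe_lock();
        rmw_set(w->bit);                    /* set our own bit */
        maybe_unlock();

        maybe_lock();
        /* Only this thread clears w->bit, so it should still be set;
         * if not, the other thread's stale store wiped it out. */
        if (!(atomic_load_explicit(&flags, memory_order_relaxed) & w->bit))
            w->lost++;
        rmw_clear(w->bit);                  /* clear our own bit */
        maybe_unlock();
    }
    return NULL;
}

/* Two threads each set and clear a different bit of the same word.
 * Returns how many times a thread found its own bit unexpectedly clear. */
long run(int locked, long iters)
{
    pthread_t t[2];
    struct worker w[2] = { { FLAG_A, iters, 0 }, { FLAG_B, iters, 0 } };

    atomic_store(&flags, 0);
    use_lock = locked;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker_fn, &w[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return w[0].lost + w[1].lost;
}
```

With the lock held around every update, run(1, n) should always return 0. On an SMP machine, run(0, n) may return a nonzero count, meaning one thread's store erased the other's bit. The count varies from run to run and can be 0 on a short or lucky run, which fits a hang that takes minutes to reproduce.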
> > Running multiple instances of a simple program [1] that opens, writes
> > to, and closes NFS mounted files eventually results in the programs
> > hanging on an SMP system (see kernel .config [3]).
> >
> > I've been testing this with 100 instances of the program:
> > $ ./breaknfs 100 &
> >
> > Usually within 10 minutes, all instances of breaknfs will hang. They
> > disappear from the output of 'top' and there is no NFS activity
> > between the client and server.
>
> [ sysrq output snipped... ]
>
> > I've reproduced this bug on 2.6.11.10, 2.6.12-mm2, and 2.6.13-rc2.
> >
> > With my patch against 2.6.13-rc2 below applied, I ran 100 instances of
> > breaknfs for 14 hours and was unable to get the client to hang.
>
> I agree this is a problem.
>
> But instead of using heavyweight synchronization, why not convert the
> NFS_INO flags into atomic bitops? I have a patch that does that; it would
> need to be ported to the latest kernels and tested to see if it
> addresses the problem.
>
> Nick, are you interested in trying it out?

Sure. Send it my way and I'll see if I can get it updated to the latest
kernels and test it out.
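For comparison, here is a rough user-space model of what the atomic bitops conversion buys, assuming C11 <stdatomic.h> and pthreads. demo_set_bit(), demo_clear_bit(), demo_test_bit() and demo_test_and_clear_bit() are hypothetical stand-ins that only mimic the (bit number, address) calling convention of the kernel's set_bit() family; they are not the kernel implementations. DEMO_INO_BIT_A and DEMO_INO_BIT_B are invented bit numbers, not real NFS_INO_* values. Each update is a single atomic read-modify-write, so two threads touching different bits of the same word cannot overwrite each other, and no lock is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for set_bit(), clear_bit(),
 * test_bit() and test_and_clear_bit(); nr indexes a single word. */
static inline void demo_set_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_or(addr, 1UL << nr);        /* one atomic RMW */
}

static inline void demo_clear_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_and(addr, ~(1UL << nr));    /* one atomic RMW */
}

static inline bool demo_test_bit(int nr, atomic_ulong *addr)
{
    return (atomic_load(addr) >> nr) & 1UL;
}

/* Clears the bit and reports whether it was set, in one atomic step. */
static inline bool demo_test_and_clear_bit(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* Invented bit numbers (not real NFS_INO_* values). */
enum { DEMO_INO_BIT_A = 0, DEMO_INO_BIT_B = 1 };

static atomic_ulong ino_flags;

struct bit_worker { int nr; long iters; long lost; };

static void *bit_worker_fn(void *arg)
{
    struct bit_worker *w = arg;
    for (long i = 0; i < w->iters; i++) {
        demo_set_bit(w->nr, &ino_flags);
        /* Only this thread clears bit w->nr, so it must still be set. */
        if (!demo_test_and_clear_bit(w->nr, &ino_flags))
            w->lost++;
    }
    return NULL;
}

/* Two threads each set and then test-and-clear their own bit of the
 * same word without any lock. Returns the number of lost updates. */
long run_atomic(long iters)
{
    pthread_t t[2];
    struct bit_worker w[2] = { { DEMO_INO_BIT_A, iters, 0 },
                               { DEMO_INO_BIT_B, iters, 0 } };

    atomic_store(&ino_flags, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, bit_worker_fn, &w[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return w[0].lost + w[1].lost;
}
```

run_atomic(n) is expected to return 0 and leave the word at 0. One practical consequence of this design: bitops take a bit number rather than a mask, so any mask-style flag definitions would have to be converted to bit numbers as part of such a patch.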

Nick