[GIT PULL] Please pull NFS client bugfixes

From: Trond Myklebust
Date: Mon Mar 14 2011 - 14:09:53 EST


Hi Linus,

Please pull from the "bugfixes" branch of the repository at

git pull git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git bugfixes

This will update the following files through the appended changesets.

Cheers,
Trond

----
fs/nfs/inode.c                           |    7 ++-
fs/nfs/nfs4_fs.h                         |   10 ++-
fs/nfs/nfs4filelayoutdev.c               |    4 +
fs/nfs/nfs4proc.c                        |   91 +++++++++++++++++-------------
fs/nfs/nfs4state.c                       |   29 +++++++---
fs/nfs/nfs4xdr.c                         |    4 +-
fs/nfs/nfsroot.c                         |   29 +++++-----
fs/nfs/unlink.c                          |    2 +-
fs/nfs/write.c                           |    2 +
include/linux/nfs_fs_sb.h                |   10 +--
include/linux/sunrpc/sched.h             |    1 +
kernel/sched.c                           |    1 +
net/sunrpc/sched.c                       |   75 ++++++++++++++++++++-----
net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
net/sunrpc/xprtsock.c                    |    3 +-
15 files changed, 178 insertions(+), 91 deletions(-)

commit 53d4737580535e073963b91ce87d4216e434fab5
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
Date: Fri Mar 11 15:31:06 2011 -0500

NFS: NFSROOT should default to "proto=udp"

There have been a number of recent reports that NFSROOT is no longer
working with default mount options, but fails only with certain NICs.

Brian Downing <bdowning@xxxxxxxxx> bisected to commit 56463e50 "NFS:
Use super.c for NFSROOT mount option parsing". Among other things,
this commit changes the default mount options for NFSROOT to use TCP
instead of UDP as the underlying transport.

TCP seems less able to deal with NICs that are slow to initialize.
The system logs that have accompanied reports of problems all show
that NFSROOT attempts to establish a TCP connection before the NIC is
fully initialized, and thus the TCP connection attempt fails.

When a TCP connection attempt fails during a mount operation, the
NFS stack needs to fail the operation. Usually user space knows how
and when to retry it. The network layer does not report a distinct
error code for this particular failure mode. Thus, there isn't a
clean way for the RPC client to see that it needs to retry in this
case, but not in others.

Because NFSROOT is used in some environments where it is not possible
to update the kernel command line to specify "udp", the proper thing
to do is change NFSROOT to use UDP by default, as it did before commit
56463e50.

To make it easier to see how to change default mount options for
NFSROOT and to distinguish default settings from mandatory settings,
I've adjusted a couple of areas to document the specifics.

root_nfs_cat() is also modified to deal with commas properly when
concatenating strings containing mount option lists. This keeps
root_nfs_cat() call sites simpler, now that we may be concatenating
multiple mount option strings.
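
As a rough illustration of the comma handling described above, here is a
minimal user-space sketch. The name opts_cat() and its return convention
are hypothetical; this is not the kernel's root_nfs_cat(), only the same
idea: add a separating comma only when the destination is non-empty and
does not already end in one, and report overflow instead of truncating.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper (NOT the kernel's root_nfs_cat()).
 * Appends the option list in src to dest, adding a separating comma only
 * when dest is non-empty and does not already end in one.
 * Returns 0 on success, or -1 if the result would not fit in destlen
 * bytes (dest is left unchanged in that case).
 */
static int opts_cat(char *dest, const char *src, size_t destlen)
{
	size_t dlen = strlen(dest);
	size_t slen = strlen(src);
	int need_comma = dlen > 0 && slen > 0 && dest[dlen - 1] != ',';

	/* Room for the existing text, optional comma, src, and the NUL. */
	if (dlen + (size_t)need_comma + slen + 1 > destlen)
		return -1;

	if (need_comma)
		dest[dlen++] = ',';
	memcpy(dest + dlen, src, slen + 1);	/* copies the NUL too */
	return 0;
}
```

With a helper like this, a call site can append a default option string
and then a user-supplied one without caring whether either already ends
in a comma.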

Tested-by: Brian Downing <bdowning@xxxxxxxxx>
Tested-by: Mark Brown <broonie@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxx> # 2.6.37
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 57df216bd8c8813a79a6a618e3d2ec937d532b86
Author: Huang Weiyi <weiyi.huang@xxxxxxxxx>
Date: Tue Mar 8 23:11:30 2011 +0000

nfs4: remove duplicated #include

Remove duplicated #include('s) in
fs/nfs/nfs4proc.c

Signed-off-by: Huang Weiyi <weiyi.huang@xxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit f9feab1e180d1392f2f59d692826c6da2e57adf4
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Wed Mar 9 16:12:46 2011 -0500

NFSv4: nfs4_state_mark_reclaim_nograce() should be static

There are no more external users of nfs4_state_mark_reclaim_nograce() or
nfs4_state_mark_reclaim_reboot(), so mark them as static.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit ecac799a5ecc364006f0db6f2db15e77ed4d63e2
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Wed Mar 9 16:00:56 2011 -0500

NFSv4: Fix the setlk error handler

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit b4410c2f7f775b03da31566c05bb8d2383c7dc27
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Wed Mar 9 16:00:55 2011 -0500

NFSv4.1: Fix the handling of the SEQUENCE status bits

We want SEQUENCE status bits to be handled by the state manager in order
to avoid threading issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 0400a6b0cb756f976bae32ae8db47bfa9853897c
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Wed Mar 9 16:00:53 2011 -0500

NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses

nfs4_schedule_state_recovery() should only be used when we need to force
the state manager to check the lease. If we just want to start the
state manager in order to handle a state recovery situation, we should be
using nfs4_schedule_state_manager().

This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
its use with a set of helper functions that do the right thing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit c34c32ea97718bb24fc06158733580003ba89211
Author: Andy Adamson <andros@xxxxxxxxxx>
Date: Wed Mar 9 13:13:46 2011 -0500

NFSv4.1 reclaim complete must wait for completion

Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
[Trond: fix whitespace errors]
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 114f64b5f24abac33a42f4f1856eb3a9766d497e
Author: Andy Adamson <andros@xxxxxxxxxx>
Date: Wed Mar 9 13:13:45 2011 -0500

NFSv4: remove duplicate clientid in struct nfs_client

Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 7d6d63d6427090cbb1d282364b65b12634ca59bd
Author: Ricardo Labiaga <Ricardo.Labiaga@xxxxxxxxxx>
Date: Wed Mar 9 13:13:44 2011 -0500

NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY

Fix a bug where we retry the EXCHANGEID call even though we already
have a valid clientid. Instead, delay and retry the CREATE_SESSION
call.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@xxxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 4cea288aaf0e11647880cc487350b1dc45d9febc
Author: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx>
Date: Tue Feb 22 21:54:34 2011 +0000

sunrpc: Propagate errors from xs_bind() through xs_create_sock()

xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
error, but it currently returns 0 if xs_bind() fails.
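
To show why returning 0 is a problem under this convention, here is a
small user-space sketch. The helpers err_ptr(), is_err() and ptr_err()
are stand-ins modelled on the kernel's ERR_PTR(), IS_ERR() and PTR_ERR();
struct fake_sock, fake_bind() and fake_create_sock() are hypothetical and
do not reflect the real xs_create_sock() body. A caller that only checks
is_err() would treat a NULL return as a valid pointer, so the bind error
has to be encoded in the returned pointer.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-ins modelled on ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_sock { int bound; };	/* hypothetical stand-in for a socket */

/* Hypothetical bind step: returns 0 or a negative errno value. */
static int fake_bind(struct fake_sock *s, int simulate_failure)
{
	if (simulate_failure)
		return -EADDRINUSE;
	s->bound = 1;
	return 0;
}

/* Returns a valid pointer or an encoded error, never NULL. */
static struct fake_sock *fake_create_sock(int simulate_bind_failure)
{
	struct fake_sock *s = calloc(1, sizeof(*s));
	int err;

	if (!s)
		return err_ptr(-ENOMEM);

	err = fake_bind(s, simulate_bind_failure);
	if (err) {
		free(s);
		return err_ptr(err);	/* propagate the bind error */
	}
	return s;
}
```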

Signed-off-by: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxx [v2.6.37]
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 3fa0b4e201d254b52a251fa348bd53e53000cff6
Author: Frank Filz <ffilzlnx@xxxxxxxxxx>
Date: Thu Dec 2 19:31:23 2010 +0000

(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid

The problem was use of an int32, which when converted to a uint64
is sign extended resulting in a fileid that doesn't fit in 32 bits
even though the intent of the function is to fit the fileid into
32 bits.
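
The sign-extension effect can be reproduced in a few lines of user-space
C. This is only a sketch of the bug class: fold_signed() and
fold_unsigned() are hypothetical names, and the XOR-folding of the upper
and lower halves is an assumption made for illustration, not a copy of
nfs_compat_user_ino64().

```c
#include <stdint.h>

/*
 * Buggy variant: the folded value is held in a signed 32-bit integer, so
 * converting it back to 64 bits sign-extends when bit 31 is set.
 * (The uint32_t -> int32_t conversion is implementation-defined in C;
 * common compilers wrap to the two's complement value.)
 */
static uint64_t fold_signed(uint64_t fileid)
{
	int32_t ino = (int32_t)(uint32_t)(fileid ^ (fileid >> 32));

	return (uint64_t)ino;	/* upper 32 bits become all ones */
}

/* Fixed variant: an unsigned 32-bit holder zero-extends instead. */
static uint64_t fold_unsigned(uint64_t fileid)
{
	uint32_t ino = (uint32_t)(fileid ^ (fileid >> 32));

	return ino;		/* upper 32 bits stay zero */
}
```

For a fileid of 0x80000000, the signed variant yields
0xFFFFFFFF80000000, which no longer fits in 32 bits, while the unsigned
variant yields 0x80000000.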

Signed-off-by: Frank Filz <ffilzlnx@xxxxxxxxxx>
Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
[Trond: Added an include for compat.h]
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit 43b7c3f051dea504afccc39bcb56d8e26c2e0b77
Author: Jovi Zhang <bookjovi@xxxxxxxxx>
Date: Wed Mar 2 23:19:37 2011 +0000

nfs: fix compilation warning

This commit fixes the following compilation warning:
linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Jovi Zhang <bookjovi@xxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit b9f810570d9cc13177128e11a74e22d37aa68a1a
Author: Stanislav Fomichev <kernel@xxxxxxxxxxx>
Date: Sat Feb 5 23:13:01 2011 +0000

nfs: add kmalloc return value check in decode_and_add_ds

Add a check of the kmalloc return value in decode_and_add_ds.

Signed-off-by: Stanislav Fomichev <kernel@xxxxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit a5e502681007779d4762fb3ef7e80a3ecd1cfe6b
Author: Jesper Juhl <jj@xxxxxxxxxxxxx>
Date: Sat Jan 22 21:40:20 2011 +0000

SUNRPC: Remove resource leak in svc_rdma_send_error()

We leak the memory allocated to 'ctxt' when we return after
'ib_dma_mapping_error()' returns !=0.
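
The shape of this kind of fix can be sketched in user-space C. Everything
here is hypothetical (struct ctx, ctx_get(), ctx_put(), send_error() and
the live_ctx counter are illustration only, not the svc_rdma code); the
point is simply that the early-return path must release what was
allocated before it.

```c
#include <stdlib.h>

struct ctx { int unused; };	/* hypothetical context object */

static int live_ctx;		/* outstanding contexts, for demonstration */

static struct ctx *ctx_get(void)
{
	struct ctx *c = calloc(1, sizeof(*c));

	if (c)
		live_ctx++;
	return c;
}

static void ctx_put(struct ctx *c)
{
	free(c);
	live_ctx--;
}

/* Returns 0 on success, -1 on failure; never leaks the context. */
static int send_error(int mapping_fails)
{
	struct ctx *c = ctx_get();

	if (!c)
		return -1;

	if (mapping_fails) {
		ctx_put(c);	/* the fix: release before the early return */
		return -1;
	}

	/* ... the context would be used here ... */
	ctx_put(c);
	return 0;
}
```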

Signed-off-by: Jesper Juhl <jj@xxxxxxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit d2224e7afbf2a6556f4f8f25bc0e96d99ec4d2bd
Author: Jeff Layton <jlayton@xxxxxxxxxx>
Date: Sun Mar 6 17:14:13 2011 +0000

nfs: close NFSv4 COMMIT vs. CLOSE race

I've been adding in more artificial delays in the NFSv4 commit and close
codepaths to uncover races. The kernel I'm testing has the patch to
close the race in __rpc_wait_for_completion_task that's in Trond's
cthon2011 branch. The reproducer I've been using does this in a loop:

mkdir("DIR");
fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644);
write(fd, "abcdefg", 7);
close(fd);
unlink("DIR/FILE");
rmdir("DIR");

The above reproducer shouldn't result in any silly-renaming. However,
when I add a "msleep(100)" just after the nfs_commit_clear_lock call in
nfs_commit_release, I can almost always force one to occur. If I can
force it to occur with that, then it can happen without that delay
given the right timing.
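
For anyone who wants to run the loop themselves, here is one way the
sequence above could be wrapped in a self-contained function. This is a
sketch, not the exact test program used: run_reproducer(), its base and
iterations parameters, and the 0755 directory mode are assumptions.
Point base at an NFSv4 mount to exercise the client.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around the mkdir/open/write/close/unlink/rmdir
 * sequence. Runs it `iterations` times under the directory `base`.
 * Returns 0 on success, -1 on the first failing call.
 */
static int run_reproducer(const char *base, int iterations)
{
	char dir[4096], file[4096];
	int i, fd, n;

	n = snprintf(dir, sizeof(dir), "%s/DIR", base);
	if (n < 0 || (size_t)n >= sizeof(dir))
		return -1;
	n = snprintf(file, sizeof(file), "%s/FILE", dir);
	if (n < 0 || (size_t)n >= sizeof(file))
		return -1;

	for (i = 0; i < iterations; i++) {
		if (mkdir(dir, 0755) != 0)
			return -1;
		fd = open(file, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			return -1;
		if (write(fd, "abcdefg", 7) != 7) {
			close(fd);
			return -1;
		}
		if (close(fd) != 0)
			return -1;
		if (unlink(file) != 0)
			return -1;
		/* May fail if a silly-renamed file is still in the directory. */
		if (rmdir(dir) != 0)
			return -1;
	}
	return 0;
}
```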

nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called
with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait
for the task to complete before putting its reference to it, so the last
reference gets put in rpc_release task and gets queued to a workqueue.

In this situation, the last open context reference may be put by the
COMMIT release instead of the close() syscall. The close() syscall
returns too quickly and the unlink runs while the d_count is still
high since the COMMIT release hasn't put its dentry reference yet.

Fix this by having nfs_commit_rpcsetup wait for the RPC call to complete
before putting the task reference when FLUSH_SYNC is set. With this, the
last reference is put by the process that's initiating the FLUSH_SYNC
commit and the race is closed.

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>

commit bf294b41cefcb22fc3139e0f42c5b3f06728bd5e
Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Date: Mon Feb 21 11:05:41 2011 -0800

SUNRPC: Close a race in __rpc_wait_for_completion_task()

Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised by
checking whether the RPC_TASK_ASYNC flag is cleared and/or the rpc_task
reference count is 1: in those cases we skip taking the spin lock and
immediately free up the rpc_task.
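
The ordering idea can be modelled loosely in user space. In this sketch a
pthread mutex stands in for the waitqueue spin lock, and struct mini_task
with its helpers is hypothetical; it is not the sunrpc code and it leaves
out the lock-free fast path. The worker clears the "active" state and
drops its reference inside one critical section, so a waiter that sees
the task as complete is guaranteed to drop the last reference.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of a task with a completion flag and refcount. */
struct mini_task {
	pthread_mutex_t lock;	/* stands in for the waitqueue spin lock */
	pthread_cond_t done;
	bool active;		/* true until the worker completes the task */
	int refs;		/* reference count */
};

static void mini_task_init(struct mini_task *t, int refs)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->done, NULL);
	t->active = true;
	t->refs = refs;
}

/*
 * Worker side: mark the task complete and drop the worker's reference in
 * the same critical section. Returns true if this was the last reference.
 */
static bool mini_task_complete_and_put(struct mini_task *t)
{
	bool last;

	pthread_mutex_lock(&t->lock);
	t->active = false;
	last = (--t->refs == 0);
	pthread_cond_broadcast(&t->done);
	pthread_mutex_unlock(&t->lock);
	return last;
}

/*
 * Waiter side: wait for completion, then drop the waiter's reference.
 * Returns true if this was the last reference.
 */
static bool mini_task_wait_and_put(struct mini_task *t)
{
	bool last;

	pthread_mutex_lock(&t->lock);
	while (t->active)
		pthread_cond_wait(&t->done, &t->lock);
	last = (--t->refs == 0);
	pthread_mutex_unlock(&t->lock);
	return last;
}
```

With two references (one held by the worker, one by the waiter), the
waiter can only get past its wait loop after the worker's decrement has
happened, so the waiter's put is always the final one and any release
work runs in the waiter's context.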

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>


--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
