Re: Update of file offset on write() etc. is non-atomic with I/O

From: Al Viro
Date: Mon Mar 10 2014 - 11:55:46 EST


On Wed, Mar 05, 2014 at 12:04:11AM +0000, Al Viro wrote:
> There's also a pile of crap around sockfd_lookup/sockfd_put, related
> to that. Moreover, there's net/compat.c, which probably ought to
> have the compat syscalls themselves moved to net/socket.c (under
> ifdef CONFIG_COMPAT) and switched to sockfd_lookup_light().
> There's l2tp_tunnel_sock_lookup(), which is simply broken - it assumes
> that if tunnel->fd still resolves to a socket, that socket must
> be l2tp one. Trivial to drive into BUG_ON(), in queue_work() callback,
> no less... There's bluetooth, assuming that pretty much the same
> (that if it got a file descriptor that resolves to a socket, it must
> be a bluetooth one). BTW, I wonder what will happen if one gives
> iscsi_sw_tcp_conn_bind() descriptor of a socket of sufficiently
> weird sort...
>
> Then there's staging/usbip with its sockfd_to_socket(), which is more or
> less parallel to sockfd_lookup(). And open-coded analogs in nbd and
> ncpfs...

OK, I've gone through most of that; bluetooth is, indeed, oopsable (as simple
as e.g.
	int sv[2];
	int fd = socket(PF_BLUETOOTH, SOCK_RAW, BTPROTO_CMTP);
	struct cmtp_connadd_req r = {};
	socketpair(PF_LOCAL, SOCK_STREAM, 0, sv);
	r.sock = sv[0];
	ioctl(fd, CMTPCONNADD, (unsigned long)&r);
and similar with BNEP) and that one is easy to fix. l2tp I'd rather leave
for net folks to deal with - the problem there is that we stash sock *and*
descriptor number into struct l2tp_tunnel in l2tp_tunnel_create() and expect
l2tp_tunnel_sock_lookup() to find that descriptor (tunnel->fd) resolving
to nothing (if it got already closed) or to the same socket. Unfortunately,
the caller (l2tp_tunnel_del_work()) expects to find l2tp socket in the latter
case, so having it replaced with unrelated socket will do nasty things
to that caller. It looks rather silly, actually - the actual fuckup happens
when l2tp_tunnel_del_work() passes what has come from socket->sk to
l2tp_tunnel_sock_put(), which does
	struct l2tp_tunnel *tunnel = l2tp_sock_to_tunnel(sk);
to find the tunnel its caller already had. Looks too convoluted for its
own good, and my first inclination would be to collapse l2tp_tunnel_sock_*
into the (only) caller, but I may be missing some subtle
race prevention in those back-and-forth lookups. In any case, it can
lead to l2tp_sock_to_tunnel() called on a sock that has nothing to do with
l2tp, so we do have a bug there.
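
To make the descriptor-reuse problem concrete, here is a minimal userspace
sketch. It is not part of the attached patches and touches no l2tp or
bluetooth code; the helper names fd_sock_type() and
stashed_fd_now_refers_elsewhere() are made up for illustration. It only shows
that a remembered descriptor number can resolve to an unrelated socket once
the original has been closed, assuming a single-threaded process and the usual
lowest-free-number descriptor allocation. A socket of a different SO_TYPE
stands in here for "a socket of a different kind".

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: SO_TYPE of the socket behind fd, or -1 on error. */
static int fd_sock_type(int fd)
{
	int type = -1;
	socklen_t len = sizeof(type);

	if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
		return -1;
	return type;
}

/*
 * Hypothetical demo: remember a descriptor number, close it, open an
 * unrelated socket, then look the remembered number up again.
 * Returns 1 if the stale number now resolves to a different kind of
 * socket, 0 if it does not, -1 if setup failed.
 */
static int stashed_fd_now_refers_elsewhere(void)
{
	int stashed, fresh, type_before, type_after, reused;

	stashed = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (stashed < 0)
		return -1;
	type_before = fd_sock_type(stashed);

	/* The original goes away; only the number is still remembered. */
	close(stashed);

	/* Assumption: nothing else in the process allocates descriptors. */
	fresh = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fresh < 0)
		return -1;

	/* Lookup by the stale number succeeds, but finds another socket. */
	type_after = fd_sock_type(stashed);
	reused = (fresh == stashed);

	close(fresh);
	return reused && type_after == SOCK_STREAM && type_after != type_before;
}
```

A lookup by number alone cannot tell these two cases apart, so code that
re-resolves a stashed descriptor has to check what it got back before
treating it as the object it originally stored.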

I've attached bluetooth fixes; this stuff is obviously better off in one of
the net trees. Not sure if it's worth Cc:stable - up to Marcel and Davem.
These bugs are oopsable, but you need CAP_NET_ADMIN to step into those...

I think that what's in vfs.git#for-linus right now is OK to pull; it survives
all the beating I could think of. Linus, could you pull from the usual place?

git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (3):
      ocfs2 syncs the wrong range...
      sockfd_lookup_light(): switch to fdget^W^Waway from fget_light
      get rid of fget_light()

Linus Torvalds (1):
      vfs: atomic f_pos accesses as per POSIX

Diffstat:
 fs/file.c            | 56 +++++++++++++++++++++++++++++++++++++++++++-------------
 fs/file_table.c      |  1 +
 fs/namei.c           |  2 +-
 fs/ocfs2/file.c      |  8 ++++----
 fs/open.c            |  4 ++++
 fs/read_write.c      | 40 ++++++++++++++++++++++++++--------------
 include/linux/file.h | 27 +++++++++++++++------------
 include/linux/fs.h   |  8 ++++++--
 net/socket.c         | 13 +++++++------
 9 files changed, 107 insertions(+), 52 deletions(-)