Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter

From: Al Viro
Date: Wed Nov 05 2014 - 22:25:46 EST


On Wed, Nov 05, 2014 at 04:57:19PM -0500, David Miller wrote:
> From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Date: Wed, 5 Nov 2014 21:07:45 +0000
>
> > Ping me when you put it there, OK? I'll rebase the rest of old stuff on
> > top of it (similar helpers, mostly).
>
> I just pushed it into net-next, thanks Al.

OK, I've taken the beginning of the old queue on top of net-next; it's
in git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.

>From the quick look at the remaining ->msg_iov users:

* I'll need to add several iov_iter primitives - counterparts of
checksum.h stuff (copy_and_csum_{from,to}_iter(), maybe some more). Not
a big deal, I'll do that tomorrow. That will give us a clean iov_iter-based
counterpart of skb_copy_and_csum_datagram_iovec().

* a new helper: zerocopy_sg_from_iter(). I have it, actually,
but I'd rather not step on Herbert's toes - it's too close to the areas
his series will touch, so that's probably for when his series goes in.
It will be needed for complete macvtap conversion...

* why doesn't verify_iovec() use rw_copy_check_uvector()? The only
real differences I see is that (a) you do allocation in callers (same as
rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case of
too long vector, while rw_copy_check_uvector() returns EINVAL in that case
and (c) you don't do access_ok(). The last one is described as optimization,
but for iov_iter primitives it's a serious PITA - for iovec-backed instances
they are using __copy_from_user()/__copy_to_user(), etc.
It certainly would be nice to have the same code doing all copying
of iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc. Am I
missing something subtle semantical difference in there? EMSGSIZE vs EINVAL
is trivial (we can lift that check into the callers, if nothing else), but
I could miss something more interesting...

* various getfrag will need to grow iov_iter-based counterparts,
but ip_append_output() needs no changes, AFAICS.

* crypto stuff will be easy to convert - iov_iter_get_pages()
would suffice for a primitive

* there's some really weird stuff in there. Just what is this
static int raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg)
{
struct iovec *iov;
u8 __user *type = NULL;
u8 __user *code = NULL;
int probed = 0;
unsigned int i;

if (!msg->msg_iov)
return 0;

for (i = 0; i < msg->msg_iovlen; i++) {
iov = &msg->msg_iov[i];
if (!iov)
continue;
trying to do? "If non-NULL pointer + i somehow happened to be NULL, skip it
and try to use the same pointer + i + 1"? Huh? Had been that way since
the function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code
when sending packets via raw socket.", according to historical tree)...

* rds, bluetooth and vsock are doing something odd; need to RTFS some
more.

* not sure I understand what TIPC is doing - does it prohibit too
short first segment of ->msg_iov? net/tipc/socket.c:dest_name_check() looks
odd _and_ potentially racy - we read the same data twice and hope our checks
still apply. I asked TIPC folks about that race back in April, but it
looks like that fell through the cracks...

Overall, so far it looks more or less feasible - other than the missing csum
primitives, current mm/iov_iter.c should suffice. I have _not_ seriously
looked into sendpage yet; that might very well require some more.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/