RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets

From: Dexuan Cui
Date: Mon Jul 11 2016 - 10:35:12 EST


> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> ...
> Some comments below. The vast majority of them are really minor, the
> only thing which bothers me a little bit is WARN() in hvsock_sendmsg()
> which I think shouldn't be there. But I may have missed something.

Thank you for the very detailed comments, Vitaly!

Now I see I shouldn't put pr_err() in hvsock_sendmsg() and hvsock_recvmsg(),
because IMO a malicious app can use this to generate lots of messages to slow
down the system. I'll remove them.

I'll reply your other comments bellow.

> > +#define guid_t uuid_le
> > +struct sockaddr_hv {
> > + __kernel_sa_family_t shv_family; /* Address family */
> > + u16 reserved; /* Must be Zero */
> > + guid_t shv_vm_id; /* VM ID */
> > + guid_t shv_service_id; /* Service ID */
> > +};
>
> I'm not sure it is worth it to introduce a new 'guid_t' type here, we
> may want to rename
>
> shv_vm_id -> shv_vm_guid
> shv_service_id -> shv_service_guid
>
> and use uuid_le type.

Ok. I'll make the change.

> > +config HYPERV_SOCK
> > + tristate "Hyper-V Sockets"
> > + depends on HYPERV
> > + default m if HYPERV
> > + help
> > + Hyper-V Sockets is somewhat like TCP over VMBus, allowing
> > + communication between Linux guest and Hyper-V host without TCP/IP.
> > +
>
> I know it's hard to come up with a simple description but I'd rather
> describe is as "Socket interface for high speed communication between
> Linux guest and Hyper-V host over VMBus."

OK.

> > +static bool uuid_equals(uuid_le u1, uuid_le u2)
> > +{
> > + return !uuid_le_cmp(u1, u2);
> > +}
>
> why not use uuid_le_cmp directly?
OK. I will change to it.

> > +static unsigned int hvsock_poll(struct file *file, struct socket *sock,
> > + poll_table *wait)
>> ...
> > + if (channel) {
> > + /* If there is something in the queue then we can read */
> > + get_ringbuffer_rw_status(channel, &can_read, &can_write);
> > +
> > + if (!can_read && hvsk->recv)
> > + can_read = true;
> > +
> > + if (!(sk->sk_shutdown & RCV_SHUTDOWN) && can_read)
> > + mask |= POLLIN | POLLRDNORM;
> > + } else {
> > + can_read = false;
>
> we don't use can_read below

I'll remove the can_read assignment.

> > + channel = hvsk->channel;
> > + if (!channel) {
> > + WARN_ONCE(1, "NULL channel! There is a programming
> bug.\n");
>
> BUG() then

OK.

> > +static int hvsock_open_connection(struct vmbus_channel *channel)
> > +{
> > + ......
> > + if (conn_from_host) {
> > + if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog) {
> > + ret = -EMFILE;
>
> I'm not sure -EMFILE is appropriate, we don't really have "too many open
> files".
Here the ret value doesn't really matter, because the return value of the
function is not really used at present.

However, I agree with you that EMFILE is unsuitable.
Let me change to ECONNREFUSED, which seems better to me.

> > +static int hvsock_connect_wait(struct socket *sock,
> > + int flags, int current_ret)
> > +{
> > + struct sock *sk = sock->sk;
> > + struct hvsock_sock *hvsk;
> > + int ret = current_ret;
> > + DEFINE_WAIT(wait);
> > + long timeout;
> > +
> > + hvsk = sk_to_hvsock(sk);
> > + timeout = 30 * HZ;
>
> We may want to introduce a define for this timeout. Does it actually
> match host's timeout?

I'll add HVSOCK_CONNECT_TIMEOUT for this.
Yes, the value is from Hyper-V team.

> > +static int hvsock_accept_wait(struct sock *listener,
> > + ......
> > +
> > + if (ret) {
> > + release_sock(connected);
> > + sock_put(connected);
> > + } else {
> > + newsock->state = SS_CONNECTED;
> > + sock_graft(connected, newsock);
> > + release_sock(connected);
> > + sock_put(connected);
>
> so we do release_sock()/sock_put() unconditionally and this piece could
> be rewritten as
>
> if (!ret) {
> newsock->state = SS_CONNECTED;
> sock_graft(connected, newsock);
> }
> release_sock(connected);
> sock_put(connected);

Will do.


> > +static int hvsock_listen(struct socket *sock, int backlog)
> > +{
> > + ......
> > + /* This is an artificial limit */
> > + if (backlog > 128)
> > + backlog = 128;
>
> Let's do a define for it.
Ok.

> > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg,
> > + size_t len)
> > +{
> > + struct hvsock_sock *hvsk;
> > + struct sock *sk;
> > + int ret;
> > +
> > + if (len == 0)
> > + return -EINVAL;
> > +
> > + if (msg->msg_flags & ~MSG_DONTWAIT) {
> > + pr_err("%s: unsupported flags=0x%x\n", __func__,
> > + msg->msg_flags);
>
> I don't think we need pr_err() here.

OK.

> > + /* ret is a bigger-than-0 total_written or a negative err code. */
> > + if (ret == 0) {
> > + WARN(1, "unexpected return value of 0\n");
> > + ret = -EIO;
> > + }
>
> It seems hvsock_sendmsg_wait() can return 0. I see the following code there:
>
> max_writable = get_ringbuffer_writable_bytes(channel);
> if (max_writable == 0)
> goto out_wait;

hvsock_sendmsg_wait() can not return 0: in the function, when we exit from
the level-2 "while (1)" loop by "break", normally can_write is true, meaning
get_ringbuffer_writable_bytes() returns at least 4096. Please see
get_ringbuffer_rw_status().

> so if there is no space on the ringbuffer we won't write
> anything. WARN() is inapropriate then.

I'll remove the default value of 0 for the 'ret ' in hvsock_sendmsg_wait() and
repalace the WARN() to BUG_ON(ret == 0) in hvsock_sendmsg().

> > +static int hvsock_recvmsg(struct socket *sock, struct msghdr *msg,
> > + size_t len, int flags)
> > ...
> > + /* We ignore msg->addr_name/len. */
> > + if (flags & ~MSG_DONTWAIT) {
> > + pr_err("%s: unsupported flags=0x%x\n", __func__, flags);
>
> Here he may also want to drop pr_err()

OK.

> > +static int hvsock_create_sock(struct net *net, struct socket *sock,
> > + int protocol, int kern)
> > +{
> > + struct sock *sk;
> > +
> > + if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
> > + return -EPERM;
>
> I'd say we're OK with CAP_SYS_ADMIN only for now and we'll be able to
> drop the check when host starts supporting single pair of ringbuffers
> for all Hyper-V sockets on the system.
I agree.

> > +static int __init hvsock_init(void)
> > +{
> > + int ret;
> > +
> > + /* Hyper-V Sockets requires at least VMBus 4.0 */
> > + if ((vmbus_proto_version >> 16) < 4)
> > + return -ENODEV;
>
> So it's actually
>
> if (vmbus_proto_version < VERSION_WIN10)
>
> I suggest we use such comparisson to be in line with other places where
> vmbus_proto_version is checked.
Sure. Thanks!

I'll post v16 shortly.

Thanks,
-- Dexuan