Re: [PATCH net] tuntap: raise EPOLLOUT on device up

From: Michael S. Tsirkin
Date: Mon May 21 2018 - 22:50:44 EST


On Tue, May 22, 2018 at 11:22:11AM +0800, Jason Wang wrote:
>
>
> On 2018年05月22日 06:08, Michael S. Tsirkin wrote:
> > On Mon, May 21, 2018 at 11:47:42AM -0400, David Miller wrote:
> > > From: Jason Wang <jasowang@xxxxxxxxxx>
> > > Date: Fri, 18 May 2018 21:00:43 +0800
> > >
> > > > We return -EIO on device down but cannot raise EPOLLOUT after it is
> > > > up again. This may confuse users like vhost, which expects tuntap to
> > > > raise EPOLLOUT to re-enable its TX routine after tuntap was down. This
> > > > can be easily reproduced by transmitting packets from a VM while bringing
> > > > the tap device down and up. Fix this by setting SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > >
> > > > Cc: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
> > > > Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > I'm not so sure what to do with this patch.
> > >
> > > Like Michael says, this flag bit is only checked upon transmit, which
> > > may or may not happen after this point. It doesn't seem to be
> > > guaranteed.
>
> The flag is checked in tun_chr_poll() as well.
>
> > Jason, can't we detect a link up transition and respond accordingly?
> > What do you think?
> >
>
> I think we've already tried to do this, in tun_net_open() we call
> write_space(). But the problem is the bit may not be set at that time.

Which bit? __dev_change_flags seems to set IFF_UP before calling
ndo_open. The issue I think is that tun_sock_write_space
exits if SOCKWQ_ASYNC_NOSPACE is clear.

And now I think I understand what is going on:

When the link is down, writes to the device might fail with -EIO.
Userspace needs an indication when that condition is resolved. As a fix,
tun_net_open attempts to wake up writers - but that is only effective if
SOCKWQ_ASYNC_NOSPACE has been set in the past. As a quick hack, set
SOCKWQ_ASYNC_NOSPACE when a write fails because the link is down.
If no writes failed, userspace does not know that the interface
was down, so it should not care that it's going up.


Does this describe what this line of code does?
If yes, feel free to include this info in a code comment and the commit log.
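
To make the explanation above concrete, here is a rough userspace model of
the interaction, not kernel code. Every name in it (model_queue, model_write,
model_write_space, model_link_up) is hypothetical, and clearing the flag on
wakeup is an assumption of the sketch. It only illustrates that the wakeup
done at link-up time is lost unless a failed write has set the flag first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one tun queue; not kernel code. */
struct model_queue {
	bool link_up;	/* stands in for dev->flags & IFF_UP */
	bool nospace;	/* stands in for SOCKWQ_ASYNC_NOSPACE */
	int wakeups;	/* number of EPOLLOUT wakeups delivered to writers */
};

/*
 * Models the write path: a write fails with -EIO while the link is down.
 * With the fix applied, the failure also records that a writer is waiting.
 */
static int model_write(struct model_queue *q, bool fix_applied)
{
	assert(q != NULL);
	if (!q->link_up) {
		if (fix_applied)
			q->nospace = true;
		return -EIO;
	}
	return 0;
}

/*
 * Models the write_space callback as described in this thread: it does
 * nothing unless the flag was set earlier. Clearing the flag when waking
 * is an assumption of this sketch.
 */
static void model_write_space(struct model_queue *q)
{
	if (!q->nospace)
		return;
	q->nospace = false;
	q->wakeups++;
}

/* Models link-up: the device comes up, then writers are woken. */
static void model_link_up(struct model_queue *q)
{
	q->link_up = true;
	model_write_space(q);
}
```

In this model, a writer that got -EIO without the fix sees no wakeup after
model_link_up(), while the same sequence with fix_applied set delivers exactly
one. A queue that never saw a failed write gets no wakeup in either case,
which matches the "should not care" argument.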



> A second thought is to set the bit in tun_chr_poll() instead of on -EIO, like:
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d45ac37..46a1573 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1423,6 +1423,13 @@ static void tun_net_init(struct net_device *dev)
>         dev->max_mtu = MAX_MTU - dev->hard_header_len;
>  }
>
> +static bool tun_sock_writeable(struct tun_struct *tun, struct tun_file *tfile)
> +{
> +       struct sock *sk = tfile->socket.sk;
> +
> +       return (tun->dev->flags & IFF_UP) && sock_writeable(sk);
> +}
> +
>  /* Character device part */
>
>  /* Poll */
> @@ -1445,10 +1452,9 @@ static __poll_t tun_chr_poll(struct file *file, poll_table *wait)
>         if (!ptr_ring_empty(&tfile->tx_ring))
>                 mask |= EPOLLIN | EPOLLRDNORM;
>
> -       if (tun->dev->flags & IFF_UP &&
> -           (sock_writeable(sk) ||
> -            (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
> -             sock_writeable(sk))))
> +       if (tun_sock_writeable(tun, tfile) ||
> +           (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
> +            tun_sock_writeable(tun, tfile)))
>                 mask |= EPOLLOUT | EPOLLWRNORM;
>
>         if (tun->dev->reg_state != NETREG_REGISTERED)
>
> Does this make more sense?
>
> Thanks
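
For comparison, the poll-side variant quoted above can be modelled the same
way in userspace. This is only a sketch: poll_model, poll_model_poll,
poll_model_link_up and MODEL_POLLOUT are hypothetical names, and the wakeup
logic is a simplification. The point it illustrates is that a poller that
finds the queue not writable (for example because the link is down) leaves
the flag set, so a later link-up can wake it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EPOLLOUT | EPOLLWRNORM. */
#define MODEL_POLLOUT 0x4u

/* Hypothetical userspace model of one tun queue; not kernel code. */
struct poll_model {
	bool link_up;	/* stands in for dev->flags & IFF_UP */
	bool has_space;	/* stands in for sock_writeable(sk) */
	bool nospace;	/* stands in for SOCKWQ_ASYNC_NOSPACE */
	int wakeups;	/* number of EPOLLOUT wakeups delivered to pollers */
};

/* Mirrors the proposed helper: writable only if the link is up too. */
static bool poll_model_writeable(const struct poll_model *q)
{
	return q->link_up && q->has_space;
}

/* Non-atomic stand-in for test_and_set_bit(): returns the old value. */
static bool poll_model_test_and_set(struct poll_model *q)
{
	bool old = q->nospace;

	q->nospace = true;
	return old;
}

/*
 * Models the writable part of the poll handler in the quoted diff: if the
 * queue is not writable, set the flag and check once more, so a state
 * change between the two checks is not missed.
 */
static unsigned int poll_model_poll(struct poll_model *q)
{
	assert(q != NULL);
	if (poll_model_writeable(q) ||
	    (!poll_model_test_and_set(q) && poll_model_writeable(q)))
		return MODEL_POLLOUT;
	return 0;
}

/* Models link-up followed by the write_space wakeup. */
static void poll_model_link_up(struct poll_model *q)
{
	q->link_up = true;
	if (q->has_space && q->nospace) {
		q->nospace = false;
		q->wakeups++;
	}
}
```

In this model, polling while the link is down returns no writable event but
leaves the flag set, and the subsequent poll_model_link_up() delivers one
wakeup, even though no write ever failed with -EIO.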