Re: [BUG] Regression on behavior of EPOLLET | EPOLLIN for AF_UNIXsockets in 3.2

From: Eric Dumazet
Date: Fri Jan 27 2012 - 14:45:02 EST


Le vendredi 27 janvier 2012 Ã 19:55 +0100, Eric Dumazet a Ãcrit :
> Le vendredi 27 janvier 2012 Ã 22:17 +0400, Glauber Costa a Ãcrit :
> > On 01/27/2012 09:53 PM, Eric Dumazet wrote:
> > > Le vendredi 27 janvier 2012 Ã 12:05 -0500, Nick Mathewson a Ãcrit :
> > >> [1.] One line summary of the problem:
> > >>
> > >
> > > Hi
> > >
> > > Probably coming from commit 0884d7aa24e15e72b3c07f7da910a13bb7df3592
> > > (AF_UNIX: Fix poll blocking problem when reading from a stream socket)
> > >
> > > When we requeue skb because not completely eaten, we call again
> > >
> > > sk->sk_data_ready(sk, skb->len);
> > >
> > For the record, I just confirmed this to be the case.
>
> A fix would be to change unix_poll() and not call sk_data_ready() when
> skb is requeued.
>
> if (!skb_queue_empty(&sk->sk_receive_queue))
> mask |= POLLIN | POLLRDNORM;
>
> Might be tricky if we want to keep unix_poll() lockless, but quite
> possible.
>
> Or... not dequeue skb from sk_received_queue unless fully consumed.
>

I am testing following patch :

net/unix/af_unix.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index aad8fb6..6eca195 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1918,7 +1918,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
struct sk_buff *skb;

unix_state_lock(sk);
- skb = skb_dequeue(&sk->sk_receive_queue);
+ skb = skb_peek(&sk->sk_receive_queue);
if (skb == NULL) {
unix_sk(sk)->recursion_level = 0;
if (copied >= target)
@@ -1959,8 +1959,6 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
/* Never glue messages from different writers */
if ((UNIXCB(skb).pid != siocb->scm->pid) ||
(UNIXCB(skb).cred != siocb->scm->cred)) {
- skb_queue_head(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
break;
}
} else {
@@ -1977,8 +1975,6 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,

chunk = min_t(unsigned int, skb->len, size);
if (memcpy_toiovec(msg->msg_iov, skb->data, chunk)) {
- skb_queue_head(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
if (copied == 0)
copied = -EFAULT;
break;
@@ -1993,13 +1989,10 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
if (UNIXCB(skb).fp)
unix_detach_fds(siocb->scm, skb);

- /* put the skb back if we didn't use it up.. */
- if (skb->len) {
- skb_queue_head(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
+ if (skb->len)
break;
- }
-
+
+ skb_unlink(skb, &sk->sk_receive_queue);
consume_skb(skb);

if (siocb->scm->fp)
@@ -2010,9 +2003,6 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
if (UNIXCB(skb).fp)
siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);

- /* put message back and return */
- skb_queue_head(&sk->sk_receive_queue, skb);
- sk->sk_data_ready(sk, skb->len);
break;
}
} while (size);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/