Re: [PATCH 2.6.25.7 v1-v2] af_unix: fix 'poll for write'/connected DGRAM sockets
From: David Miller
Date: Fri Jun 27 2008 - 22:34:51 EST
From: Rainer Weikusat <rweikusat@xxxxxxxxxxx>
Date: Fri, 20 Jun 2008 15:35:25 +0200
> For n:1 'datagram connections' (e.g. /dev/log), the unix_dgram_sendmsg
> routine implements a form of receiver-imposed flow control by
> comparing the length of the receive queue of the 'peer socket' with
> the max_ack_backlog value stored in the corresponding sock structure,
> either blocking the thread which caused the send-routine to be called
> or returning EAGAIN. This routine is used by both SOCK_DGRAM and
> SOCK_SEQPACKET sockets. The poll-implementation for these socket types
> is datagram_poll from core/datagram.c. A socket is deemed to be
> writeable by this routine when the memory presently consumed by
> datagrams owned by it is less than the configured socket send buffer
> size. This is always wrong for PF_UNIX non-stream sockets connected to
> server sockets dealing with (potentially) multiple clients if the
> above-mentioned receive queue is currently considered to be full.
> 'poll' will then return, indicating that the socket is writeable, but
> a subsequent write results in EAGAIN, effectively causing a (usual)
> application to 'poll for writeability by repeated send requests with
> O_NONBLOCK set' until it has consumed its time quantum.
>
> The change below uses a suitably modified variant of the datagram_poll
> routine for both types of PF_UNIX sockets, which tests if the
> recv-queue of the peer a socket is connected to is presently
> considered to be 'full' as part of the 'is this socket
> writeable'-checking code. The socket being polled is additionally
> put onto the peer_wait wait queue associated with its peer, because the
> unix_dgram_recvmsg routine does a wake up on this queue after a
> datagram was received and the 'other wakeup call' is done implicitly
> as part of skb destruction, meaning that a process blocked in poll
> because of a full peer receive queue could otherwise sleep forever
> if no datagram owned by its socket was already sitting on this queue.
> This change also includes a small (inline) helper routine named
> 'unix_recvq_full', which consolidates the actual testing code (in three
> different places) into a single location.
>
> Signed-off-by: Rainer Weikusat <rweikusat@xxxxxxxxxxx>
Applied, thanks a lot Rainer.
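
For readers of the archive, the writeability rule described in the quoted
text can be sketched as a small user-space model. This is not the patch
itself: struct model_sock and the model_* functions are hypothetical
stand-ins for the kernel's sock structure and for unix_recvq_full, and the
exact comparison against max_ack_backlog is an assumption made for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sock fields the description refers to. */
struct model_sock {
    size_t recvq_len;         /* datagrams waiting on the receive queue */
    size_t max_ack_backlog;   /* receiver-imposed queue limit */
    size_t wmem_used;         /* memory consumed by datagrams this socket owns */
    size_t sndbuf;            /* configured send buffer size */
    struct model_sock *peer;  /* socket we are connected to, or NULL */
};

/* Models the consolidated 'is the receive queue full' test
 * (assumed comparison: queue length exceeds the backlog limit). */
static bool model_recvq_full(const struct model_sock *sk)
{
    return sk->recvq_len > sk->max_ack_backlog;
}

/* Old rule (datagram_poll as described): only the sender's own
 * send-buffer usage is considered. */
static bool model_old_writable(const struct model_sock *sk)
{
    return sk->wmem_used < sk->sndbuf;
}

/* New rule as described: the send buffer must have room AND the
 * connected peer's receive queue must not be considered full. */
static bool model_new_writable(const struct model_sock *sk)
{
    if (!model_old_writable(sk))
        return false;
    if (sk->peer != NULL && model_recvq_full(sk->peer))
        return false;
    return true;
}
```

With a peer whose queue is over its limit, model_old_writable() still
reports the client as writeable while model_new_writable() does not, which
is the mismatch that made applications spin on EAGAIN. The wait-queue side
of the change (registering on the peer's peer_wait queue so the poller is
woken when the receiver dequeues a datagram) is not modelled here.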