Re: [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN

From: Petr Malat

Date: Thu May 14 2026 - 06:36:48 EST


On Tue, Apr 28, 2026 at 12:45:12PM -0700, Kuniyuki Iwashima wrote:
> On Mon, Apr 27, 2026 at 11:49???PM Petr Malat <oss@xxxxxxxxx> wrote:
> > On Tue, Apr 28, 2026 at 05:58:41AM +0000, Kuniyuki Iwashima wrote:
> > > From: Petr Malat <oss@xxxxxxxxx>
> > > Date: Mon, 27 Apr 2026 22:23:33 -0700
> > > > If a packet is queued and RCV_SHUTDOWN flag is set after the function
> > > > __skb_wait_for_more_packets() checked the queue, the function returns
> > > > EOF, which is then propagated by __unix_dgram_recvmsg() and the user
> > > > reads EOF although there is a message or messages still pending.
> > > >
> > > > The function should check if the queue is empty before returning EOF.
> > > > As the same is true for disconnect and it's also reasonable for a pending
> > > > signal, check in a common place before returning from the function.
> > > >
> > > > Signed-off-by: Petr Malat <oss@xxxxxxxxx>
> > > > ---
> > > > net/core/datagram.c | 38 +++++++++++++++++++++-----------------
> > > > 1 file changed, 21 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > > > index c285c6465923..5952950f7233 100644
> > > > --- a/net/core/datagram.c
> > > > +++ b/net/core/datagram.c
> > > > @@ -98,40 +98,44 @@ int __skb_wait_for_more_packets(struct sock *sk,
> > > > struct sk_buff_head *queue,
> > > > /* Socket errors? */
> > > > error = sock_error(sk);
> > > > if (error)
> > > > - goto out_err;
> > > > + goto out;
> > > >
> > > > if (READ_ONCE(queue->prev) != skb)
> > > > goto out;
> > > >
> > > > /* Socket shut down? */
> > > > - if (sk->sk_shutdown & RCV_SHUTDOWN)
> > > > - goto out_noerr;
> > > > + if (sk->sk_shutdown & RCV_SHUTDOWN) {
> > > > + error = 1;
> > > > + goto check_queue;
> > >
> > > We already have checked the same condition just above, and this
> > > is a matter of timing.
> > Yes, but both a message followed by RCV_SHUTDOWN can arrive after the
> > condition has been checked and in which case returning the message
> > must be prefered.
> >
> >
> > > Even after the duplicated check is evaluated to false, there is
> > > a small chance that the concurrent sendmsg() enqueues a new skb.
> > No, there isn't, because the sendmsg checks the state and the sender
> > gets error in that case. This is done udner a lock, so the situation
> > you described is not possible, see unix_dgram_sendmsg():
>
> Note the function is not AF_UNIX specific and reused by
> a bunch of other protocols as skb_recv_datagram().

I'm well aware of it, they either suffer from the same bug or there
is no functional difference caused by this change.


> > unix_state_lock(other);
> > ....
> > if (other->sk_shutdown & RCV_SHUTDOWN) {
> > err = -EPIPE;
> > goto out_unlock;
> > }
> >
> > > Considering __skb_wait_for_more_packets() is called only when
> > > the queue is empty, it's not worth another round when shutdown()ed.
> > This breaks the interface as POSIX is quite clear here:
> > If no messages are available to be received and the peer has
> > performed an orderly shutdown, recv() shall return 0.
>
> Even if we think about only AF_UNIX, once the SEQPACKET peer
> close()s, it triggers sock_error(sk) even though skb was queued before
> __skb_wait_for_more_packets(). It seems "broken" in terms of POSIX,
> but rather the application is too fragile if it's broken by a sudden close()
> by peer. These are dgram-based sockets.
>
> Except for close() (so not limited to AF_UNIX), if you see the race,
> it means the application is calling shutdown() while recvmsg() is in
> progress, and such an application must be prepared for the race
> that it creates by itself, and that's how it has worked for decades.
>
>
> > In this case messages were available, but 0 was returned. This
> > behavior breaks even the simple example in unix(7), if you run it long
> > enough (https://man7.org/linux/man-pages/man7/unix.7.html).
>
> Example is just an example, even it doesn't use epoll.

So you say that behavior where the sender does
send(A)
send(B)
close()
and receiver gets
recv() -> A
recv() -> EOF
recv() -> B # Yes, it realy receives a message after EOF
is correct and meets the interface description in manual pages or
the Posix specification.
P.