Re: [PATCH] net: datagram: Drain queue before reporting EOF or ENOTCONN

From: Petr Malat

Date: Tue Apr 28 2026 - 02:50:15 EST


On Tue, Apr 28, 2026 at 05:58:41AM +0000, Kuniyuki Iwashima wrote:
> From: Petr Malat <oss@xxxxxxxxx>
> Date: Mon, 27 Apr 2026 22:23:33 -0700
> > If a packet is queued and RCV_SHUTDOWN flag is set after the function
> > __skb_wait_for_more_packets() checked the queue, the function returns
> > EOF, which is then propagated by __unix_dgram_recvmsg() and the user
> > reads EOF although there is a message or messages still pending.
> >
> > The function should check if the queue is empty before returning EOF.
> > As the same is true for disconnect and it's also reasonable for a pending
> > signal, check in a common place before returning from the function.
> >
> > Signed-off-by: Petr Malat <oss@xxxxxxxxx>
> > ---
> > net/core/datagram.c | 38 +++++++++++++++++++++-----------------
> > 1 file changed, 21 insertions(+), 17 deletions(-)
> >
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > index c285c6465923..5952950f7233 100644
> > --- a/net/core/datagram.c
> > +++ b/net/core/datagram.c
> > @@ -98,40 +98,44 @@ int __skb_wait_for_more_packets(struct sock *sk,
> > struct sk_buff_head *queue,
> > /* Socket errors? */
> > error = sock_error(sk);
> > if (error)
> > - goto out_err;
> > + goto out;
> >
> > if (READ_ONCE(queue->prev) != skb)
> > goto out;
> >
> > /* Socket shut down? */
> > - if (sk->sk_shutdown & RCV_SHUTDOWN)
> > - goto out_noerr;
> > + if (sk->sk_shutdown & RCV_SHUTDOWN) {
> > + error = 1;
> > + goto check_queue;
>
> We already have checked the same condition just above, and this
> is a matter of timing.
Yes, but both a message followed by RCV_SHUTDOWN can arrive after the
condition has been checked and in which case returning the message
must be prefered.


> Even after the duplicated check is evaluated to false, there is
> a small chance that the concurrent sendmsg() enqueues a new skb.
No, there isn't, because the sendmsg checks the state and the sender
gets error in that case. This is done udner a lock, so the situation
you described is not possible, see unix_dgram_sendmsg():
unix_state_lock(other);
....
if (other->sk_shutdown & RCV_SHUTDOWN) {
err = -EPIPE;
goto out_unlock;
}

> Considering __skb_wait_for_more_packets() is called only when
> the queue is empty, it's not worth another round when shutdown()ed.
This breaks the interface as POSIX is quite clear here:
If no messages are available to be received and the peer has
performed an orderly shutdown, recv() shall return 0.
In this case messages were available, but 0 was returned. This
behavior breaks even the simple example in unix(7), if you run it long
enough (https://man7.org/linux/man-pages/man7/unix.7.html).

> > + }
> >
> > /* Sequenced packets can come disconnected.
> > * If so we report the problem
> > */
> > - error = -ENOTCONN;
> > if (connection_based(sk) &&
> > - !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN))
> > - goto out_err;
> > + !(sk->sk_state == TCP_ESTABLISHED || sk->sk_state == TCP_LISTEN)) {
> > + error = -ENOTCONN;
>
> Also, the queue is always empty if SOCK_SEQPACKET sk is at TCP_CLOSE.
In my opinion this is not true if UDS reconnects, see the code
at unix_dgram_connect(), section "If it was connected, reconnect.",
where it sets TCP_CLOSE without checking the queue is empty. I haven't
checked other cases as it's enough to have one to justify the change.

Also, it makes the whole function simpler and one doesn't have to
review all callers or be afraid a future change could break it.

>
>
> > + goto check_queue;
> > + }
> >
> > /* handle signals */
> > - if (signal_pending(current))
> > - goto interrupted;
> > + if (signal_pending(current)) {
> > + error = sock_intr_errno(*timeo_p);
> > + goto check_queue;
>
> and we don't want to delay signal if it arrived first.
- We already do it anyway, if a signal followed by a message is
delivered before the initial queue check, the message handling gets
prefered.
- The signal is handled the same way, the only difference is the
user gets the message instead EINTR afterwards if he had a
nonrestartable handler set up. If the signal leads to the process,
termination, the syscal never returns. So, there is no delay.

Regards,
Petr