RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

From: David Laight
Date: Thu May 29 2014 - 06:54:33 EST


From: 'Arnaldo Carvalho de
...
> > > So, yes, the user _can_ process the packets already copied to userspace,
> > > i.e. no packet loss, and then, on the next call, will receive the signal
> > > notification.
>
> > The application shouldn't need to see an EINTR response, any signal handler
> > should be run when the system call returns to user (regardless of the
> > system call result code).
> > If that doesn't happen Linux is badly broken!
> > From an application point of view this is exactly the same as the signal
> > occurring just before/after the kernel entry/exit for the system call.
> >
> > The call should just return early with success status.
> > No need to preserve the EINTR response for later.
> >
> > The same might be appropriate for other errors - maybe including EFAULT
> > copying non-initial messages to userspace.
> > Put the message being processed back on the socket queue and return
> > success with the (non-zero) partial message count.
>
> We don't need to put anything back, if we get an EFAULT for a datagram,
> then we stop processing that packet, _dropping_ it (and that is just
> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
> and stop the batch, and if no datagrams were received, return the error
> straight away.
>
> But if some datagrams were successfully received, and at that point
> _already_ removed from queues and sent successfully to userspace,
> recvmmsg will return the number of successfully copied datagrams and
> store the error so that it can return on the next syscall,

That just doesn't make any sense.
Saving an errno code would only make sense if the error were a
property of the socket. But EFAULT is a property of the system call,
and EINTR is a property of the process: it exists so that the process
can return to userspace to execute a signal handler. (Relying on
SIGALRM to time out blocking system calls is a recipe for disaster.)

The next system call could be from an entirely different process,
neither EFAULT nor EINTR would mean anything to it at all.

ISTR that returning EFAULT generates a signal that will typically
terminate the process.
You definitely don't want to send one to a different process.

> Please refer to the original discussion on how to report how many
> successfully copied datagrams and also report that it stopped before the
> timeout and the number of requested datagrams in a batch:
>
> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courmont@xxxxxxxxx

I do remember the original problem.
I don't recall error reporting being referenced.

> What is being discussed here is how to return the EFAULT that may happen
> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> plain returning all that was requested, with no errors.

I remember some discussions from an XNET standards meeting (I've forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call. Note the AGAIN: the error recurs, it is not a saved error.
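
To make that rule concrete, here is a toy userspace model of it. It is
not kernel code and not the recvmmsg() implementation; batch_copy() and
dst_ok[] are names invented for this illustration, where dst_ok[i] stands
for "the caller's i-th destination buffer is writable".

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy model (hypothetical helper, not a kernel function).
 * Transfer up to n items, stopping at the first bad destination.
 * Returns the number of items transferred, or -EFAULT if the very
 * first item could not be transferred. No error is saved anywhere.
 */
long batch_copy(const int *dst_ok, size_t n)
{
    size_t done;

    for (done = 0; done < n; done++) {
        if (!dst_ok[done]) {
            if (done > 0)
                break;          /* partial success: report the count only */
            return -EFAULT;     /* nothing transferred: report the error */
        }
    }
    return (long)done;
}
```

With destinations {ok, ok, bad, ok} the first call returns 2. If the
caller retries from the third destination, the call fails with -EFAULT
because the fault happens again, not because anything was remembered.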

Things like blocking send/write being interrupted spring to mind.
Possibly even copyin/out failures part way through a read/write call.
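
The write() case is the one most applications already handle. A minimal
sketch of the usual caller-side loop, assuming a blocking descriptor
(write_all() is a name chosen for this example, not a libc function):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all len bytes to fd.
 * Returns 0 on success, -1 with errno set on failure.
 * A short count is treated as success so far; any real error shows up
 * when the next write() call hits it again.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte moved: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The loop never needs the kernel to remember an error between calls: a
partial count just advances the pointer, and a persistent problem is
reported by the following call.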

> This EFAULT _after_ datagram processing may happen when updating the
> remaining timeout, because then how can userspace both receive the
> number of successfully copied datagrams (in any of the cases mentioned
> in the previous paragraph) and know that that timeout can't be used
> because there was a problem while trying to copy it to userspace
> (EFAULT)?

Failure to write the control structure back to userspace probably
deserves an EFAULT return - the application is buggy.
IIRC normal recvmsg() copies out the control structure at the end
of processing - that can fail.
I wouldn't worry about datagram discards on any of those late
EFAULT conditions.

David


