Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

From: Michael Kerrisk (man-pages)
Date: Thu May 29 2014 - 10:07:29 EST


On 05/29/2014 12:53 PM, David Laight wrote:
> From: 'Arnaldo Carvalho de
> ...
>>>> So, yes, the user _can_ process the packets already copied to userspace,
>>>> i.e. no packet loss, and then, on the next call, will receive the signal
>>>> notification.
>>
>>> The application shouldn't need to see an EINTR response, any signal handler
>>> should be run when the system call returns to user (regardless of the
>>> system call result code).
>>> If that doesn't happen Linux is badly broken!
>>> From an application point of view this is exactly the same as the signal
>>> occurring just before/after the kernel entry/exit for the system call.
>>>
>>> The call should just return early with success status.
>>> No need to preserve the EINTR response for later.
>>>
>>> The same might be appropriate for other errors - maybe including EFAULT
>>> copying non-initial messages to userspace.
>>> Put the message being processed back on the socket queue and return
>>> success with the (non-zero) partial message count.
>>
>> We don't need to put anything back: if we get an EFAULT for a datagram,
>> then we stop processing that packet, _dropping_ it (and that is just
>> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
>> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
>> and stop the batch, and if no datagrams were received, return the error
>> straight away.
>>
>> But if some datagrams were successfully received, and at that point
>> _already_ removed from queues and sent successfully to userspace,
>> recvmmsg will return the number of successfully copied datagrams and
>> store the error so that it can be returned on the next syscall,
>
> That just doesn't make any sense.

Agreed.

> Saving an errno code would only make any sense if the error were a
> property of the socket -

Back in http://marc.info/?l=linux-netdev&m=124298156121906&w=2
(the follow-on from the discussion that Arnaldo mentions below),
it was noted:

: Normally you'd expect the call to return what it has read without an
: error, and then the socket error would be picked up on the next call.

and the key point in that sentence was "*socket* error."

> but EFAULT is a property of the system call,
> and EINTR a property of the process (it exists so that the process
> can return to userspace to execute a signal handler - relying on
> SIGALRM to timeout blocking system calls is a recipe for disaster).

Exactly. Interruption by a signal should just result in an early
success return, unless no datagrams have been received so far, in
which case it should produce an EINTR failure. No error should be
saved for a future call.

> The next system call could be from an entirely different process,
> neither EFAULT nor EINTR would mean anything to it at all.
>
> ISTR that returning EFAULT generates a signal that will typically
> terminate the process.

Not generally, I think. (I think you're thinking of SIGSEGV when
a process touches a nonexistent address in user mode.)
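This is easy to check from userspace. The sketch below uses read() on a pipe rather than a socket purely to keep it short (an assumption: the point is the generic copy-to-user failure path, not anything socket-specific). The kernel fails to copy into a PROT_NONE page, read() returns -1 with errno set to EFAULT, and the process carries on with no SIGSEGV:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Ask read() to deliver one byte into a page mapped PROT_NONE.
 * Returns the errno the call reported (expected: EFAULT), 0 if it
 * unexpectedly succeeded, or -1 if the setup itself failed.
 */
static int errno_from_bad_read_buffer(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    void *bad = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char c = 'x';
    int result = -1;
    if (write(fds[1], &c, 1) == 1) {
        /* The kernel's copy to userspace faults: we get an error, not a signal. */
        ssize_t r = read(fds[0], bad, 1);
        result = (r == -1) ? errno : 0;
    }

    munmap(bad, (size_t)page);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```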

> You definitely don't want to send one to a different process.

But it's true that the EFAULT or EINTR shouldn't be returned
to another process.
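One way to make the distinction concrete is a hypothetical classifier (error_may_be_deferred() is an illustrative name, and its list of socket-level errors is an assumption, not the kernel's actual policy). Only errors that describe the socket itself, such as an ICMP-reported ECONNREFUSED on a connected UDP socket, are safe to leave pending for whichever caller comes next:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical classifier: may an error hit part-way through a batch be
 * left pending on the socket for a later call to report?
 * The socket-level list below is illustrative, not exhaustive.
 */
static bool error_may_be_deferred(int err)
{
    assert(err > 0);           /* positive errno value, e.g. EFAULT */
    switch (err) {
    case ECONNREFUSED:         /* e.g. ICMP port unreachable on a connected UDP socket */
    case EHOSTUNREACH:
    case ENETUNREACH:
        return true;           /* describes the socket: any later caller may see it */
    case EFAULT:               /* describes this call's buffers */
    case EINTR:                /* describes this process (a pending signal) */
    default:
        return false;          /* report now, or drop in favour of a partial count */
    }
}
```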

>> Please refer to the original discussion on how to report how many
>> successfully copied datagrams and also report that it stopped before the
>> timeout and the number of requested datagrams in a batch:
>>
>> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courmont@xxxxxxxxx
>
> I do remember the original problem.
> I don't recall error reporting being referenced.

(See above.)

>> What is being discussed here is how to return the EFAULT that may happen
>> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
>> plain returning all that was requested, with no errors.
>
> I remember some discussions from an XNET standards meeting (I've forgotten
> exactly which errors on which calls were being discussed).
> My recollection is that you return success with a partial transfer
> count for ANY error that happens after some data has been transferred.
> The actual error will be returned when it happens again on the next
> system call - Note the AGAIN, not a saved error.
>
> Things like blocking send/write being interrupted spring to mind.
> Possibly even copyin/out failures part way through a read/write call.
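Linux recvmmsg() already behaves like this in one case, the non-blocking one. A Linux-specific sketch (an AF_UNIX datagram socketpair stands in for a UDP socket to keep it self-contained): with three datagrams queued and room for eight, a MSG_DONTWAIT call returns 3 rather than failing with EAGAIN, and EAGAIN only shows up on the next call because the queue is empty again, not because anything was saved:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

enum { VLEN = 8, QUEUED = 3 };

/*
 * Queue QUEUED one-byte datagrams, then call recvmmsg() twice with room
 * for VLEN and MSG_DONTWAIT. Returns the first call's result and stores
 * the errno from the second call (0 if it succeeded) in *second_errno.
 */
static int partial_batch_then_again(int *second_errno)
{
    assert(second_errno != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    for (int i = 0; i < QUEUED; i++) {
        char c = (char)('a' + i);
        if (send(sv[0], &c, 1, 0) != 1) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }

    char bufs[VLEN][16];
    struct iovec iov[VLEN];
    struct mmsghdr msgs[VLEN];
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < VLEN; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = sizeof(bufs[i]);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* The queue runs dry after QUEUED datagrams: partial count, no error. */
    int n = recvmmsg(sv[1], msgs, VLEN, MSG_DONTWAIT, NULL);

    /* Nothing is queued now, so EAGAIN happens again and is reported. */
    int again = recvmmsg(sv[1], msgs, VLEN, MSG_DONTWAIT, NULL);
    *second_errno = (again == -1) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Whether other errors hit mid-batch should follow the same pattern is exactly the open question here; the sketch deliberately sticks to EAGAIN.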
>
>> This EFAULT _after_ datagram processing may happen when updating the
>> remaining timeout, because then how can userspace both receive the
>> number of successfully copied datagrams (in any of the cases mentioned
>> in the previous paragraph) and know that that timeout can't be used
>> because there was a problem while trying to copy it to userspace
>> (EFAULT)?
>
> Failure to write the control structure back to userspace probably
> deserves an EFAULT return - the application is buggy.
> IIRC normal recvmsg() copies out the control structure at the end
> of processing - that can fail.
> I wouldn't worry about datagram discards on any of those late
> EFAULT conditions.

Agree on all of the above, and that last point certainly seems
like the right approach to me.

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/