RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

From: David Laight
Date: Thu May 29 2014 - 10:41:20 EST


From: 'Arnaldo Carvalho de Melo'
> Em Thu, May 29, 2014 at 02:06:04PM +0000, David Laight escreveu:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > > exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
>
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
>
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > were successfully received in the current batch, do you agree?
>
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
>
> Ok.
>
> > > For transient errors the existing mechanism, fixed so that only per
> > > socket errors are saved for later, as today, could be kept?
>
> > I don't think it is ever necessary to save an errno value for the
> > next system call at all.
> > Just process the next system call and see what happens.
>
> > If the call returns with less than the maximum number of datagrams
> > and with a non-zero timeout left - then the application can infer
> > that it was terminated by an abnormal event of some kind.
> > This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?

If you really think it is necessary, then you want a field in the
control structure.
But IMHO returning the 'time left' is more than enough.

IIRC the original problem was that the user-specified timeout
was used as an inter-datagram timer instead of an overall timeout.
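
To illustrate the difference, here is a minimal sketch of the arithmetic
behind an overall timeout: the deadline is computed once from the start
time, and each wait inside the batch uses whatever time is left rather than
restarting the full timeout per datagram. The helper names deadline_from()
and time_left() are hypothetical and chosen for illustration only; this is
not the kernel's implementation.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: absolute deadline = start + relative timeout.
 * Both arguments are assumed to be normalized (0 <= tv_nsec < 1s). */
static struct timespec deadline_from(struct timespec start,
                                     struct timespec timeout)
{
    struct timespec d;

    assert(start.tv_nsec >= 0 && start.tv_nsec < NSEC_PER_SEC);
    assert(timeout.tv_nsec >= 0 && timeout.tv_nsec < NSEC_PER_SEC);

    d.tv_sec = start.tv_sec + timeout.tv_sec;
    d.tv_nsec = start.tv_nsec + timeout.tv_nsec;
    if (d.tv_nsec >= NSEC_PER_SEC) {
        /* Carry the nanosecond overflow into the seconds field. */
        d.tv_sec++;
        d.tv_nsec -= NSEC_PER_SEC;
    }
    return d;
}

/* Hypothetical helper: time remaining until the deadline, clamped to
 * zero once the deadline has passed. */
static struct timespec time_left(struct timespec deadline,
                                 struct timespec now)
{
    struct timespec left;

    left.tv_sec = deadline.tv_sec - now.tv_sec;
    left.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (left.tv_nsec < 0) {
        /* Borrow one second to keep tv_nsec non-negative. */
        left.tv_sec--;
        left.tv_nsec += NSEC_PER_SEC;
    }
    if (left.tv_sec < 0) {
        left.tv_sec = 0;
        left.tv_nsec = 0;
    }
    return left;
}
```

With an inter-datagram timer each wait would get the full timeout again,
so a slow trickle of datagrams could keep the call alive far longer than
the caller asked for. Waiting only for time_left() bounds the whole batch.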

I suspect that most applications won't actually care about the
'time left', nor the actual number of returned datagrams.
They will just process what they are given and then wait for
the next batch.
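
Something like the following is what I have in mind for such an
application. It is only a sketch: receive_batch(), count_bytes(), BATCH and
DGRAM_MAX are made-up names and sizes, and a real program would want
proper error handling. The caller hands every datagram it was given to a
handler and goes back for more, without looking at why the batch was short.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH     8      /* assumed maximum datagrams per call */
#define DGRAM_MAX 2048   /* assumed maximum datagram size */

/* Hypothetical per-datagram callback supplied by the application. */
typedef void (*dgram_handler)(const char *data, unsigned int len, void *ctx);

/* Example handler: adds each datagram's length to a size_t total. */
static void count_bytes(const char *data, unsigned int len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Receive up to BATCH datagrams with one recvmmsg() call and pass each
 * one to the handler. Returns the number of datagrams processed, or -1
 * with errno set if nothing was received. */
static int receive_batch(int fd, int flags, struct timespec *timeout,
                         dgram_handler handle, void *ctx)
{
    char bufs[BATCH][DGRAM_MAX];
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    int i, n;

    assert(handle != NULL);

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DGRAM_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    n = recvmmsg(fd, msgs, BATCH, flags, timeout);
    if (n < 0)
        return -1;

    /* Process whatever arrived; a short batch is not treated as an error. */
    for (i = 0; i < n; i++)
        handle(bufs[i], msgs[i].msg_len, ctx);

    return n;
}
```

A caller would typically sit in a loop around receive_batch(), passing
MSG_WAITFORONE or MSG_DONTWAIT in flags depending on whether it wants to
block for the first datagram.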

David


