Re: [PATCH v2] kcm: remove any offset before parsing messages
From: Tom Herbert
Date: Wed Feb 20 2019 - 11:19:06 EST
On Tue, Feb 19, 2019 at 8:12 PM Dominique Martinet
<asmadeus@xxxxxxxxxxxxx> wrote:
>
> Dominique Martinet wrote on Fri, Feb 15, 2019:
> > With all that said I guess my patch should work correctly then, I'll try
> > to find some time to check the error does come back up the tcp socket in
> > my reproducer but I have no reason to believe it doesn't.
>
> Ok, so I can confirm this part - the 'csock' does come back up with
> POLLERR if the parse function returns ENOMEM in the current code.
>
Good.
> It also comes back up with POLLERR when the remote side closes the
> connection, which is expected, but I'm having a very hard time
> understanding how an application is supposed to deal with these
> POLLERR after having read the documentation and a bit of
> experimentation.
> I'm not sure how much it would matter for real life (if the other end
> closes the connection most servers would not care about what they said
> just before closing, but I can imagine some clients doing that in real
> life e.g. a POST request they don't care if it succeeds or not)...
> My test program works like this:
> - client connects, sends 10k messages and close()s the socket
> - server loops recving and close()s after 10k messages; it used to be
> recvmsg() directly but it's now using poll then recvmsg.
>
>
> When the client closes the socket, some messages are obviously still "in
> flight", and the server will recv a POLLERR notification on the csock at
> some point with many messages left.
> The documentation says to unattach the csock when you get POLLERR. If I
> do that, the kcm socket will no longer give me any message, so all the
> messages still in flight at the time are lost.
So basically it sounds like you're interested in supporting TCP
connections that are half closed. I believe the error reported for a
half-closed connection is EPIPE, so if the TCP socket returns that, it
can be ignored and the socket can stay attached and be used to send data.
Another possibility is to add some linger semantics to an attached
socket. For instance, a large message might be sent so that part of
the message is queued in TCP and part is queued in the KCM socket.
Unattach would probably break that message. We probably want a linger
option similar to SO_LINGER (or maybe just use the option on the TCP
socket) that means don't complete the detach until any message being
transmitted on the lower socket has been queued.
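
A rough userspace sketch of the "ignore EPIPE" idea, for illustration only.
It assumes the error behind POLLERR on the csock can be fetched with
getsockopt(SO_ERROR) (which also clears it), and that EPIPE really is what a
half-closed connection reports; neither has been verified against KCM here.
The helper names csock_pending_error() and csock_should_unattach() are made
up for this example.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fetch (and clear) the pending error on a socket; 0 means none.
 * If getsockopt() itself fails, its errno is returned instead. */
static int csock_pending_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return errno;
	return err;
}

/* Hypothetical policy: EPIPE is assumed to mean "peer half closed",
 * so the csock stays attached; any other error means unattach. */
static int csock_should_unattach(int err)
{
	return err != 0 && err != EPIPE;
}
```

After poll() reports POLLERR on the csock, the application would call
csock_should_unattach(csock_pending_error(csock)) and keep reading from the
KCM socket when it returns 0.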
>
> If I just ignore the csock like I used to, all the messages do come just
> fine, but as said previously on a real error this will just make recvmsg
> or the polling hang forever and I see no way to distinguish a "real"
> error vs. a connection shut down from the remote side with data left in
> the pipe.
> I thought of checking POLLERR on csock and POLLIN not set on kcmsock,
> but even that seems to happen fairly regularly - the kcm sock hasn't
> been filled up, it's still reading from the csock.
>
>
> On the other hand, checking POLLIN on the csock does say there is still
> data left, so I know there is data left on the csock, but this is also
> the case on a real error (e.g. if parser returns -ENOMEM)
> ... And this made me try to read from the csock after detaching it and I
> can resume manual tcp parsing for a few messages until read() fails with
> EPROTO ?! and I cannot seem to be able to get anything out of attaching
> it back to kcm (for e.g. an ENOMEM error that was transient)...
>
>
>
> I'm honestly not sure how the POLLERR notification mechanism works but I
> think it would be much easier to use KCM if we could somehow delay that
> error until KCM is done feeding from the csock (when netparser really
> stops reading from it like on real error, e.g. abort callback maybe?)
> I think it's fine if the csock is closed before the kcm sock message is
> read, but we should not lose messages like this.
Sounds like linger semantics is needed then.
>
>
>
> > I'd like to see some retry on ENOMEM before this is merged though, so
> > while I'm there I'll resend this with a second patch doing that
> > retry... I think just not setting strp->interrupted and not reporting
> > the error up might be enough? Will have to try either way.
>
> I also tried playing with that without much success.
> I had assumed just not calling strp_parser_err() (which calls the
> abort_parser cb) would be enough, eventually calling strp_start_timer()
> like the !len case, but no can do.
I think you need to ignore the ENOMEM and have a timer or other
callback to retry the operation in the future.
> With that said, returning 0 from the parse function also raises POLLERR
> on the csock and hangs netparser, so things aren't that simple...
Can you point to where this is happening? If the parse_msg callback
returns 0, that is supposed to indicate that more bytes are needed.
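
To make the return convention concrete, here is a minimal userspace sketch
(not the kernel callback, and not the BPF program KCM actually runs). It
assumes a made-up framing of a 2-byte big-endian length header followed by
the body; HDR_LEN, MAX_BODY and parse_msg_len() are hypothetical names. Only
the meaning of the return value mirrors what is described above: positive is
the full message length, 0 means more bytes are needed, negative is an error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN  2u     /* assumed framing: 2-byte big-endian body length */
#define MAX_BODY 4096u  /* assumed upper bound on the body size */

/* Returns the full message length (header + body) once the header is
 * available, 0 if more bytes are needed to read the header, or a
 * negative errno value if the header is unacceptable. */
static int parse_msg_len(const uint8_t *buf, size_t avail)
{
	uint32_t body;

	if (avail < HDR_LEN)
		return 0;		/* not an error: wait for more data */

	body = ((uint32_t)buf[0] << 8) | buf[1];
	if (body > MAX_BODY)
		return -EMSGSIZE;	/* real error: parsing should stop */

	return (int)(HDR_LEN + body);
}
```

With a single header byte available this returns 0, and the caller is
expected to call again once more data has arrived rather than treat it as a
failure.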
>
>
> I could use a bit of help again.
>
> Thanks,
> --
> Dominique