Re: [PREEMPT-RT] [patch 4 14/22] timer: Switch to a non cascading wheel

From: Jouni Malinen
Date: Wed Aug 17 2016 - 05:11:38 EST


On Tue, Aug 16, 2016 at 11:46:00AM +0200, Richard Cochran wrote:
> If I understand the test correctly, then the slightly different kernel
> timer behavior is ok, but the test isn't quite right. Let explain
> what I mean.
>
> First off, reading test_ap_wps.py, the point of the test is to see if
> ten simultaneous connections are possible. I guess the server
> implements a hard coded limit on the number of clients. (BTW where is
> the server loop?)

Yes, I think I wrote that test case to hit a specific code path in the
server side. The server implementation is here:
http://w1.fi/cgit/hostap/plain/src/wps/http_server.c

> You said that the server also sets 'backlog' to ten. The backlog
> controls the size of the queue holding incoming connections that are
> in the SYN_RCVD or ESTABLISHED state but have not yet been
> accept(2)-ed by the server. This is *not* the same as the number of
> possible simultaneous connections.

This is indeed what seems to be the key area that had a small change in
timing. After some more experimentation on various timing, it is
starting to look like the TCP initiator side retransmission timing is
now different after this kernel commit and that is hit for the case
where the server process does not have enough time to accept() the
incoming connections and listen() backlog ends up dropping it instead.

> On Sat, Aug 13, 2016 at 12:12:26PM +0300, Jouni Malinen wrote:
> > Yes, it looks like a TCP connect() timeout. I use a significantly
> > reduced timeout in the test scripts since they are run unattended and
> > are supposed to terminate in reasonable amount of time.. That said,
>
> I did not find where the client sets the one second timeout. Where
> does this happen?

There is a socket.setdefaulttimeout(1) call in wps_er_start().

> > If I increase that 20 to 50, I get more of such about 1.03 second
> > results at i=17, i=34, i=48..
>
> Can you provide the timings when the test runs on the older kernel?

I had not realized this previously due to the test case passing, but the
same retransmit SYN case was happening with older kernels, it just was
done a tiny bit faster to escape that 1.0 second timeout limit.. That
about 1.03 sec value after this kernel commit is 1.0 sec before this
kernel commit. In other words, something in this specific kernel commit
seems to add about 0.03 sec delay to the TCP SYN retransmission. That
said, I realize that this is quite unlikely timeout to use for connect()
in real world and as such, it looks simply as a side effect of a test
case that was using way too close timing requirement in the first place.

> > Looking more at what exactly is happening at the TCP layer, this is
> > likely related to the server behavior since listen() backlog is set to
> > 10 and if there are 10 parallel connections, the last one if
> > immediately closed before reading anything.
>
> To clarify, when the backlog is exceed, the new connection is not
> closed. Instead, the SYN is simply ignored, and the client is expect
> to re-transmit the SYN in the normal TCP fashion.

That closing was referring to what the server code in http_server.c
does, i.e., it does call close() after accept()ing the 10th parallel
connection. I had not realized that this was actually hitting the
listen() backlog in this specific case around the 15th socket, but yes,
that does indeed explain the behavior shown in the sniffer capture.

I was also able to confirm that if I add a short wait in the test loop
between each connect() call, things work fine with this kernel commit
included, i.e., the "issue" was triggered only when this listen()
backlog case was hit.

> Here is what I suspect is happening. By sending 20 SYN frames to a
> port with a backlog of 10, it saturates the queue. One SYN is ignored
> by the kernel, and a race begins between the connect() timeout and the
> SYN re-transmission. If the client's re-transmitted SYN and then the
> server's SYN,ACK returns before the connect timeout, then the call to
> connect() succeeds. With the new timer wheel, the result of the race
> is different.

Yes, that does indeed look exactly what is happening.

> There a couple of ways to deal with this. One is to increase the
> backlog on the server side. Another is to increase the connect()
> timeout to a multiple of the re-transmission interval.

Taken into account this is a test case for testing a specific code path
in http_server.c that is very unlikely to be hit in real world and a way
too short timeout for connect(), it is clear to me that the proper fix
here is to fix the test case to use a longer connect() timeout
especially now that I saw the 1.0 sec time with older kernel as well.

--
Jouni Malinen PGP id EFC895FA