Re: [patch 4 14/22] timer: Switch to a non cascading wheel

From: Jouni Malinen
Date: Thu Aug 11 2016 - 11:21:36 EST


On Mon, Jul 4, 2016 at 12:50 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> The current timer wheel has some drawbacks:
...

It looks like this change (commit
500462a9de657f86edaa102f8ab6bff7f7e43fc2 in linux.git) breaks one of
the automated test cases I'm using to test hostapd and wpa_supplicant
with mac80211_hwsim from the kernel. I'm not sure what exactly causes
this (I did not really expect git bisect to point to timers..), but it
is very reproducible for me under kvm. It apparently did not happen on
another device, so I'm not completely sure what is needed to reproduce
it.

The ap_wps_er_http_proto test case fails while connecting 20 TCP
stream sockets to a server on localhost. The client side is a python
test script and the server is hostapd. About the 13th of those socket
connects fails, while all the others (both before and after the failed
one) go through.

Would you happen to have any idea why this commit causes such a
difference in behavior? I'm currently working around this in my test
script with the following change, but it might be worthwhile to
confirm whether there is something in the kernel change that resulted
in unexpected behavior.

http://w1.fi/cgit/hostap/commit/?id=2d6a526ac3885605f34df4037fc79ad330565b23

The test code looked like this in python:

    addr = (url.hostname, url.port)
    socks = {}
    for i in range(20):
        socks[i] = socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                                 socket.IPPROTO_TCP)
        socks[i].connect(addr)

That connect() call is the operation that fails (times out), and it
seemed to happen for i == 13 most of the time. This shows up only with
commit 500462a9de657f86edaa102f8ab6bff7f7e43fc2 included in the kernel
(i.e., testing with commit b0d6e2dcb284f1f4dcb4b92760f49eeaf5fc0bc7 as
the kernel snapshot does not show this behavior). The changes in
500462a9 were not trivial to revert on top of the current master, so I
have not checked whether reverting only this one commit on the current
master branch would get rid of the failure.
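
For trying the same connect pattern without the hostapd test setup, a
minimal standalone sketch follows. It assumes a plain listening socket
on 127.0.0.1 as a stand-in for hostapd; start_listener() and
timed_connects(), the backlog of 32 and the 5 second timeout are
illustrative names and values, not part of the test suite. It is not
confirmed to trigger the same failure; it only records how long each
connect() takes so that a stalled one stands out.

```python
import socket
import time


def start_listener(backlog=32):
    """Open a throwaway TCP listener on 127.0.0.1 (stand-in for hostapd)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                        socket.IPPROTO_TCP)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(backlog)         # backlog > 20, so no accept() is needed here
    return srv


def timed_connects(addr, count=20, timeout=5.0):
    """Connect `count` sockets to addr, timing each connect() call.

    Returns (socks, results), where results holds one
    (index, seconds, error_or_None) tuple per attempt.
    """
    socks = []
    results = []
    for i in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                          socket.IPPROTO_TCP)
        s.settimeout(timeout)
        start = time.monotonic()
        err = None
        try:
            s.connect(addr)
        except OSError as e:    # includes socket.timeout
            err = e
        results.append((i, time.monotonic() - start, err))
        socks.append(s)         # keep every socket open, as the test does
    return socks, results


if __name__ == "__main__":
    srv = start_listener()
    socks, results = timed_connects(srv.getsockname())
    for i, secs, err in results:
        status = "ok" if err is None else "FAILED: %r" % (err,)
        print("connect %2d: %.6f s %s" % (i, secs, status))
    for s in socks:
        s.close()
    srv.close()
```

Running this on kernels with and without the commit and comparing the
per-connect times would show whether the stall depends on hostapd at
all or only on the loopback connect sequence.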

I can reproduce this easily, so if someone wants to get more details
of the issue, just let me know how to collect whatever would be
useful.

- Jouni