Re: [patch 4 14/22] timer: Switch to a non cascading wheel

From: Paul E. McKenney
Date: Fri Aug 12 2016 - 15:14:17 EST


On Fri, Aug 12, 2016 at 01:50:16PM -0400, Rik van Riel wrote:
> On Thu, 2016-08-11 at 18:21 +0300, Jouni Malinen wrote:
> > On Mon, Jul 4, 2016 at 12:50 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > wrote:
> > > The current timer wheel has some drawbacks:
> > ...
> >
> > It looks like this change (commit
> > 500462a9de657f86edaa102f8ab6bff7f7e43fc2 in linux.git) breaks one of
> > the automated test cases I'm using to test hostapd and wpa_supplicant
> > with mac80211_hwsim from the kernel. I'm not sure what exactly causes
> > this (did not really expect git bisect to point to timers..), but
> > this
> > seems to be very reproducible for me under kvm (though, this
> > apparently did not happen on another device, so I'm not completely
> > sure what it is needed to reproduce) with the ap_wps_er_http_proto
> > test cases failing to connect 20 TCP stream sockets to a server on
> > the
> > localhost. The client side is a python test script and the server is
> > hostapd. The failure shows up with about the 13th of those socket
> > connects failing while all others (both before and after this failed
> > one) going through.
> >
> > Would you happen to have any idea why this commit has such a
> > difference in behavior? 
>
> I have a vague hypothesis, more of a question actually.
>
> How does the new timer wheel code handle lost timer ticks?
>
> If a KVM guest does not run for a while, because the host
> is scheduling something else, the guest generally only gets
> one timer tick after the guest is scheduled back in.
>
> If there are multiple lost ticks, they will remain lost.
>
> Could that cause the new timer wheel code to skip over
> timer buckets occasionally, or is this hypothesis bunk?

FWIW, I do appear to be seeing more lost wakeups on current mainline
than on v4.7, but not enough of a difference to get a reliable bisction
in reasonable time.

Thanx, Paul