On Fri, 26 May 2017, Haris Okanovic wrote:
Oh crap. I think I see the problem. I decrement expired_count before
processing the list. Dropping the lock permits another run of
tick_find_expired()->find_expired_timers() in the middle of __expire_timers()
since it uses expired_count==0 as a condition.
This should fix it, but I'll wait for Anna-Maria's test next week before
submitting a patch.
static void expire_timers(struct timer_base *base)
{
struct hlist_head *head;
+ int expCount = base->expired_count;
No camel case for heavens sake!
And this requires:
cnt = READ_ONCE(base->expired_count);
- while (base->expired_count--) {
- head = base->expired_lists + base->expired_count;
+ while (expCount--) {
+ head = base->expired_lists + expCount;
__expire_timers(base, head);
}
Plus a comment.
base->expired_count = 0;
Anna-Maria spotted the same issue, but I voted for the revert right now
because I was worried about the consistency of base->clk under all
circumstances.
The other thing I noticed was this weird condition which does not do the
look ahead when base->clk is back for some time.
Why don't you use the
existing optimization which uses the bitmap for fast forward?
The other issue I have is that this can race at all. If you raised the
softirq in the look ahead then you should not go into that function until
the softirq has actually completed. There is no point in wasting time in
the hrtimer interrupt if the softirq is running anyway.
Thanks,
tglx