Re: stop_machine lockup issue in 3.9.y.

From: Ben Greear
Date: Wed Jun 05 2013 - 23:51:16 EST

On 06/05/2013 08:46 PM, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 20:41 -0700, Ben Greear wrote:
>> On 06/05/2013 08:26 PM, Eric Dumazet wrote:
>>> On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:

>>>> Ah, so, that's why it's showing up now. We probably have had the same
>>>> issue all along but it used to be masked by the softirq limiting. Do
>>>> you care to revive the 10 iterations limit so that it's limited by
>>>> both the count and timing? We do wanna find out why softirq is
>>>> spinning indefinitely tho.

>>> Yes, no problem, I can do that.

>> Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that would
>> be fine by me.
>>
>> I can send a version of my patch easily enough if we can agree on the max number of
>> loops (and if indeed my version of the patch is acceptable).

> Well, 10 was the prior limit and seems really fine.
>
> The non-update of jiffies seems like quite an exceptional condition (I hope...)
>
> At Google we use a patch that triggers a warning if a thread holds the cpu
> without taking care of need_resched() for more than xx ms

Well, I'm sure that patch works nicely until the clock stops moving
forward :)

I'll post a patch with a limit of 10 shortly.


Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc

