Re: [PATCH 2/2] workqueue: implement lockup detector

From: Tejun Heo
Date: Thu Dec 03 2015 - 14:44:06 EST


Hello, Don.

On Thu, Dec 03, 2015 at 12:50:24PM -0500, Don Zickus wrote:
> This sort of looks like the hung task detector..
>
> I am a little concerned because we just made a big effort to properly
> separate the hardlockup and softlockup paths and yet retain the flexibility
> to enable/disable them separately. Now it seems the workqueue detector is
> permanently entwined with the softlockup detector. I am not entirely sure
> that is the correct thing to do.

The only place they get entwined is in how they're controlled from
userland. While it isn't quite the same as softlockup detection, I
think what it monitors is close enough that it makes sense to put
them under the same interface.

> It also seems awkward for the lockup code to have to jump to the workqueue
> code to function properly. :-/ Though we have made exceptions for the virt
> stuff and the workqueue code is simple..

Softlockup code doesn't depend on workqueue in any way. Workqueue
piggybacks on touch_softlockup to detect cases which shouldn't trigger
a warning, and whether it's enabled is controlled together with
softlockup. That's it.

> Actually, I am curious, it seems if you just added a
> /proc/sys/kernel/wq_watchdog entry, you could eliminate the entire need for
> modifying the watchdog code to begin with. As you really aren't using any
> of it other than piggybacking on the touch_softlockup_watchdog stuff, which
> could probably be easily added without all the extra enable/disable changes
> in watchdog.c.

Yeah, except for the touch signal, it's purely an interface thing. I
don't feel too strongly about this, but it seems a bit silly to
introduce a whole different set of interfaces for this. E.g. if the
user wanted to disable softlockup detection, it'd be weird to leave wq
lockup detection running. The same goes for the threshold.
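
To make the coupling concrete, here is a rough userspace sketch of the
kind of check being described. It is not the code from the patch, and
every name in it (lockup_cfg, pool_state, pool_stalled) is made up for
illustration. It also assumes the detector only cares about pools that
have work pending, which is a guess. The point is just that the only
inputs shared with softlockup are the enable switch, the threshold and
the last-touched timestamp.

```c
/* Userspace sketch only; none of these names come from the patch. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the knobs shared with softlockup. */
struct lockup_cfg {
	bool enabled;		/* one switch covers both detectors */
	unsigned long thresh;	/* shared threshold, in arbitrary ticks */
};

/* Hypothetical per-pool state. */
struct pool_state {
	unsigned long last_progress;	/* last time the pool made progress */
	bool has_pending;		/* assumption: only busy pools matter */
};

/*
 * Return true if @pool should be reported as stalled at time @now.
 * @touched is the last time the softlockup watchdog was touched; a
 * recent touch means the stall is expected and shouldn't be warned
 * about. Counter wraparound is ignored for brevity.
 */
static bool pool_stalled(const struct lockup_cfg *cfg,
			 const struct pool_state *pool,
			 unsigned long touched, unsigned long now)
{
	unsigned long ts;

	assert(cfg && pool);

	if (!cfg->enabled || !pool->has_pending)
		return false;

	/* Measure from whichever is later: real progress or a touch. */
	ts = pool->last_progress;
	if (touched > ts)
		ts = touched;

	return now > ts && now - ts > cfg->thresh;
}
```

With a separate knob, the enabled and thresh fields above would have
to be duplicated and kept in sync by the user, which is the part that
seems silly.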

> Again, this looks like what the hung task detector is doing, which I
> struggled with years ago to integrate with the lockup code because in the
> end I had trouble re-using much of it.

So, it's a stall detector and there are inherent similarities but the
conditions tested are pretty different and it's a lot lighter. I'm
not really sure what you're trying to say.

Thanks.

--
tejun