Re: [PATCH 2/2] workqueue: implement lockup detector

From: Ulrich Obergfell
Date: Fri Dec 04 2015 - 08:20:01 EST



Tejun,

> Sure, separating the knobs out isn't difficult. I still don't like
> the idea of having multiple set of similar knobs controlling about the
> same thing tho.
>
> For example, let's say there's a user who boots with "nosoftlockup"
> explicitly. I'm pretty sure the user wouldn't be intending to keep
> workqueue watchdog running. The same goes for threshold adjustments,
> so here's my question. What are the reasons for the concern? What
> are we worrying about?

I'm not sure if it is obvious to a user that a stall of workqueues is
"about the same thing" as a soft lockup, and that one could thus argue
that both should be controlled by the same knob. Looking at this from
perspective of usability, I would still vote for having separate knobs
for each lockup detector. For example

/proc/sys/kernel/wq_watchdog_thresh

could control the on|off state of the workqueue watchdog and the timeout
at the same time (0 means off, > 0 means on and specifies the timeout).
Separating wq_watchdog_thresh from watchdog_thresh might also be useful
for diagnostic purposes for example, if during the investigation of a
problem one would want to explicitly increase or lower one threshold
without impacting the other.


>> And another question that comes to my mind is: Would the workqueue watchdog
>> participate in the lockup detector suspend/resume mechanism, and if yes, how
>> would it be integrated into this ?
>
> From the usage, I can't quite tell what the purpose of the mechanism
> is. The only user seems to be fixup_ht_bug() and when it fails it
> says "failed to disable PMU erratum BJ122, BV98, HSD29 workaround" so
> if you could give me a pointer, it'd be great. But at any rate, if
> shutting down watchdog is all that's necessary, it shouldn't be a
> problem to integrate.

The patch post that introduced the mechanism is here:

http://marc.info/?l=linux-kernel&m=143843318208917&w=2

The watchdog_{suspend|resume} functions were later renamed:

http://marc.info/?l=linux-kernel&m=143894132129982&w=2

At the moment I don't see a reason why the workqueue watchdog would have to
participate in that mechanism. However, if the workqueue watchdog would be
connected to the soft lockup detector as you proposed, I think it should be
participating for the 'sake of consistency' (it would seem hard to under-
stand if the interface would only suspend parts of the lockup detector).


Regards,

Uli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/