Re: [PATCH] hung_task: Skip scan on idle systems
From: Petr Mladek
Date: Mon Feb 02 2026 - 09:00:19 EST
On Mon 2026-01-26 15:14:27, Aaron Tomlin wrote:
> On Mon, Jan 26, 2026 at 01:23:01PM +0800, Lance Yang wrote:
> > Hi Aaron,
>
> Hi Lance,
>
> > Keep one patch or series under review at a time, especially in the
> > same subsystem ...
+1 :-)
> Understood. That's fair.
>
> > > @@ -503,6 +504,7 @@ static int watchdog(void *dummy)
> > >  	for ( ; ; ) {
> > >  		unsigned long timeout = sysctl_hung_task_timeout_secs;
> > >  		unsigned long interval = sysctl_hung_task_check_interval_secs;
> > > +		unsigned long load[3];
> > >  		long t;
> > >  		if (interval == 0)
> > > @@ -511,8 +513,12 @@ static int watchdog(void *dummy)
> > >  		t = hung_timeout_jiffies(hung_last_checked, interval);
> > >  		if (t <= 0) {
> > >  			if (!atomic_xchg(&reset_hung_task, 0) &&
> > > -			    !hung_detector_suspended)
> > > -				check_hung_uninterruptible_tasks(timeout);
> > > +			    !hung_detector_suspended) {
> > > +				/* Check 1-min load to detect idle system */
> > > +				get_avenrun(load, 0, 0);
> > > +				if (load[0] > 0)
> > > +					check_hung_uninterruptible_tasks(timeout);
> >
> > The optimization is not worth the trouble.
> >
> > I don't think the assumption that "load[0] == 0 means no hung tasks" is
> > 100% correct.
> >
> > So that would miss actual hung tasks - a false negative, which is worse
> > than the "wasted scan" you're trying to avoid.
> >
> > Also, I don't *really* care about optimizing something that runs once
> > every 120 seconds :)
> >
> > Nacked-by: Lance Yang <lance.yang@xxxxxxxxx>
>
> Yes, please ignore. This is indeed wrong.
>
> Regarding the value of the optimisation, while a 120-second interval
> implies a low frequency, the cost of the scan is O(N). On large servers
> with high thread counts (even if idle), iterating the entire task list
> dirties cache lines and consumes memory bandwidth unnecessarily.
>
> Nevertheless, we currently do not have a way to economically compute the
> total number of tasks in TASK_UNINTERRUPTIBLE state.

It makes some sense. And the check of the average load is trivial,
so it might be acceptable.

But I somehow doubt that it works. Have you ever seen a system with
(avenrun[0] == 0)? IMHO, it might be pretty hard to achieve.

Or maybe I am too pessimistic. Are there embedded systems which can
only be woken by an interrupt from a sensor? Do embedded systems
run the hung task detector?

In other words, is this patch solving a purely theoretical scenario?
Did you test it in practice, please?
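
To put rough numbers on the (avenrun[0] == 0) doubt, here is a small
userspace sketch of the fixed-point update that, as far as I recall,
calc_load() in include/linux/sched/loadavg.h performs every 5 seconds.
The constants are copied from memory and idle_samples_until_zero() is a
hypothetical helper, so treat it as an illustration, not as the kernel
code:

```c
/* Constants as recalled from include/linux/sched/loadavg.h (assumption). */
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884UL            /* 1/exp(5sec/1min) in fixed point */

/* Userspace copy of the load-average update step. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
			       unsigned long active)
{
	unsigned long newload = load * exp + active * (FIXED_1 - exp);

	if (active >= load)
		newload += FIXED_1 - 1;	/* round up while the load is rising */

	return newload / FIXED_1;
}

/*
 * Hypothetical helper: count consecutive 5-second samples, each seeing
 * zero active tasks, until the 1-min average decays to exactly 0.
 */
static unsigned int idle_samples_until_zero(unsigned long load)
{
	unsigned int n = 0;

	while (load > 0) {
		load = calc_load(load, EXP_1, 0);
		n++;
	}
	return n;
}
```

Starting from a 1-min average of 1.0, idle_samples_until_zero(FIXED_1)
times 5 seconds is the stretch of completely idle samples needed before
load[0] reads 0. A single active task seen at one sample,
calc_load(0, EXP_1, FIXED_1), already lifts it back to 164/2048 (about
0.08). If I remember correctly, the sampled count also includes
uninterruptible tasks, not only runnable ones.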
Best Regards,
Petr