Re: [PATCH] mm/vmstat: reject zero vm.stat_interval to prevent busy-loop

From: Vlastimil Babka (SUSE)

Date: Wed Mar 04 2026 - 07:38:38 EST

On 3/4/26 10:27 AM, Michal Hocko wrote:
> On Wed 04-03-26 08:27:38, Maximilian Pezzullo via B4 Relay wrote:
>> From: Maximilian Pezzullo <maximilianpezzullo@xxxxxxxxx>
>>
>> Setting vm.stat_interval to 0 causes excessive kworker CPU usage
>> because vmstat_shepherd() and vmstat_update() reschedule themselves
>> with round_jiffies_relative(0), which resolves to an immediate
>> reschedule and creates a busy-loop.
>>
>> Add a custom sysctl handler that rejects 0 and restores the previous
>> value, similar to how dirtytime_interval_handler() handles
>> vm.dirtytime_expire_seconds.
>>
>> Reported-by: Terry M <terrym3201@xxxxxxxxxxxxxxxxxxx>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220226
>> Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@xxxxxxxxx>
>> ---
>> mm/vmstat.c | 14 +++++++++++++-
>> 1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 86b14b0f77b5..6eeb4341b215 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -2114,6 +2114,18 @@ void vmstat_flush_workqueue(void)
>>  	flush_workqueue(mm_percpu_wq);
>>  }
>>
>> +static int vmstat_interval_handler(const struct ctl_table *table, int write,
>> +				   void *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> +	int ret = proc_dointvec_jiffies(table, write, buffer, lenp, ppos);
>> +
>> +	if (ret == 0 && write && sysctl_stat_interval == 0) {
>> +		sysctl_stat_interval = HZ;
>> +		return -EINVAL;
>
> So you update the value and report the failure. Nope, this is not the
> correct way to handle that. Either you check the value and fail before
> any side effects, or you correct the value, report that to the log and
> return success.
>
> I would prefer to not do that at all. Setting any arbitrarily small value
> will have some side effects. This is an admin-only interface and we expect
> admins to know what they are doing.

I think it would make some sense to reject a value that leads to an
unrecoverable situation, where something in the kernel loops endlessly
with no preemption, causing e.g. softlockups and making it impossible
to set a new sane value again. I don't know if that's the case here for
value 0. If it only leads to 20-30% CPU utilization (per the bugzilla)
and the admin can recover by setting a sane value again, we can indeed
leave it that way.
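
For reference, if rejecting 0 were wanted, the first option Michal
describes (check the value and fail before any side effects) could look
roughly like the userspace model below. This is only a sketch, not the
patch under discussion: stat_interval and set_stat_interval() are
hypothetical stand-ins for sysctl_stat_interval and the sysctl handler,
and the value is kept in plain seconds rather than jiffies. The point is
just the ordering: parse into a temporary, validate, and only then
overwrite the live value.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for sysctl_stat_interval (seconds, not jiffies). */
static long stat_interval = 1;

/*
 * Model of "validate, then commit": the input is parsed into a
 * temporary, and the live value is written only after every check
 * has passed. On any failure the old value is left untouched.
 */
static int set_stat_interval(const char *buf)
{
	char *end;
	long tmp;

	errno = 0;
	tmp = strtol(buf, &end, 10);

	/* Reject overflow, empty input, and trailing garbage. */
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	/* Reject zero and negative intervals before any side effect. */
	if (tmp <= 0)
		return -EINVAL;

	stat_interval = tmp;	/* commit only after validation */
	return 0;
}
```

With this ordering a rejected write of "0" returns -EINVAL and the
previous interval stays in effect, so there is no "updated and failed"
state for the caller to reason about.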

> NAK to the patch