Re: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
From: Petr Mladek
Date: Tue Oct 01 2024 - 04:03:58 EST
On Fri 2024-09-27 14:40:07, Greg Kroah-Hartman wrote:
> Description
> ===========
>
> In the Linux kernel, the following vulnerability has been resolved:
>
> workqueue: Improve scalability of workqueue watchdog touch
>
> On a ~2000 CPU powerpc system, hard lockups have been observed in the
> workqueue code when stop_machine runs (in this case due to CPU hotplug).

I believe that this does not qualify as a security vulnerability.
Any hotplug is a privileged operation.

Best Regards,
Petr

> This is due to lots of CPUs spinning in multi_cpu_stop, calling
> touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
> wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
> and that can find itself in the same cacheline as other important
> workqueue data, which slows down operations to the point of lockups.
>
> In the case of the following abridged trace, worker_pool_idr was in
> the hot line, causing the lockups to always appear at idr_find.
>
> watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
> Call Trace:
> get_work_pool
> __queue_work
> call_timer_fn
> run_timer_softirq
> __do_softirq
> do_softirq_own_stack
> irq_exit
> timer_interrupt
> decrementer_common_virt
> * interrupt: 900 (timer) at multi_cpu_stop
> multi_cpu_stop
> cpu_stopper_thread
> smpboot_thread_fn
> kthread
>
> Fix this by having wq_watchdog_touch() only write to the line if the
> last time a touch was recorded exceeds 1/4 of the watchdog threshold.
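
[Illustration, not part of the quoted announcement or of the patch.]
The rate-limited store described above can be sketched in userspace C
roughly as follows. This is a minimal sketch under stated assumptions,
not the code in kernel/workqueue.c: shared_touched, ticks_after() and
touch_rate_limited() are hypothetical names, C11 atomics stand in for
the kernel's own accessors, and the caller passes in the current tick
count and the threshold in ticks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared "last touched" timestamp. */
static _Atomic unsigned long shared_touched;

/* Wraparound-safe "a is later than b" comparison on tick counters. */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Store to the shared timestamp only when the last recorded touch is
 * more than a quarter of the threshold old. The common case is then a
 * read of the shared line rather than a write to it.
 * Returns true if a store was performed.
 */
static bool touch_rate_limited(unsigned long now, unsigned long thresh_ticks)
{
	unsigned long last = atomic_load_explicit(&shared_touched,
						  memory_order_relaxed);

	if (!ticks_after(now, last + thresh_ticks / 4))
		return false;	/* recent enough, skip the store */

	atomic_store_explicit(&shared_touched, now, memory_order_relaxed);
	return true;
}
```

With a threshold of 400 ticks, a caller that touches on every tick
would store at most once per 100 ticks; all other calls only read the
shared value, so the cacheline can stay shared between the spinning
CPUs instead of bouncing on every call.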
>
> The Linux kernel CVE team has assigned CVE-2024-46839 to this issue.
>
>
> Affected and fixed versions
> ===========================
>
> Fixed in 5.15.167 with commit 9d08fce64dd7
> Fixed in 6.1.110 with commit a2abd35e7dc5
> Fixed in 6.6.51 with commit 241bce1c757d
> Fixed in 6.10.10 with commit da5f374103a1
> Fixed in 6.11 with commit 98f887f820c9
>
> Please see https://www.kernel.org for a full list of currently supported
> kernel versions by the kernel community.
>
> Unaffected versions might change over time as fixes are backported to
> older supported kernel versions. The official CVE entry at
> https://cve.org/CVERecord/?id=CVE-2024-46839
> will be updated if fixes are backported, please check that for the most
> up to date information about this issue.
>
>
> Affected files
> ==============
>
> The file(s) affected by this issue are:
> kernel/workqueue.c
>
>
> Mitigation
> ==========
>
> The Linux kernel CVE team recommends that you update to the latest
> stable kernel version for this, and many other bugfixes. Individual
> changes are never tested alone, but rather are part of a larger kernel
> release. Cherry-picking individual commits is not recommended or
> supported by the Linux kernel community at all. If however, updating to
> the latest release is impossible, the individual changes to resolve this
> issue can be found at these commits:
> https://git.kernel.org/stable/c/9d08fce64dd77f42e2361a4818dbc4b50f3c7dad
> https://git.kernel.org/stable/c/a2abd35e7dc55bf9ed01e2b3481fa78e086d3bf4
> https://git.kernel.org/stable/c/241bce1c757d0587721512296952e6bba69631ed
> https://git.kernel.org/stable/c/da5f374103a1e0881bbd35847dc57b04ac155eb0
> https://git.kernel.org/stable/c/98f887f820c993e05a12e8aa816c80b8661d4c87