[REGRESSION] sched/deadline: Hard lockup during CPU offline after commit 14a857056466

From: batcain

Date: Fri May 15 2026 - 23:07:41 EST


[1.] One line summary of the problem:
sched/deadline: Hard lockup during CPU offline/migration due to frozen rq_clock loop in update_dl_revised_wakeup()

[2.] Full description of the problem/report:
A deterministic hard lockup occurs during CPU hotplug (offlining a secondary core) on stable kernels containing commit 14a857056466be9d3d907a94e92a704ac1be149b.

When a CPU is taken offline, its tasks are migrated in stop_machine() context, where local interrupts are disabled (irqs_disabled() is true). During task migration, enqueue_task_dl() calls update_dl_entity(). Because of the new dl_defer rule introduced for implicit dl_servers, the code takes the update_dl_revised_wakeup() branch.

Inside update_dl_revised_wakeup(), the laxity is calculated from rq_clock(rq):

    u64 laxity = dl_se->deadline - rq_clock(rq);

However, during the interrupts-off phase of stop_machine(), the runqueue clock is not updated, so rq_clock(rq) returns the same stale value on every iteration of the enqueue loop. The laxity therefore never changes, dl_entity_overflow() keeps evaluating to true, and the CPU spins forever inside the enqueue path. The result is a system-wide hard lockup.
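
To make the frozen-clock argument concrete, here is a minimal userspace sketch of the laxity arithmetic quoted above. It is not the kernel code: struct model_dl_entity, MODEL_BW_SHIFT and all model_* functions are hypothetical names for illustration. The value 20 for the fixed-point shift and the "density * laxity" runtime formula are my assumptions about kernel/sched/deadline.c, so please check them against the actual tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: fixed-point shift, believed to match the kernel's BW_SHIFT. */
#define MODEL_BW_SHIFT 20

/* Hypothetical stand-in for the sched_dl_entity fields used below. */
struct model_dl_entity {
	uint64_t deadline;   /* absolute deadline, in ns */
	uint64_t dl_density; /* runtime/deadline ratio << MODEL_BW_SHIFT */
};

/* Wrap-safe "a is before b" comparison on 64-bit timestamps. */
static bool model_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Mirrors the quoted line: an unsigned subtraction against the rq clock. */
static uint64_t model_laxity(const struct model_dl_entity *se, uint64_t clock)
{
	return se->deadline - clock;
}

/*
 * ASSUMPTION: the revised runtime is density * laxity, scaled back down.
 * The result depends only on (deadline, density, clock), so a clock that
 * does not advance yields the same runtime on every call.
 */
static uint64_t model_revised_runtime(const struct model_dl_entity *se,
				      uint64_t clock)
{
	/* Precondition of this model: the deadline is not in the past. */
	assert(!model_time_before(se->deadline, clock));

	return (se->dl_density * model_laxity(se, clock)) >> MODEL_BW_SHIFT;
}
```

With a deadline of 10000000 ns, a density of 0.5 (1 << 19) and the clock held at 4000000 ns, model_laxity() returns 6000000 and model_revised_runtime() returns 3000000, and both return the same values no matter how many times they are called. If the stale clock is ahead of the deadline by even 1 ns, the unsigned subtraction wraps to UINT64_MAX, which is why the model guards that case with an assertion.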

[3.] Keywords (keywords of the affected subsystem):
sched, deadline, dl_server, cpuhp, hotplug, hard-lockup, regression

[4.] Kernel information (output of "uname -a" or version):
Linux workstation 7.0.7-hardened2-1-hardened #1 SMP PREEMPT_DYNAMIC Fri, 15 May 2026 00:03:13 +0000 x86_64 GNU/Linux

[5.] Most recent kernel version which did not have the bug:
Any kernel release prior to the integration/backport of commit 14a857056466.

[6.] Output of Oops/Panic/Bug/Objdump:
No kernel oops or panic stack trace is written to disk or the serial console, because the freeze occurs inside stop_machine() with interrupts masked. The NMI watchdog triggers a hard lockup panic only if it is aggressively armed.

[7.] A small program which triggers the problem:
# echo 0 > /sys/devices/system/cpu/cpu1/online

[8.] Environment description (Hardware, distribution, etc.):
Hardware: Confirmed on both AMD Zen 2 (Renoir) and AMD Zen 4 (Phoenix) platforms.
Distribution: Arch Linux (using official extra/linux-hardened kernel package).