[PATCH v2] sched_ext: Replace rq_lock() with raw_spin_rq_lock() in scx_ops_bypass()

From: Changwoo Min
Date: Wed Jan 08 2025 - 03:33:33 EST


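scx_ops_bypass() iterates over all possible CPUs to re-enqueue all scx
tasks. For each CPU, it takes the rq lock with rq_lock(), regardless of
whether the CPU is offline or is currently running a task in a higher
scheduler class (e.g., deadline). rq_lock() is intended for online CPUs,
and because it also calls rq_pin_lock(), using it here may trigger an
unnecessary warning in rq_pin_lock(). Therefore, replace rq_lock() with
raw_spin_rq_lock() in scx_ops_bypass(). Also, instead of dropping the
lock and calling resched_cpu(), call resched_curr() while the lock is
still held, guarded by the same "cpu_online(cpu) ||
cpu == smp_processor_id()" check that resched_cpu() performs.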
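To make the rq_lock()/raw_spin_rq_lock() distinction concrete, here is a minimal userspace sketch, assuming a simplified layering: rq_lock() takes the raw rq lock and then calls rq_pin_lock(), whose debug check is the "rq->balance_callback && rq->balance_callback != &balance_push_callback" warning in the splat below, while raw_spin_rq_lock() only takes the lock. The struct layout, `locked`, `warn_count` and `pending_callback` are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace model, NOT kernel code. Only the parts of the
 * locking helpers that matter for the warning are kept; rq_flags,
 * lockdep pinning, IRQ state and CONFIG_SCHED_DEBUG gating are omitted.
 */
struct balance_callback { int unused; };   /* opaque stand-in */

struct rq {
	int locked;                                /* models the raw rq spinlock */
	struct balance_callback *balance_callback; /* queued balance work, or NULL */
};

static struct balance_callback balance_push_callback; /* sentinel the check tolerates */
static struct balance_callback pending_callback;      /* hypothetical queued work */
static int warn_count;                                /* times the check fired */

static void raw_spin_rq_lock(struct rq *rq)
{
	assert(!rq->locked);	/* the real lock is not recursive */
	rq->locked = 1;
}

static void raw_spin_rq_unlock(struct rq *rq)
{
	assert(rq->locked);
	rq->locked = 0;
}

/* Models the debug check in rq_pin_lock() that produced the splat. */
static void rq_pin_lock(struct rq *rq)
{
	if (rq->balance_callback &&
	    rq->balance_callback != &balance_push_callback) {
		warn_count++;
		fprintf(stderr, "WARNING: rq->balance_callback && "
			"rq->balance_callback != &balance_push_callback\n");
	}
}

/* Simplified: rq_lock() = raw_spin_rq_lock() + rq_pin_lock(). */
static void rq_lock(struct rq *rq)
{
	raw_spin_rq_lock(rq);
	rq_pin_lock(rq);
}

/* Unpinning is not modeled; this only drops the lock. */
static void rq_unlock(struct rq *rq)
{
	raw_spin_rq_unlock(rq);
}
```

With `rq.balance_callback = &pending_callback`, a rq_lock()/rq_unlock() pair increments `warn_count`, while a raw_spin_rq_lock()/raw_spin_rq_unlock() pair leaves it unchanged; pointing `balance_callback` at `balance_push_callback` silences the check either way.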

This fixes commit 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()").
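The last hunk of the patch drops the unlocked resched_cpu(cpu) call. In the kernel, resched_cpu() takes the rq lock itself and calls resched_curr() only if "cpu_online(cpu) || cpu == smp_processor_id()"; the patch performs that same check inline while raw_spin_rq_lock() is still held. The sketch below is a hypothetical userspace model (`struct rq`, `online[]`, `this_cpu` and `bypass_one_cpu()` are illustrative names, not kernel APIs) showing that both forms make the same resched decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT kernel code. */
#define NR_MODEL_CPUS 4

struct rq {
	int locked;        /* 1 while the modeled rq lock is held */
	bool need_resched; /* set by resched_curr() */
};

static struct rq runqueues[NR_MODEL_CPUS];
static bool online[NR_MODEL_CPUS]; /* models cpu_online() */
static int this_cpu;               /* models smp_processor_id() */

static void raw_spin_rq_lock(struct rq *rq)
{
	assert(!rq->locked); /* the real lock is not recursive */
	rq->locked = 1;
}

static void raw_spin_rq_unlock(struct rq *rq)
{
	assert(rq->locked);
	rq->locked = 0;
}

/* As in the kernel, the caller must hold the rq lock. */
static void resched_curr(struct rq *rq)
{
	assert(rq->locked);
	rq->need_resched = true;
}

/* Models resched_cpu(): takes the lock itself, then checks the CPU. */
static void resched_cpu(int cpu)
{
	struct rq *rq = &runqueues[cpu];

	raw_spin_rq_lock(rq);
	if (online[cpu] || cpu == this_cpu)
		resched_curr(rq);
	raw_spin_rq_unlock(rq);
}

/* Loop body after the patch: same check, under the lock already held. */
static void bypass_one_cpu(int cpu)
{
	struct rq *rq = &runqueues[cpu];

	raw_spin_rq_lock(rq);
	/* ... tasks would be re-enqueued here ... */
	if (online[cpu] || cpu == this_cpu)
		resched_curr(rq);
	raw_spin_rq_unlock(rq);
}
```

Because the check and resched_curr() happen inside the same critical section as the re-enqueue, the lock is no longer released and immediately re-acquired just to request a reschedule.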

Without this change, we observe the following warnings:

===== START =====
[ 6.615204] ------------[ cut here ]------------
[ 6.615205] rq->balance_callback && rq->balance_callback != &balance_push_callback
[ 6.615208] WARNING: CPU: 2 PID: 0 at kernel/sched/sched.h:1730 __schedule+0x1130/0x1c90
[ 6.615214] Modules linked in: nf_tables vfat fat intel_rapl_msr amd_atl intel_rapl_common kvm_amd snd_hda_codec_realtek snd_hda_scodec_component kvm snd_hda_codec_generic crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg eeepc_wmi snd_usb_audio snd_intel_sdw_acpi sha512_ssse3 sha1_ssse3 asus_wmi snd_hda_codec aesni_intel snd_usbmidi_lib ee1004 platform_profile gf128mul snd_ump asus_ec_sensors snd_hda_core i8042 crypto_simd snd_rawmidi sparse_keymap snd_hwdep snd_seq_device cryptd serio rapl rfkill snd_pcm wmi_bmof pcspkr k10temp snd_timer i2c_piix4 snd i2c_smbus soundcore ccp mc igc mousedev ptp joydev pps_core leetmouse(OE) mac_hid tcp_bbr pkcs8_key_parser ntsync(OE) i2c_dev crypto_user dm_mod loop nfnetlink lz4 zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables btrfs libcrc32c crc32c_generic raid6_pq xor crc32c_intel nvme sha256_ssse3 nvme_core nvme_auth nvidia_drm(OE) drm_ttm_helper ttm hid_cmedia nvidia_uvm(OE)
[ 6.615294] nvidia_modeset(OE) hid_generic mxm_wmi video wmi usbhid nvidia(OE)
[ 6.615302] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G OE 6.12.6-2-cachyos #1 c963cd2b82aa9cdd05160d5f7838a69b51110706
[ 6.615307] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 6.615308] Hardware name: System manufacturer System Product Name/ROG STRIX X570-E GAMING, BIOS 5013 03/18/2024
[ 6.615310] Sched_ext: lavd (enabling+all)
[ 6.615311] RIP: 0010:__schedule+0x1130/0x1c90
[ 6.615314] Code: 90 56 65 94 0f 84 e1 ef ff ff f6 05 4a 78 3d 01 01 0f 85 d4 ef ff ff c6 05 3d 78 3d 01 01 48 c7 c7 8b a3 cd 93 e8 90 24 0e ff <0f> 0b 41 8b 86 38 0c 00 00 e9 b3 ef ff ff e8 bd 8c ff ff 65 ff 0d
[ 6.615316] RSP: 0018:ffffb23e4019fe28 EFLAGS: 00010046
[ 6.615319] RAX: e9cdb54dc06b0200 RBX: ffffa02e00a93680 RCX: 0000000000000027
[ 6.615320] RDX: ffffb23e4019fc90 RSI: 00000000ffffefff RDI: ffffa0350eb21948
[ 6.615322] RBP: ffffb23e4019fee0 R08: 0000000000000000 R09: ffffffff9465a840
[ 6.615323] R10: 0000000000002ffd R11: 0000000000000004 R12: 0000000000000000
[ 6.615325] R13: ffffa02e00a93680 R14: ffffa0350eb366c0 R15: 00000000ffffffff
[ 6.615327] FS: 0000000000000000(0000) GS:ffffa0350eb00000(0000) knlGS:0000000000000000
[ 6.615329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.615331] CR2: 00007cb2b7100008 CR3: 000000012222a000 CR4: 0000000000f50ef0
[ 6.615333] PKRU: 55555554
[ 6.615334] Call Trace:
[ 6.615336] <TASK>
[ 6.615338] ? __warn+0xd5/0x1d0
[ 6.615341] ? __schedule+0x1130/0x1c90
[ 6.615345] ? report_bug+0x144/0x1f0
[ 6.615348] ? __schedule+0x1130/0x1c90
[ 6.615350] ? handle_bug+0x6a/0x90
[ 6.615353] ? exc_invalid_op+0x1a/0x50
[ 6.615356] ? asm_exc_invalid_op+0x1a/0x20
[ 6.615361] ? __schedule+0x1130/0x1c90
[ 6.615363] ? __schedule+0x1130/0x1c90
[ 6.615366] ? pv_native_safe_halt+0x13/0x20
[ 6.615369] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615372] ? ct_kernel_enter+0x2e/0x90
[ 6.615374] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615376] ? local_clock_noinstr+0xc/0xc0
[ 6.615380] schedule_idle+0x23/0x40
[ 6.615382] cpu_startup_entry+0x1c2/0x250
[ 6.615386] start_secondary+0x9e/0xa0
[ 6.615389] common_startup_64+0x13e/0x140
[ 6.615395] </TASK>
[ 6.615396] ---[ end trace 0000000000000000 ]---
[ 6.615398] ------------[ cut here ]------------
[ 6.615401] rq->balance_callback && rq->balance_callback != &balance_push_callback
[ 6.615403] WARNING: CPU: 6 PID: 2269 at kernel/sched/sched.h:1730 scx_ops_bypass+0x178/0x240
[ 6.615408] Modules linked in: nf_tables vfat fat intel_rapl_msr amd_atl intel_rapl_common kvm_amd snd_hda_codec_realtek snd_hda_scodec_component kvm snd_hda_codec_generic crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg eeepc_wmi snd_usb_audio snd_intel_sdw_acpi sha512_ssse3 sha1_ssse3 asus_wmi snd_hda_codec aesni_intel snd_usbmidi_lib ee1004 platform_profile gf128mul snd_ump asus_ec_sensors snd_hda_core i8042 crypto_simd snd_rawmidi sparse_keymap snd_hwdep snd_seq_device cryptd serio rapl rfkill snd_pcm wmi_bmof pcspkr k10temp snd_timer i2c_piix4 snd i2c_smbus soundcore ccp mc igc mousedev ptp joydev pps_core leetmouse(OE) mac_hid tcp_bbr pkcs8_key_parser ntsync(OE) i2c_dev crypto_user dm_mod loop nfnetlink lz4 zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables btrfs libcrc32c crc32c_generic raid6_pq xor crc32c_intel nvme sha256_ssse3 nvme_core nvme_auth nvidia_drm(OE) drm_ttm_helper ttm hid_cmedia nvidia_uvm(OE)
[ 6.615482] nvidia_modeset(OE) hid_generic mxm_wmi video wmi usbhid nvidia(OE)
[ 6.615490] CPU: 6 UID: 0 PID: 2269 Comm: scx_lavd Tainted: G W OE 6.12.6-2-cachyos #1 c963cd2b82aa9cdd05160d5f7838a69b51110706
[ 6.615494] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 6.615495] Hardware name: System manufacturer System Product Name/ROG STRIX X570-E GAMING, BIOS 5013 03/18/2024
[ 6.615497] Sched_ext: lavd (enabling+all), task: runnable_at=+0ms
[ 6.615498] RIP: 0010:scx_ops_bypass+0x178/0x240
[ 6.615501] Code: eb 42 0f 1f 44 00 00 4c 89 ef e8 c3 fd d1 00 49 ff c4 e9 5b ff ff ff c6 05 9d dc 0e 02 01 48 c7 c7 8b a3 cd 93 e8 b8 88 df ff <0f> 0b eb a5 0f 0b 41 8b 85 6c 0a 00 00 eb a9 0f 0b 41 8b 85 6c 0a
[ 6.615503] RSP: 0018:ffffb23e619479e8 EFLAGS: 00010046
[ 6.615506] RAX: 3730614603e1d700 RBX: 0000000000000000 RCX: 0000000000000027
[ 6.615507] RDX: ffffb23e61947850 RSI: 00000000ffffefff RDI: ffffa0350ed21948
[ 6.615509] RBP: ffffa0350eb00000 R08: 0000000000000000 R09: ffffffff9465a840
[ 6.615511] R10: 0000000000002ffd R11: 0000000000000004 R12: 0000000000000002
[ 6.615512] R13: ffffa0350eb366c0 R14: 0000000000000286 R15: ffffb23e619479f0
[ 6.615514] FS: 0000703f64c53880(0000) GS:ffffa0350ed00000(0000) knlGS:0000000000000000
[ 6.615516] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6.615517] CR2: 00005b2a1d0d23d0 CR3: 000000011089a000 CR4: 0000000000f50ef0
[ 6.615519] PKRU: 55555554
[ 6.615520] Call Trace:
[ 6.615522] <TASK>
[ 6.615524] ? __warn+0xd5/0x1d0
[ 6.615527] ? scx_ops_bypass+0x178/0x240
[ 6.615530] ? report_bug+0x144/0x1f0
[ 6.615533] ? scx_ops_bypass+0x178/0x240
[ 6.615536] ? handle_bug+0x6a/0x90
[ 6.615538] ? exc_invalid_op+0x1a/0x50
[ 6.615541] ? asm_exc_invalid_op+0x1a/0x20
[ 6.615545] ? scx_ops_bypass+0x178/0x240
[ 6.615548] ? scx_ops_bypass+0x178/0x240
[ 6.615551] bpf_scx_reg+0xfb5/0x1380
[ 6.615559] bpf_struct_ops_link_create+0x13c/0x190
[ 6.615563] __sys_bpf+0x765/0x6080
[ 6.615567] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615570] ? syscall_exit_to_user_mode+0x38/0xc0
[ 6.615573] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615578] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615580] ? arch_exit_to_user_mode_prepare.cold+0x5/0x5c
[ 6.615583] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615585] ? syscall_exit_to_user_mode+0x38/0xc0
[ 6.615587] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615589] ? do_syscall_64+0x9b/0x170
[ 6.615592] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615594] ? syscall_exit_to_user_mode+0x38/0xc0
[ 6.615596] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615598] ? do_syscall_64+0x9b/0x170
[ 6.615601] ? __se_sys_close.llvm.4416965578177173658+0x6d/0xa0
[ 6.615604] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615606] ? kmem_cache_free.cold+0x138/0x32a
[ 6.615610] __x64_sys_bpf+0x1c/0x30
[ 6.615613] do_syscall_64+0x8f/0x170
[ 6.615615] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615617] ? do_syscall_64+0x9b/0x170
[ 6.615621] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615624] ? arch_exit_to_user_mode_prepare.cold+0x5/0x5c
[ 6.615626] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615628] ? syscall_exit_to_user_mode+0x38/0xc0
[ 6.615631] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615633] ? do_syscall_64+0x9b/0x170
[ 6.615635] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615637] ? arch_exit_to_user_mode_prepare.cold+0x5/0x5c
[ 6.615640] ? srso_alias_return_thunk+0x5/0xfbef5
[ 6.615643] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 6.615645] RIP: 0033:0x703f64e8315d
[ 6.615656] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 6b 0d 00 f7 d8 64 89 01 48
[ 6.615658] RSP: 002b:00007ffebaea0a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ 6.615661] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000703f64e8315d
[ 6.615662] RDX: 0000000000000040 RSI: 00007ffebaea0a70 RDI: 000000000000001c
[ 6.615664] RBP: 00007ffebaea0b40 R08: 000000000000000f R09: 0000000000000000
[ 6.615665] R10: 000000000000000f R11: 0000000000000246 R12: 000000000000000f
[ 6.615667] R13: 000000000000002c R14: 0000000000000010 R15: 0000703f650d8000
[ 6.615671] </TASK>
[ 6.615673] ---[ end trace 0000000000000000 ]---
[ 6.615712] sched_ext: BPF scheduler "lavd" enabled
[ 6.623157] sched_ext: kworker/1:0[29] has zero slice in pick_task_scx()
===== END =====

Signed-off-by: Changwoo Min <changwoo@xxxxxxxxxx>
---
kernel/sched/ext.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8fe64c27004e..cb6eb49d16be 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4803,10 +4803,9 @@ static void scx_ops_bypass(bool bypass)
 	 */
 	for_each_possible_cpu(cpu) {
 		struct rq *rq = cpu_rq(cpu);
-		struct rq_flags rf;
 		struct task_struct *p, *n;
 
-		rq_lock(rq, &rf);
+		raw_spin_rq_lock(rq);
 
 		if (bypass) {
 			WARN_ON_ONCE(rq->scx.flags & SCX_RQ_BYPASSING);
@@ -4822,7 +4821,7 @@ static void scx_ops_bypass(bool bypass)
 		 * sees scx_rq_bypassing() before moving tasks to SCX.
 		 */
 		if (!scx_enabled()) {
-			rq_unlock(rq, &rf);
+			raw_spin_rq_unlock(rq);
 			continue;
 		}
 
@@ -4842,10 +4841,11 @@ static void scx_ops_bypass(bool bypass)
 			sched_enq_and_set_task(&ctx);
 		}
 
-		rq_unlock(rq, &rf);
-
 		/* resched to restore ticks and idle state */
-		resched_cpu(cpu);
+		if (cpu_online(cpu) || cpu == smp_processor_id())
+			resched_curr(rq);
+
+		raw_spin_rq_unlock(rq);
 	}
 
 	atomic_dec(&scx_ops_breather_depth);
--
2.47.1