[RFC] debugfs: sched/migration_cost_ns should accept -1 like legacy sysctl interface?
From: Xia Fukun
Date: Mon Apr 27 2026 - 09:18:16 EST
Hi all,
I noticed an inconsistency in how the scheduler’s migration_cost_ns tunable is exposed
via debugfs versus its original sysctl/proc interface.
Currently, the file /sys/kernel/debug/sched/migration_cost_ns is created with
debugfs_create_u32(). This means that writing -1 to it fails with “Invalid argument”,
because u32 attributes reject negative values.
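To make the failure mode concrete, here is a rough userspace sketch (not kernel code) of the difference between an unsigned and a signed parser. parse_u32(), parse_s32(), at_end() and demo_parse() are hypothetical names invented for this illustration. The sketch assumes the unsigned path refuses any input that does not start with a digit. The range checks are a choice of this model, not a statement about what debugfs does with oversized input.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Accept one optional trailing newline, as "echo" would append. */
static int at_end(const char *end)
{
	if (*end == '\n')
		end++;
	return *end == '\0';
}

/* Hypothetical unsigned parser: a leading '-' is an error. */
int parse_u32(const char *s, uint32_t *out)
{
	char *end;
	unsigned long long v;

	if (!isdigit((unsigned char)*s))	/* rejects "-1", "", " 1" */
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno || !at_end(end))
		return -EINVAL;
	if (v > UINT32_MAX)
		return -ERANGE;
	*out = (uint32_t)v;
	return 0;
}

/* Hypothetical signed parser: negative values are accepted. */
int parse_s32(const char *s, int32_t *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 0);
	if (errno || end == s || !at_end(end))
		return -EINVAL;
	if (v < INT32_MIN || v > INT32_MAX)
		return -ERANGE;
	*out = (int32_t)v;
	return 0;
}

void demo_parse(void)
{
	uint32_t u = 0;
	int32_t s = 0;

	assert(parse_u32("-1\n", &u) == -EINVAL);	/* the failure described above */
	assert(parse_s32("-1\n", &s) == 0 && s == -1);	/* what a signed file could accept */
}
```

In this model the same string "-1" is an error on the unsigned path and a valid value on the signed path, which is the gap between the two interfaces.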
However, the scheduler logic in task_hot() (in kernel/sched/fair.c) explicitly checks
for -1:
static int task_hot(struct task_struct *p, struct lb_env *env)
{
	...
	if (sysctl_sched_migration_cost == -1)
		return 1;
	/*
	 * Don't migrate task if the task's cookie does not match
	 * with the destination CPU's core cookie.
	 */
	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
		return 1;
	if (sysctl_sched_migration_cost == 0)
		return 0;
	delta = rq_clock_task(env->src_rq) - p->se.exec_start;
	return delta < (s64)sysctl_sched_migration_cost;
}
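The three-way decision in task_hot() can be modeled in userspace as follows. This is only a sketch: model_task_hot(), model_migration_cost and demo_task_hot() are hypothetical names, the core-scheduling cookie check and the elided code are left out, and the variable is assumed to be a 32-bit unsigned int because its address is passed to debugfs_create_u32(). Under that assumption the "== -1" test is really a comparison against UINT_MAX after the usual arithmetic conversions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for the tunable, in nanoseconds (hypothetical name). */
unsigned int model_migration_cost = 500000;

/* Returns 1 if a task that last ran delta_ns ago counts as cache-hot. */
int model_task_hot(int64_t delta_ns)
{
	/* The kernel spells this "== -1"; on an unsigned int it is the same test. */
	if (model_migration_cost == (unsigned int)-1)
		return 1;			/* every task is treated as hot */
	if (model_migration_cost == 0)
		return 0;			/* no task is treated as hot */
	return delta_ns < (int64_t)model_migration_cost;
}

void demo_task_hot(void)
{
	model_migration_cost = 500000;
	assert(model_task_hot(100000) == 1);	/* ran 0.1 ms ago: hot */
	assert(model_task_hot(900000) == 0);	/* ran 0.9 ms ago: not hot */

	model_migration_cost = 0;
	assert(model_task_hot(0) == 0);

	model_migration_cost = UINT_MAX;	/* same bit pattern as -1 */
	assert(model_task_hot(INT64_MAX) == 1);
}
```

In this model, storing UINT_MAX reaches the same branch as -1. Whether writing 4294967295 to the current debugfs file has that effect on a real kernel has not been verified here, and it would be an obscure way to spell the setting in any case.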
In kernels prior to v5.10 (when this tunable was still exposed via
/proc/sys/kernel/sched_migration_cost_ns using proc_dointvec()), writing -1 was perfectly
valid and had well-defined semantics: task_hot() reports every task as cache-hot, so the
load balancer is reluctant to migrate a task no matter how long ago it last ran.
Now that the debugfs interface uses an unsigned type, this useful configuration is no
longer accessible from userspace, even though the kernel code still supports it.
This looks like an unintended regression introduced when the tunable moved to debugfs.
Should we consider changing the interface to use a signed type so that -1 (and other
negative values, if meaningful) can be written again?
One possible approach would be to introduce a debugfs_create_s32() helper (similar to
existing u32/u64 helpers) and use it for migration_cost_ns.
The partial diff below shows only the call-site change; it does not include the
implementation of debugfs_create_s32():
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index b24f40f05019..379190e2f8a9 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -608,7 +608,7 @@ static __init int sched_init_debug(void)
 	debugfs_create_u32("latency_warn_once", 0644, debugfs_sched, &sysctl_resched_latency_warn_once);
 
 	debugfs_create_file("tunable_scaling", 0644, debugfs_sched, NULL, &sched_scaling_fops);
-	debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
+	debugfs_create_s32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
 	debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
 
 	sched_domains_mutex_lock();
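One detail the call-site change leaves open is the storage type: the variable currently satisfies a u32 pointer parameter, so a signed helper would presumably need either a type change or a cast. The userspace sketch below shows the get/set behavior such a helper might have over the existing unsigned storage. model_s32_set(), model_s32_get() and demo_s32_roundtrip() are hypothetical names, and the range check is an assumption of this sketch rather than a description of any existing debugfs helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical setter: store a signed 32-bit value through an opaque pointer. */
int model_s32_set(void *data, int64_t val)
{
	if (val < INT32_MIN || val > INT32_MAX)
		return -ERANGE;
	*(int32_t *)data = (int32_t)val;
	return 0;
}

/* Hypothetical getter: read the same storage back as a signed value. */
int model_s32_get(void *data, int64_t *val)
{
	*val = *(int32_t *)data;
	return 0;
}

void demo_s32_roundtrip(void)
{
	uint32_t cost = 500000;		/* storage stays unsigned, as today */
	int64_t seen = 0;

	assert(model_s32_set(&cost, -1) == 0);
	assert(cost == UINT32_MAX);	/* bit pattern that "== -1" matches */
	assert(model_s32_get(&cost, &seen) == 0 && seen == -1);
	assert(model_s32_set(&cost, (int64_t)INT32_MAX + 1) == -ERANGE);
}
```

With this shape, a write of -1 lands in the unsigned variable as all ones and reads back as -1, so the existing comparison in task_hot() would not need to change.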
I’d appreciate feedback on:
1. Whether this behavior change was intentional.
2. If not, whether adding debugfs_create_s32() is an acceptable solution.
3. Whether there is a better way to preserve the -1 semantics without breaking debugfs
   conventions.
Thanks!