The following change greatly reduced the p99 latency of the Redis service
from 150ms to 0.9ms at exactly the same throughput (QPS).
@@ -5763,6 +5787,9 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
s64 this_eff_load, prev_eff_load;
unsigned long task_load;
+ if (is_short_task(p))
+ return nr_cpumask_bits;
+
this_eff_load = cpu_load(cpu_rq(this_cpu));
if (sync) {
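The model below is a userspace sketch of why the early return keeps short tasks on their previous CPU. It assumes the mainline structure, in which wake_affine() first tries wake_affine_idle(), falls back to wake_affine_weight() only if that gave no answer (nr_cpumask_bits), and returns prev_cpu for any result other than this_cpu. The model_* functions, the NO_DECISION constant and the boolean short_task flag are hypothetical simplifications. The sync adjustment, the real load computation and the schedstats are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nr_cpumask_bits, meaning "no decision". */
#define NO_DECISION 64

struct task {
	bool short_task;	/* assumed result of is_short_task(p) */
};

/* Simplified wake_affine_weight(): only the final comparison is kept. */
static int model_wake_affine_weight(const struct task *p, int this_cpu,
				    long this_eff_load, long prev_eff_load)
{
	if (p->short_task)
		return NO_DECISION;	/* the early return added by the patch */
	return this_eff_load < prev_eff_load ? this_cpu : NO_DECISION;
}

/*
 * Simplified wake_affine(). idle_target stands for the result of the
 * WA_IDLE check. Any target other than this_cpu falls back to prev_cpu.
 */
static int model_wake_affine(const struct task *p, int this_cpu, int prev_cpu,
			     int idle_target, long this_eff_load,
			     long prev_eff_load)
{
	int target = idle_target;

	assert(this_cpu >= 0 && this_cpu < NO_DECISION);
	assert(prev_cpu >= 0 && prev_cpu < NO_DECISION);

	if (target == NO_DECISION)
		target = model_wake_affine_weight(p, this_cpu, this_eff_load,
						  prev_eff_load);
	if (target != this_cpu)
		return prev_cpu;
	return target;
}
```

With this_cpu 0, prev_cpu 3 and a lighter load on CPU 0, a non-short task is pulled to CPU 0, while a short task stays on its cache-hot CPU 3. The exception is when the idle check has already chosen CPU 0, because the patched path is then never reached.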
I know that 'short' tasks are not necessarily 'small' tasks, e.g. their
sleep duration may be short or their weight may be large, but this works
really well for this case. This is partly because delivering data is
memory-bandwidth intensive and hence prefers cache-hot CPUs. I also
think this applies to the general case: do NOT let short-running
tasks suffer from cache misses caused by migration.
Redis is a bit special: it runs quickly and is very sensitive to scheduling latency. The purpose of Yu's 'short task' feature is to mitigate migration and tend to place the waking task on the local CPU, which is somewhat at odds with workloads such as Redis. Your changes remind me of the latency-prio work. Maybe we could build on both 'short task' and 'latency-prio' to make your changes more general. Thoughts?