Re: bug in sched.c:task_hot()

From: Nick Piggin
Date: Tue Oct 05 2004 - 02:46:20 EST


Peter Williams wrote:
Chen, Kenneth W wrote:

The current implementation of task_hot() has a performance bug: it
can cause integer underflow.

The variable "now" (typically passed in as rq->timestamp_last_tick)
and p->timestamp are both defined as unsigned long long. If the
former is smaller than the latter, integer underflow occurs, which
makes the result of the subtraction a huge positive number. That
number is then compared to sd->cache_hot_time, and a cache-hot task
is wrongly identified as cache cold.

This bug causes a large number of incorrect process migrations across
CPUs (a stunning 10,000 per second). We lost cache affinity very
quickly and took an almost double-digit performance regression on a db
transaction processing workload. Here is a patch to fix the bug,
diffed against 2.6.9-rc3.

Signed-off-by: Ken Chen <kenneth.w.chen@xxxxxxxxx>


--- linux-2.6.9-rc3/kernel/sched.c.orig 2004-10-04 19:11:21.000000000 -0700
+++ linux-2.6.9-rc3/kernel/sched.c 2004-10-04 19:19:27.000000000 -0700
@@ -180,7 +180,8 @@ static unsigned int task_timeslice(task_
 	else
 		return SCALE_PRIO(DEF_TIMESLICE, p->static_prio);
 }
-#define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
+#define task_hot(p, now, sd) ((long long) ((now) - (p)->timestamp)	\
+				< (long long) (sd)->cache_hot_time)

 enum idle_type
 {


The interesting question is: how does "now" get to be less than p->timestamp? This probably means that timestamp_last_tick is not a good way of getting a value for "now".

It is the best we can do.

By the way, neither is sched_clock() when measuring small time differences, as it is not monotonic (something that I had to allow for in my scheduling code).

I'm pretty sure it is monotonic, actually. I know some CPUs can execute
rdtsc speculatively, but I don't think it would ever be sane to execute
two rdtsc's in the wrong order.

I applied no such safeguards to the timing used by the load balancing code as I assumed that it already worked.

It should (modulo this bug).