RE: bug in sched.c:task_hot()
From: Chen, Kenneth W
Date: Tue Oct 05 2004 - 12:42:24 EST
Chen, Kenneth W wrote:
> Current implementation of task_hot() has a performance bug in it
> that it will cause integer underflow.
>
> Variable "now" (typically passed in as rq->timestamp_last_tick)
> and p->timestamp are all defined as unsigned long long. However,
> If former is smaller than the latter, integer under flow occurs
> which make the result of subtraction a huge positive number. Then
> it is compared to sd->cache_hot_time and it will wrongly identify
> a cache hot task as cache cold.
>
> This bug causes large amount of incorrect process migration across
> cpus (at stunning 10,000 per second) and we lost cache affinity very
> quickly and almost took double digit performance regression on a db
> transaction processing workload. Patch to fix the bug. Diff'ed against
> 2.6.9-rc3.
>
Peter Williams wrote on Tuesday, October 05, 2004 12:33 AM
> The interesting question is: How does now get to be less than timestamp?
> This probably means that timestamp_last_tick is not a good way of
> getting a value for "now". By the way, neither is sched_clock() when
> measuring small time differences as it is not monotonic (something that
> I had to allow for in my scheduling code). I applied no such safeguards
> to the timing used by the load balancing code as I assumed that it
> already worked.
The reason now gets smaller than timestamp was because of random thing
that activate_task() do to p->timestamp. Here are the sequence of events:
On timer tick, scheduler_tick() updates rq->timestamp_last_tick,
1 us later, process A wakes up, entering into run queue. activate_task()
updates p->timestamp. At this time p->timestamp is larger than rq->
timestamp_last_tick. Then another cpu goes into idle, tries load balancing,
It wrongly finds process A not cache hot (because of integer underflow),
moved it. Then application runs on a cache cold CPU and suffer performance.
- Ken
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/