Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected
From: Peter Zijlstra
Date: Mon Sep 24 2012 - 11:31:01 EST
On Mon, 2012-09-24 at 16:00 +0100, Mel Gorman wrote:
> On Fri, Sep 14, 2012 at 02:42:44PM -0700, Linus Torvalds wrote:
> > On Fri, Sep 14, 2012 at 2:27 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> > >
> > > as Nikolay says below, we have a regression in 3.6 with pgbench's
> > > benchmark in postgresql.
> > >
> > > I was able to reproduce it on another box here and did a bisection run.
> > > It pointed to the commit below.
> >
> > Ok. I guess we should just revert it. However, before we do that,
> > maybe Mike can make it just use the exact old semantics of
> > select_idle_sibling() in the update_top_cache_domain() logic.
> >
>
> The patch that is being reverted was meant to fix problems with
> commit 4dcfe102 (sched: Avoid SMT siblings in select_idle_sibling() if
> possible). That patch made select_idle_sibling() quite fat and I know it
> is responsible for a 2% regression in a kernel compile benchmark between
> kernel 3.1 and 3.2 on an old AMD Phenom II X4 940. Reverting Mike's patch
> might fix this Postgres regression but it reintroduces the overhead caused
> by commit 4dcfe102 for other cases. I do not have a suggestion on how to
> make this better, I'm just pointing out that the revert has some downsides.

Something like the patch below removes a number of cpumask operations,
which can be quite expensive on big machines.

No idea if it's sufficient, but it's a start.
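
A rough userspace model may show why the number of cpumask operations matters on big machines. The struct, the macros and the model_* helpers are hypothetical stand-ins, not the kernel's cpumask API, and the 4096-CPU size is an assumption chosen to mimic a large NR_CPUS configuration. A whole-mask test in the spirit of cpumask_intersects() walks the bitmap word by word, so with disjoint masks it reads every word. A single-CPU test in the spirit of cpumask_test_cpu() reads one word. The patch swaps the per-group whole-mask operations for per-CPU bit tests inside a loop that already visits each CPU.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of a large-NR_CPUS cpumask; not the kernel's type. */
#define MODEL_NR_CPUS 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct model_cpumask {
	unsigned long bits[MASK_WORDS];
};

static void model_set_cpu(unsigned int cpu, struct model_cpumask *m)
{
	assert(cpu < MODEL_NR_CPUS);
	m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* Whole-mask test: must read every word before it can answer "no". */
static int model_intersects(const struct model_cpumask *a,
			    const struct model_cpumask *b, size_t *words_read)
{
	for (size_t w = 0; w < MASK_WORDS; w++) {
		++*words_read;
		if (a->bits[w] & b->bits[w])
			return 1;
	}
	return 0;
}

/* Single-bit test: touches exactly one word, whatever the mask size. */
static int model_test_cpu(unsigned int cpu, const struct model_cpumask *m,
			  size_t *words_read)
{
	assert(cpu < MODEL_NR_CPUS);
	++*words_read;
	return (int)((m->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL);
}
```

With a 64-bit unsigned long, MASK_WORDS is 64, so each disjoint-mask check in this model reads 64 words while each bit test reads one.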

Anyway, does anybody have any clue as to why AMD and Intel machines
behave so differently here? Does an Intel box with HT disabled behave
like the AMD one, or is it something about the micro-architecture?

---
kernel/sched/fair.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b800a1..8757097 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2661,17 +2661,29 @@ static int select_idle_sibling(struct task_struct *p, int target)
 	for_each_lower_domain(sd) {
 		sg = sd->groups;
 		do {
-			if (!cpumask_intersects(sched_group_cpus(sg),
-						tsk_cpus_allowed(p)))
-				goto next;
+			int candidate = nr_cpu_ids;
+			/*
+			 * In the SMT case the groups are the SMT-siblings,
+			 * otherwise they're singleton groups.
+			 */
 			for_each_cpu(i, sched_group_cpus(sg)) {
+				if (!cpumask_test_cpu(i, tsk_cpus_allowed(p)))
+					continue;
+
+				/*
+				 * If any of the SMT-siblings are !idle, the
+				 * core isn't idle.
+				 */
 				if (!idle_cpu(i))
 					goto next;
+
+				if (candidate == nr_cpu_ids)
+					candidate = i;
 			}
-			target = cpumask_first_and(sched_group_cpus(sg),
-					tsk_cpus_allowed(p));
+			target = candidate;
+
 			goto done;
 next:
 			sg = sg->next;
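
A small userspace model of the new per-group scan may help when reviewing the hunk above. It is a sketch under stated assumptions, not kernel code: each cpumask is modelled as a single uint64_t (so at most 64 CPUs), the sched groups of one domain are a plain array of masks, and the names pick_idle_group_cpu, cpu_in and NR_CPU_IDS are hypothetical. Like the diff, it tests each CPU against the allowed mask inside the existing loop and skips disallowed siblings before the idle check. Unlike the hunk as quoted, it also skips a group that contains no allowed CPU. In the quoted hunk that case leaves for_each_cpu() with candidate still equal to nr_cpu_ids and assigns that value to target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: at most 64 CPUs, one uint64_t per cpumask. */
#define NR_CPU_IDS 64

static int cpu_in(uint64_t mask, int cpu)
{
	return (int)((mask >> cpu) & 1);
}

/*
 * Walk the groups in order. Within a group, skip CPUs the task may not
 * use, reject the group as soon as an allowed CPU is busy, and remember
 * the first allowed idle CPU. Fall back to target if no group qualifies.
 */
static int pick_idle_group_cpu(const uint64_t *groups, int ngroups,
			       uint64_t allowed, uint64_t idle, int target)
{
	assert(target >= 0 && target < NR_CPU_IDS);

	for (int g = 0; g < ngroups; g++) {
		int candidate = NR_CPU_IDS;
		int busy = 0;

		for (int i = 0; i < NR_CPU_IDS; i++) {
			if (!cpu_in(groups[g], i) || !cpu_in(allowed, i))
				continue;
			if (!cpu_in(idle, i)) {
				busy = 1;	/* corresponds to "goto next" */
				break;
			}
			if (candidate == NR_CPU_IDS)
				candidate = i;
		}

		/* Extra guard in this sketch: a group with no allowed CPU is skipped. */
		if (!busy && candidate != NR_CPU_IDS)
			return candidate;
	}
	return target;
}
```

With SMT pairs {0,1} and {2,3} (masks 0x3 and 0xc), CPU 1 busy and every CPU allowed, the first pair is rejected and CPU 2 is chosen. If the task is restricted to CPU 0, the busy sibling is ignored and CPU 0 is chosen. That is a behaviour change from the old loop, where any busy CPU in the group rejected it.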