Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

From: david
Date: Thu Sep 27 2012 - 03:27:31 EST


On Wed, 26 Sep 2012, Borislav Petkov wrote:

> > It always selected target_cpu, but the fact is, that doesn't really
> > sound very sane. The target cpu is either the previous cpu or the
> > current cpu, depending on whether they should be balanced or not. But
> > that still doesn't make any *sense*.
> >
> > In fact, the whole select_idle_sibling() logic makes no sense
> > what-so-ever to me. It seems to be total garbage.
> >
> > For example, it starts with the maximum target scheduling domain, and
> > works its way in over the scheduling groups within that domain. What
> > the f*ck is the logic of that kind of crazy thing? It never makes
> > sense to look at the biggest domain first. If you want to be close to
> > something, you want to look at the *smallest* domain first. But
> > because it looks at things in the wrong order, it then needs to have
> > that inner loop saying "does this group actually cover the cpu I am
> > interested in?"
> >
> > Please tell me I am mis-reading this?
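
(For concreteness, here is a minimal sketch of the smallest-domain-first
walk being described. The types and helpers below are toy stand-ins made
up for illustration, not the kernel's actual struct sched_domain API.)

#include <stdbool.h>

struct toy_domain {
        struct toy_domain *parent;  /* next larger domain: L2 -> L3 -> node */
        const int *cpus;            /* cpus covered by this domain */
        int nr_cpus;
};

static bool cpu_is_idle(int cpu)
{
        return cpu != 0;            /* toy stub so the sketch is self-contained */
}

/*
 * Walk outward from the smallest domain containing @target and return
 * the first idle cpu found. Every cpu tested is as close to @target as
 * the topology allows, so there is no need for an inner "does this
 * group cover the cpu I am interested in?" check. (A real version
 * would also skip cpus already scanned in the inner domains.)
 */
static int find_idle_near(struct toy_domain *smallest, int target)
{
        struct toy_domain *sd;
        int i;

        for (sd = smallest; sd; sd = sd->parent)
                for (i = 0; i < sd->nr_cpus; i++)
                        if (cpu_is_idle(sd->cpus[i]))
                                return sd->cpus[i];

        return target;              /* nothing idle nearby; stay where we are */
}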

> First of all, I'm so *not* a scheduler guy, so take this with a great
> pinch of salt.
>
> The way I understand it is, you either want to share L2 with a process,
> because, for example, both working sets fit in the L2 and/or there's
> some sharing which saves you moving everything over the L3. This is
> where selecting a core on the same L2 is actually a good thing.
>
> Or, they're too big to fit into the L2 and they start kicking each other
> out. Then you want to spread them out to different L2s - i.e., different
> HT groups in Intel-speak.
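
(To put illustrative numbers on that tradeoff - these are made up for the
example: on a part with a 2 MB L2, two tasks with ~256 KB hot working sets
can share the cache comfortably, and co-scheduling them saves pulling any
shared data across the L3; two tasks at ~1.5 MB each will constantly evict
one another and are better spread across different L2s.)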

An observation from an outsider here.

If you do overload an L2 cache, the core will be busy all the time and you will end up migrating a task away from that core.

It seems to me that trying to figure out ahead of time whether you are going to overload the L2 is an impossible task. So just assume that everything will fit; the worst case is one balancing cycle where you can't do as much work, after which normal balancing kicks in and moves something anyway.

Over the long term, the work lost by not moving tasks optimally right away is probably much less than the work lost trying to compute the perfect thing to do.

And since the perfect thing to do is going to be both workload- and chip-specific, trying to model it in your decision making is a lost cause. See the toy example below.
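
(As a toy illustration of that point - everything here is made up for the
example, with one migration per "balancing cycle": start from a deliberately
bad placement, as if a wakeup had wrongly assumed everything would fit on
one core, and a completely cache-blind periodic balancer still converges
within a few cycles.)

#include <stdio.h>

#define NR_CPUS 4

/* One tick of a naive periodic balancer: move one unit of load from
 * the busiest cpu to the idlest one, with no cache modelling at all. */
static void balance_once(int load[])
{
        int busiest = 0, idlest = 0, i;

        for (i = 1; i < NR_CPUS; i++) {
                if (load[i] > load[busiest])
                        busiest = i;
                if (load[i] < load[idlest])
                        idlest = i;
        }
        if (load[busiest] - load[idlest] > 1) {
                load[busiest]--;
                load[idlest]++;
        }
}

int main(void)
{
        int load[NR_CPUS] = { 8, 0, 0, 0 };  /* everything piled on cpu 0 */
        int cycle;

        for (cycle = 1; cycle <= 6; cycle++) {
                balance_once(load);
                printf("cycle %d: %d %d %d %d\n",
                       cycle, load[0], load[1], load[2], load[3]);
        }
        return 0;  /* ends evenly spread: 2 2 2 2 */
}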

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/