Re: Odd performance results

From: Paul E. McKenney
Date: Tue Jul 12 2016 - 14:26:26 EST


On Tue, Jul 12, 2016 at 10:49:58AM -0700, H. Peter Anvin wrote:
> On 07/12/16 08:05, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2016 at 04:55:51PM +0200, Peter Zijlstra wrote:
> >> On Sun, Jul 10, 2016 at 07:43:27AM -0700, Paul E. McKenney wrote:
> >>> On Sun, Jul 10, 2016 at 07:17:19AM +0200, Peter Zijlstra wrote:
> >>>>
> >>>>
> >>>> On 10 July 2016 06:26:39 CEST, "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >>>>> Hello!
> >>>>>
> >>>>> So I ran a quick benchmark which showed stair-step results. I immediately
> >>>>> thought "Ah, this is due to CPU 0 and 1, 2 and 3, 4 and 5, and 6 and 7
> >>>>> being threads in a core." Then I thought "Wait, this is an x86!"
> >>>>> Then I dumped out cpu*/topology/thread_siblings_list, getting the
> >>>>> following:
> >>>>>
> >>>>> cpu0/topology/thread_siblings_list: 0-1
> >>>>> cpu1/topology/thread_siblings_list: 0-1
> >>>>> cpu2/topology/thread_siblings_list: 2-3
> >>>>> cpu3/topology/thread_siblings_list: 2-3
> >>>>> cpu4/topology/thread_siblings_list: 4-5
> >>>>> cpu5/topology/thread_siblings_list: 4-5
> >>>>> cpu6/topology/thread_siblings_list: 6-7
> >>>>> cpu7/topology/thread_siblings_list: 6-7
> >>>>
> >>>>
> >>>> I'm guessing this is an AMD bulldozer like machine?
> >>>
> >>> /proc/cpuinfo thinks otherwise:
> >>>
> >>> processor : 0
> >>> vendor_id : GenuineIntel
> >>> cpu family : 6
> >>> model : 60
> >>> model name : Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
> >>
> >> Weird, I've never seen an Intel box do that before... hpa, any idea? or
> >> is this just one weird BIOS.
> >
> > ;-)
> >
> > It is a Lenovo W541 laptop, for whatever that might be worth. Roughly
> > one year old.
>
> Well, the obvious thing here is that CPUs 0-1, 2-3, 4-5, and 6-7 *are*
> indeed threads in a core... Intel x86 products have supported
> multithreading since the Pentium 4. So the "wait, this is an x86!" bit
> is strange to me.
>
> The CPU in question (and /proc/cpuinfo should show this) has four cores
> with a total of eight threads. The "siblings" and "cpu cores" fields in
> /proc/cpuinfo should show the same thing. So I am utterly confused
> about what is unexpected here?

My prior experience with Intel x86 systems led me to expect that the
hardware-thread pairs would instead be 0 and 4, 1 and 5, 2 and 6, and 3
and 7. This would result in a graph with a two-segment line, having
higher slope for the lower-numbered CPUs and a lower slope for the
higher-numbered CPUs, and I have in fact seen this behavior on older
Intel x86 systems. See for example slides 64-67 of:

http://www.rdrop.com/users/paulmck/scalability/paper/Updates.2016.06.05a.TUDresden.pdf

But don't get me wrong, I do very much prefer the CPU-numbering approach
that my laptop uses, where the hardware threads in a given core have
consecutive numbers.
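
To make the two numbering schemes concrete, here is a minimal Python sketch that groups CPUs into cores from thread_siblings_list strings. The "laptop" values are the ones quoted earlier in this thread. The "split" values are hypothetical, constructed only to match the 0-and-4, 1-and-5 pairing described above, and the helper names are made up for illustration. It assumes the list text uses only comma-separated CPU numbers and simple ranges.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-1" or "0,4" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cores_from_siblings(siblings_by_cpu):
    """Group CPUs into cores given {cpu: thread_siblings_list text}."""
    return sorted({tuple(parse_cpu_list(t)) for t in siblings_by_cpu.values()})


# Values reported earlier in this thread for the laptop.
laptop = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3",
          4: "4-5", 5: "4-5", 6: "6-7", 7: "6-7"}

# Hypothetical values for the other layout (0 and 4, 1 and 5, ...).
split = {cpu: f"{cpu % 4},{cpu % 4 + 4}" for cpu in range(8)}

laptop_cores = cores_from_siblings(laptop)
split_cores = cores_from_siblings(split)
print(laptop_cores)  # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(split_cores)   # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

On a live system the same strings could be read from /sys/devices/system/cpu/cpu*/topology/thread_siblings_list and fed to cores_from_siblings().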

> Also, you mentioned absolutely nothing about what kind of benchmark it
> was, or what the "stairstepping" results imply, so it doesn't really
> make it any easier...

The benchmark was a POSIX-threads multithreaded benchmark with each
thread repeatedly searching a small linked list, which should fit into
the nearest-to-CPU cache. The "stairstepping" results suggest to me
that a no-cache-miss pointer-following workload allows a single hardware
thread to consume most of a given core's relevant hardware resources,
at least on this particular chip. Which is fine -- this sort of thing
always has been workload-specific.
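
As a rough illustration of why the CPU numbering changes the shape of the curve, here is a toy Python model. It is not the benchmark itself. It assumes that threads are placed on CPUs in numerical order, that the first hardware thread used on a core contributes one unit of throughput, and that a second thread on the same core adds only a small fraction. The 0.125 figure is an arbitrary illustrative value, not a measurement, and the function name is made up.

```python
def modeled_throughput(core_of_cpu, second_thread_gain):
    """Cumulative throughput as CPUs 0..n-1 are put to work in numerical order.

    core_of_cpu[i] is the core that CPU i belongs to.  The first hardware
    thread used on a core adds 1.0; a second thread on that core adds
    second_thread_gain (an assumed value, not a measured one).
    """
    busy_cores = set()
    total = 0.0
    curve = []
    for core in core_of_cpu:
        if core in busy_cores:
            total += second_thread_gain
        else:
            total += 1.0
            busy_cores.add(core)
        curve.append(total)
    return curve


GAIN = 0.125  # illustrative assumption only

# Siblings numbered consecutively (0-1, 2-3, ...), as on the laptop.
consecutive = modeled_throughput([0, 0, 1, 1, 2, 2, 3, 3], GAIN)
# Siblings numbered 0 and 4, 1 and 5, ..., as on the older systems.
split = modeled_throughput([0, 1, 2, 3, 0, 1, 2, 3], GAIN)

print(consecutive)  # [1.0, 1.125, 2.125, 2.25, 3.25, 3.375, 4.375, 4.5]
print(split)        # [1.0, 2.0, 3.0, 4.0, 4.125, 4.25, 4.375, 4.5]
```

Under these assumptions the consecutive numbering alternates large and small increments, which plots as a stair-step, while the split numbering gives a steep segment followed by a shallow one.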

If you want to see an example plot, take a look at:

CodeSamples/defer/perf-rcu-qsbr.eps

within:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

Thanx, Paul