Re: [RFC PATCH v2 00/17] Core scheduling v2

From: Ingo Molnar
Date: Fri Apr 26 2019 - 05:46:49 EST

Next message: kernel test robot: "[ext4] 345c0dbf3a: xfstests.ext4.303.fail"
Previous message: Mauro Carvalho Chehab: "Re: [PATCH v2 25/79] docs: convert docs to ReST and rename to *.rst"
In reply to: Mel Gorman: "Re: [RFC PATCH v2 00/17] Core scheduling v2"
Next in thread: Mel Gorman: "Re: [RFC PATCH v2 00/17] Core scheduling v2"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> > > I can show a comparison with equal levels of parallelisation but with
> > > HT off, it is a completely broken configuration and I do not think a
> > > comparison like that makes any sense.
> >
> > I would still be interested in that comparison, because I'd like
> > to learn whether there's any true *inherent* performance advantage to
> > HyperThreading for that particular workload, for exactly tuned
> > parallelism.
> >
>
> It really isn't a fair comparison. MPI seems to behave very differently
> when a machine is saturated. It's documented as changing its behaviour
> as it tries to avoid the worst consequences of saturation.
>
> Curiously, the results on the 2-socket machine were not as bad as I
> feared when the HT configuration is running with twice the number of
> threads as there are CPUs:
>
> Amean     bt      771.15 (   0.00%)     1086.74 * -40.93%*
> Amean     cg      445.92 (   0.00%)      543.41 * -21.86%*
> Amean     ep       70.01 (   0.00%)       96.29 * -37.53%*
> Amean     is       16.75 (   0.00%)       21.19 * -26.51%*
> Amean     lu      882.84 (   0.00%)      595.14 *  32.59%*
> Amean     mg       84.10 (   0.00%)       80.02 *   4.84%*
> Amean     sp     1353.88 (   0.00%)     1384.10 *  -2.23%*

Yeah, so what I wanted to suggest is a parallel numeric throughput test
with few inter-process data dependencies, and see whether HT actually
improves total throughput versus the no-HT case.

No over-saturation - but exactly as many threads as logical CPUs.

I.e. with 20 physical cores and 40 logical CPUs the numbers to compare
would be a 'nosmt' benchmark running 20 threads, versus an SMT test
running 40 threads.

I.e. how much does SMT improve total throughput when the workload's
parallelism is tuned to utilize 100% of the available CPUs?
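
To make the proposed comparison concrete, here is a rough sketch of the
arithmetic, not a definitive methodology. It assumes both runs solve the
same fixed-size problem, so throughput is inversely proportional to
elapsed time. The function names and the timings in the usage lines are
hypothetical placeholders, not measurements from this thread.

```python
def threads_for_run(physical_cores, threads_per_core, smt_enabled):
    """Thread count that exactly matches the number of logical CPUs.

    With SMT off ('nosmt') each core exposes one logical CPU; with SMT
    on it exposes threads_per_core of them. No over-saturation.
    """
    if physical_cores <= 0 or threads_per_core <= 0:
        raise ValueError("core and thread counts must be positive")
    return physical_cores * (threads_per_core if smt_enabled else 1)


def smt_throughput_gain_pct(nosmt_seconds, smt_seconds):
    """Percent change in total throughput of the SMT run vs. the nosmt run.

    Assumes both runs complete the same total amount of work, so
    throughput = work / time and the work term cancels out.
    Positive means SMT finished the work faster.
    """
    if nosmt_seconds <= 0 or smt_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return (nosmt_seconds / smt_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # 20 physical cores, 2 hardware threads per core, as in the example above.
    print("nosmt threads:", threads_for_run(20, 2, smt_enabled=False))
    print("SMT threads:  ", threads_for_run(20, 2, smt_enabled=True))
    # Hypothetical elapsed times in seconds, for illustration only.
    print("throughput gain: %.2f%%" % smt_throughput_gain_pct(100.0, 80.0))
```

The interesting number is the sign and size of that last value per
benchmark: anything near zero or negative would suggest SMT brings no
inherent throughput benefit for that workload.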

Does this make sense?

Thanks,

Ingo
