Re: [PATCH] Binding processes to selected CPUs

avik@altavista.net
Tue, 5 Oct 1999 19:32:14 -0400 (EDT)


> My only point is that I think my SMP scheduler can automagically optimize
> the throughput better than a hardwired binding.

I wanted to check that, so I ran some benchmarks to see the effect
of CPU binding, and got some weird results.

The program I benchmarked was this:

meaningless-benchmark.c:

#define CACHE_SIZE (128*1024) /* Celeron, yuck */
#define REPETITIONS 1500000000

int main()
{
        char a[CACHE_SIZE];
        unsigned i, p;

        for (i = p = 0; i < REPETITIONS; ++i) {
                /* CACHE_SIZE is a power of two, so the unsigned wraparound
                   of p - 4 is harmless: p walks backwards through the
                   buffer in 4-byte steps, touching every cache line. */
                p = (p - 4) % CACHE_SIZE;
                ++a[p];         /* write, so the line stays dirty */
        }
        return 0;
}

The idea is to write through the whole 128kB buffer over and over,
keeping the entire L2 cache dirty at all times, in order to maximize
the CPU-migration penalty.

In addition, some tests ran a program, eatcpu, in the background in order
to cause scheduler activity.

eatcpu.c:

int main() { while(1); }

I won't explain what it does.

Benchmarking consisted of running two meaningless-benchmark processes,
once on an idle machine and once on a machine running eatcpu, on various
kernels, with CPU binding activated or deactivated.

bench:

#! /bin/sh
echo "Kernel `uname -r`" >> results
(/usr/bin/time bindcpu 1 meaningless-benchmark 2>&1) >> results &
(/usr/bin/time bindcpu 2 meaningless-benchmark 2>&1) >> results &
wait

(The parameter to bindcpu was 3 for the tests where binding was not
activated).

The bindcpu program selects a set of processors and execs a program bound
to these processors.

bindcpu.c:

#include <sys/prctl.h>
#include <stdio.h>
#include <stdlib.h>     /* for atoi() */
#include <unistd.h>

int main(int ac, char** av, char** ep)
{
        if (ac == 1) {
                /* No arguments: just report the current binding. */
                unsigned long binding = 0;
                prctl(PR_GET_CPUBINDING, &binding);
                printf("CPU binding: %08lX\n", binding);
        } else {
                /* Set the CPU mask, then exec the target program. */
                unsigned long binding = atoi(av[1]);
                prctl(PR_SET_CPUBINDING, binding);
                execve(av[2], av+2, ep);
                perror("execve");       /* only reached if the exec failed */
                return 1;
        }
        return 0;
}
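
For reference, and assuming I'm reading the patch's mask semantics
correctly, the binding argument is a bitmask of allowed CPUs; that is
how the bench script above uses it on this dual machine:

bindcpu 1 meaningless-benchmark    # CPU 0 only
bindcpu 2 meaningless-benchmark    # CPU 1 only
bindcpu 3 meaningless-benchmark    # either CPU, i.e. effectively unbound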

The results are shown in user seconds for each pair of processes. In
all cases, page faults were the same and system time was less than 0.05s.

Machine: dual celeron @ 300MHz, 128kB L2.

2.2.13pre14; system idle
45.04 77.29
44.16 67.82
37.15 69.28
57.30 66.40
51.41 69.27
2.2.13pre14; system running eatcpu in background
56.08 75.73
53.51 80.23
55.56 58.74
66.13 67.06
43.61 53.48
2.3.19-bindcpu; system idle
60.95 66.64
39.40 57.17
39.36 61.54
39.40 57.26
38.36 63.83
2.3.19-bindcpu; system running eatcpu in background
43.79 50.19
32.70 50.38
61.64 71.61
32.76 52.62
52.29 69.12
2.3.19-bindcpu; system idle; binding enabled
47.24 52.87
51.53 57.47
52.07 53.55
41.95 51.97
54.15 54.01
2.3.19-bindcpu; system running eatcpu in background; binding enabled
46.67 52.16
40.04 51.44
46.00 47.37
42.74 68.40
34.45 54.77
2.3.19-bindcpu; system idle; binding enabled; single process
49.03
60.12
67.96
38.17
38.16

I can't understand any of this. Can anyone explain why there should
be a >100% difference in execution time for meaningless-benchmark.c?
Someone claimed the cpu-migration penalty was around 1ms. If that's
correct, then this indicates that there is a migration every 2ms.
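
Back-of-the-envelope, using the extremes from the tables above: the
slowest run (~77s) took roughly 40 user seconds more than the fastest
(~37s). At 1ms per migration, that extra time would mean on the order
of 40,000 migrations, i.e. about one migration every 2ms over a
~77-second run.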

Or, the time measurement is completely unreliable.

Or, I missed something really big.

Has anyone any idea?

