Re: [Lse-tech] Miniature NUMA scheduler

From: Michael Hohnbaum (hohnbaum@us.ibm.com)
Date: Fri Jan 10 2003 - 00:36:03 EST


On Thu, 2003-01-09 at 15:54, Martin J. Bligh wrote:
> I tried a small experiment today - did a simple restriction of
> the O(1) scheduler to only balance inside a node. Coupled with
> the small initial load balancing patch floating around, this
> covers 95% of cases, is a trivial change (3 lines), performs
> just as well as Erich's patch on a kernel compile, and actually
> better on schedbench.
>
> This is NOT meant to be a replacement for the code Erich wrote,
> it's meant to be a simple way to get integration and acceptance.
> Code that just forks and never execs will stay on one node - but
> we can take the code Erich wrote and put it in a separate rebalancer
> that fires much less often to do a cross-node rebalance.
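
To make the restriction concrete: it amounts to masking the set of CPUs the
O(1) balancer will even look at down to the local node. The toy program below
is only my illustration of that idea - it is not Martin's actual 3-line diff,
and the cpu_to_node[] map and load values are made up:

#include <stdio.h>

#define NR_CPUS 16

/* toy topology: 4 CPUs per node, as on a 4-node NUMAQ */
static const int cpu_to_node[NR_CPUS] = {
        0, 0, 0, 0,  1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3, 3
};

/*
 * Pick the busiest CPU, but only among CPUs on the same node as
 * this_cpu - the essence of intra-node-only balancing.  A separate
 * cross-node rebalancer could still run, just far less often.
 */
static int find_busiest_cpu(const int load[NR_CPUS], int this_cpu)
{
        int node = cpu_to_node[this_cpu];
        int busiest = -1, max_load = 0;

        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
                if (cpu_to_node[cpu] != node)   /* ignore remote nodes */
                        continue;
                if (load[cpu] > max_load) {
                        max_load = load[cpu];
                        busiest = cpu;
                }
        }
        return busiest;         /* -1 only if every local CPU is idle */
}

int main(void)
{
        /* pretend node 3 (CPUs 12-15) is overloaded */
        int load[NR_CPUS] = { 1, 1, 2, 1,  1, 1, 1, 1,
                              1, 2, 1, 1,  5, 6, 4, 5 };

        /* CPU 0 sees only its own node and ignores hot node 3 */
        printf("busiest for cpu 0:  %d\n", find_busiest_cpu(load, 0));
        /* CPU 12 balances within node 3 and finds CPU 13 */
        printf("busiest for cpu 12: %d\n", find_busiest_cpu(load, 12));
        return 0;
}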

I tried this on my 4-node NUMAQ (16 procs, 16GB memory) and got
similar results. I also added the cputime_stats patch and am
including the matrix results from the 32-process run below.
Basically, all runs show that the initial load balancer is able to
place the tasks evenly across the nodes, and the better overall
times show that not migrating to keep the nodes balanced over time
results in better performance - at least on these boxes.

Obviously, there can be situations where load balancing across
nodes is necessary, but these results point to less cross-node
balancing being better, at least on these machines. It will be
interesting to repeat this on boxes with other interconnects.

$ reportbench hacksched54 sched54 stock54
Kernbench:
                       Elapsed      User    System       CPU
         hacksched54   29.406s   282.18s   81.716s   1236.8%
             sched54   29.112s  283.888s    82.84s   1259.4%
             stock54   31.348s  303.134s   87.824s   1247.2%

Schedbench 4:
                       AvgUser   Elapsed TotalUser  TotalSys
         hacksched54     21.94     31.93     87.81      0.53
             sched54     22.03     34.90     88.15      0.75
             stock54     49.35     57.53    197.45      0.86

Schedbench 8:
                       AvgUser   Elapsed TotalUser  TotalSys
         hacksched54     28.23     31.62    225.87      1.11
             sched54     27.95     37.12    223.66      1.50
             stock54     43.14     62.97    345.18      2.12

Schedbench 16:
                       AvgUser   Elapsed TotalUser  TotalSys
         hacksched54     49.29     71.31    788.83      2.88
             sched54     55.37     69.58    886.10      3.79
             stock54     66.00     81.25   1056.25      7.12

Schedbench 32:
                       AvgUser   Elapsed TotalUser  TotalSys
         hacksched54     56.41    117.98   1805.35      5.90
             sched54     57.93    132.11   1854.01     10.74
             stock54     77.81    173.26   2490.31     12.37

Schedbench 64:
                       AvgUser   Elapsed TotalUser  TotalSys
         hacksched54     56.62    231.93   3624.42     13.32
             sched54     72.91    308.87   4667.03     21.06
             stock54     86.68    368.55   5548.57     25.73

hacksched54 = 2.5.54 + Martin's tiny NUMA patch +
              03-cputimes_stat-2.5.53.patch +
              02-numa-sched-ilb-2.5.53-21.patch
sched54 = 2.5.54 + 01-numa-sched-core-2.5.53-24.patch +
          02-ilb-2.5.53-21.patch02 +
          03-cputimes_stat-2.5.53.patch
stock54 = 2.5.54 + 03-cputimes_stat-2.5.53.patch
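
A note on reading the Schedbench tables (my inference from the numbers, not
something the benchmark documents): TotalUser is essentially AvgUser times the
number of processes, e.g. for the 32-process hacksched54 run, 56.41 * 32 =
1805.12, which matches the reported 1805.35 to within rounding. A throwaway
check:

#include <stdio.h>

int main(void)
{
        /* hacksched54 rows from the tables above */
        const int    nprocs[]   = {     4,      8,     16,      32,      64 };
        const double avg_user[] = { 21.94,  28.23,  49.29,   56.41,   56.62 };
        const double tot_user[] = { 87.81, 225.87, 788.83, 1805.35, 3624.42 };

        for (int i = 0; i < 5; i++)
                printf("%2d procs: %6.2f * %2d = %8.2f (reported %8.2f)\n",
                       nprocs[i], avg_user[i], nprocs[i],
                       avg_user[i] * nprocs[i], tot_user[i]);
        return 0;
}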

Detailed results from running numa_test 32:

Executing 32 times: ./randupdt 1000000
Running 'hackbench 20' in the background: Time: 4.557
Job  node00  node01  node02  node03 | iSched MSched | UserTime(s)
  1   100.0     0.0     0.0     0.0 |      0      0 |       57.12
  2     0.0   100.0     0.0     0.0 |      1      1 |       55.89
  3   100.0     0.0     0.0     0.0 |      0      0 |       55.39
  4     0.0     0.0   100.0     0.0 |      2      2 |       56.67
  5     0.0     0.0     0.0   100.0 |      3      3 |       57.08
  6     0.0   100.0     0.0     0.0 |      1      1 |       56.31
  7   100.0     0.0     0.0     0.0 |      0      0 |       57.11
  8     0.0     0.0     0.0   100.0 |      3      3 |       56.66
  9     0.0   100.0     0.0     0.0 |      1      1 |       55.87
 10     0.0     0.0   100.0     0.0 |      2      2 |       55.83
 11     0.0     0.0     0.0   100.0 |      3      3 |       56.01
 12     0.0   100.0     0.0     0.0 |      1      1 |       56.56
 13     0.0     0.0   100.0     0.0 |      2      2 |       56.31
 14     0.0     0.0     0.0   100.0 |      3      3 |       56.40
 15   100.0     0.0     0.0     0.0 |      0      0 |       55.82
 16     0.0   100.0     0.0     0.0 |      1      1 |       57.47
 17     0.0     0.0   100.0     0.0 |      2      2 |       56.76
 18     0.0     0.0     0.0   100.0 |      3      3 |       57.10
 19     0.0   100.0     0.0     0.0 |      1      1 |       57.26
 20     0.0     0.0   100.0     0.0 |      2      2 |       56.48
 21     0.0     0.0     0.0   100.0 |      3      3 |       56.65
 22     0.0   100.0     0.0     0.0 |      1      1 |       55.81
 23     0.0     0.0   100.0     0.0 |      2      2 |       55.77
 24     0.0     0.0     0.0   100.0 |      3      3 |       56.91
 25     0.0   100.0     0.0     0.0 |      1      1 |       56.86
 26     0.0     0.0   100.0     0.0 |      2      2 |       56.62
 27     0.0     0.0     0.0   100.0 |      3      3 |       57.16
 28     0.0     0.0   100.0     0.0 |      2      2 |       56.36
 29   100.0     0.0     0.0     0.0 |      0      0 |       55.72
 30   100.0     0.0     0.0     0.0 |      0      0 |       56.00
 31   100.0     0.0     0.0     0.0 |      0      0 |       55.48
 32   100.0     0.0     0.0     0.0 |      0      0 |       55.59
AverageUserTime 56.41 seconds
ElapsedTime 117.98
TotalUserTime 1805.35
TotalSysTime 5.90

Runs with 4, 8, 16, and 64 processes all showed the same even
distribution across the nodes, with each process spending 100% of
its time on its home node.
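
For anyone wondering where the node-percentage columns come from: the
cputime_stats patch accounts user time per node, and the percentage for each
job is simply its time on that node over its total user time. A hypothetical
helper along those lines (the per-node tick array is an assumed input, not the
patch's actual interface):

#include <stdio.h>

#define NR_NODES 4

/* Print one job's per-node share of user time, as in the table
 * above: 100 * ticks-on-node / total ticks. */
static void print_node_share(const unsigned long node_ticks[NR_NODES])
{
        unsigned long total = 0;

        for (int n = 0; n < NR_NODES; n++)
                total += node_ticks[n];
        for (int n = 0; n < NR_NODES; n++)
                printf(" %5.1f", total ? 100.0 * node_ticks[n] / total : 0.0);
        printf("\n");
}

int main(void)
{
        /* e.g. a job that ran entirely on node 2 */
        unsigned long ticks[NR_NODES] = { 0, 0, 5712, 0 };

        print_node_share(ticks);        /* prints:   0.0   0.0 100.0   0.0 */
        return 0;
}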

-- 
Michael Hohnbaum            503-578-5486
hohnbaum@us.ibm.com         T/L 775-5486
