Mary Edie Meredith <maryedie@xxxxxxxx> wrote:
Performance in DBT-3 (using PostgreSQL) has vastly
improved for _both in the "power" portion (single process/query) and in the "throughput" portion of the test (when the test is running multiple processes) on our 4-way(4GB) and 8-way(8GB) STP systems as compared 2.6.5 kernel results.
Using the default DBT-3 options (ie using LVM, ext2, PostgreSQL version 7.4.1)
Note: Bigger numbers are better.
Kernel....Runid..CPUs.Power..%incP.Thruput %incT 2.6.5 291308 4 97.08 base 120.46 base 2.6.6-rc1 291876 4 146.11 50.5% 222.94 85.1%
Kernel....Runid..CPUs.Power..%incP..Thruput %incT
2.6.5 291346 8 101.08 base 138.95 base
2.6.6-rc1 291915 8 151.69 50.1% 273.69 97.0%
So the improvement is between 50% and 97%!
How odd.
-
Profile 2.6.5 8way throughput phase:
http://khack.osdl.org/stp/291346/profile/after_throughput_test_1-tick.sort
Profile 2.6.6-r1 8way throughput phase:
http://khack.osdl.org/stp/291915/profile/after_throughput_test_1-tick.sort
Odder. do_anonymous_page() is doing 10x more work in 2.6.6-rc1. And the
CPU scheduler cost has fallen a lot.
Frankly, I can't think of anything in 2.6.6-rc1 which would cause either of
these things!
What I notice is that radix_tree_lookup is in the top 20 in the 2.6.5 profile, but not in 2.6.6-rc1. Could theradix tree changes be responsible for this?
I would certainly expect 2x or even higher throughput increases from either
the writeback changes or the ext2&ext3 fsync changes.
DBT-3 is a read mostly DSS workload and the throughput phase is where we run multiple query streams (as many as we have CPUs). In this workload, the database is stored on a file system, but it is small relative to the amount of memory (4GB and 8GB). It almost completely caches in page cache early on. So there is some physical IO in the first few minutes, but very little to none in the remainder.
But you're not doing a significant amount of writing during the test, so
scrub that theory.
Do you have full reports anywhere? I'd be interested in seeing a vmstat
trace from the entire run, both 2.6.5 and 2.6.6-rc1.