Re: zram: per-cpu compression streams

From: Sergey Senozhatsky
Date: Wed Apr 27 2016 - 04:52:43 EST



Hello,

more tests. this time I compared only 8 streams vs. per-cpu. the changes
to the test are:
-- mem-hogger now pre-faults pages in parallel with fio
-- mem-hogger alloc size increased from 3GB to 4GB.

the system couldn't survive the 4GB/4GB zram(buffer_compress_percentage=11)/mem-hogger
split (OOM), so I ran the 3GB/4GB test instead (close to the system's OOM edge).

-- 4 GB x86_64
-- 3 GB zram lzo

first, the mm_stat, with <num_writes / num_recompressions> appended to each line

8 streams (base kernel):

3221225472 3221225472 3221225472 0 3221229568 0 0 < 2752460/ 0>
3221225472 3221225472 3221225472 0 3221233664 0 0 < 5504124/ 0>
3221225472 2912157607 2952802304 0 2952826880 0 81 < 8253369/ 0>
3221225472 2893479936 2899120128 0 2899136512 0 147 <11003056/ 0>
3221217280 2886040814 2899103744 0 2899128320 0 26 <13748450/ 0>
3221225472 2880045056 2885693440 0 2885718016 0 180 <16503120/ 0>
3221213184 2877431364 2883756032 0 2883809280 0 132 <19259891/ 0>
3221225472 2873229312 2876096512 0 2876133376 0 16 <22016512/ 0>
3221213184 2870728008 2871693312 0 2871726080 0 24 <24768909/ 0>
2899095552 2899095552 2899095552 0 2899132416 78643 0 <27523600/ 0>

per-cpu:

3221225472 3221225472 3221225472 0 3221229568 0 0 < 2752460/ 8180>
3221225472 3221225472 3221225472 0 3221233664 0 0 < 5504124/ 10523>
3221225472 2912157607 2952802304 0 2952814592 0 117 < 8253369/ 9451>
3221225472 2893479936 2899120128 0 2899136512 0 129 <11003056/ 9395>
3221217280 2886040814 2899103744 0 2899128320 0 51 <13748450/ 10879>
3221225472 2880045056 2885693440 0 2885718016 0 126 <16503120/ 10300>
3221213184 2877431364 2883772416 0 2883801088 0 252 <19259891/ 10509>
3221225472 2873229312 2876100608 0 2876133376 0 14 <22016512/ 11081>
3221213184 2870728008 2871693312 0 2871730176 0 54 <24768909/ 10770>
2899095552 2899095552 2899095552 0 2899136512 78643 0 <27523600/ 10231>
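
For readers who want to post-process these lines, here is a minimal Python sketch. The column names are an assumption: they follow the mm_stat layout described in the kernel's zram documentation of that period (orig_data_size, compr_data_size, mem_used_total, mem_limit, mem_used_max, zero_pages, num_migrated), and the message itself does not name them. The two values in angle brackets are the counters named in the heading above. `MmStat` and `parse_line` are hypothetical helper names.

```python
from typing import NamedTuple


class MmStat(NamedTuple):
    # Assumed mm_stat column order (not stated in the message itself).
    orig_data_size: int
    compr_data_size: int
    mem_used_total: int
    mem_limit: int
    mem_used_max: int
    zero_pages: int
    num_migrated: int
    # Extra counters appended in angle brackets by the test output.
    num_writes: int
    num_recompressions: int


def parse_line(line: str) -> MmStat:
    """Split one report line into the mm_stat columns and the <a/b> pair."""
    stats, _, extra = line.partition("<")
    writes, recompressions = extra.rstrip("> \n").split("/")
    return MmStat(*(int(v) for v in stats.split()),
                  int(writes), int(recompressions))


# First and third per-cpu lines, copied from the report above.
first = parse_line("3221225472 3221225472 3221225472 0 3221229568 0 0 < 2752460/ 8180>")
third = parse_line("3221225472 2912157607 2952802304 0 2952814592 0 117 < 8253369/ 9451>")

# How well the stored data compressed (original bytes / compressed bytes).
ratio = third.orig_data_size / third.compr_data_size
# Share of writes that needed a second compression attempt in the first run.
recompress_share = first.num_recompressions / first.num_writes

print(f"compression ratio: {ratio:.3f}")
print(f"recompressed writes: {recompress_share:.3%}")
```

The recompression share is computed on the first line only, because the message does not say whether num_recompressions is cumulative; num_writes grows from line to line while num_recompressions does not.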


mem-hogger pre-fault times

8 streams (base kernel):

[431] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f3f5d38a010 <+ 6.031550428>
[470] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7fa29d414010 <+ 5.242295692>
[514] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f4a7eac8010 <+ 5.485469454>
[563] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f07da76b010 <+ 5.563647658>
[619] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7ff5efc26010 <+ 5.516866208>
[681] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f8fb896d010 <+ 5.535275748>
[751] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7fb2ac6fa010 <+ 4.594626366>
[825] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f355f9a0010 <+ 5.075849029>
[905] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7feb16715010 <+ 4.696363680>
[991] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f3a1b9f4010 <+ 5.292365453>


per-cpu:

[413] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7fe8058f5010 <+ 5.513944292>
[451] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f65fe753010 <+ 4.742384977>
[494] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7fb99a05c010 <+ 5.394711696>
[542] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f0d61c81010 <+ 5.021011664>
[598] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f9abdeb6010 <+ 5.094722019>
[660] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7fb192ae9010 <+ 4.943961060>
[728] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f7313aeb010 <+ 5.437872456>
[802] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f25ffdeb010 <+ 5.422829590>
[881] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f60daa8e010 <+ 4.806425351>
[970] single-alloc: INFO: Allocated 0x100000000 bytes at address 0x7f384cf04010 <+ 4.982513395>


so the pre-fault time varies quite a bit between runs; for example, from 4.696363680 to 6.031550428 seconds with 8 streams.
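
To put numbers on that spread, a small Python sketch over the pre-fault times listed above (the values are copied from the `<+ ...>` fields; the `summarize` helper is only illustrative):

```python
from statistics import mean

# Pre-fault times in seconds, copied from the mem-hogger output above.
eight_streams = [6.031550428, 5.242295692, 5.485469454, 5.563647658,
                 5.516866208, 5.535275748, 4.594626366, 5.075849029,
                 4.696363680, 5.292365453]
per_cpu = [5.513944292, 4.742384977, 5.394711696, 5.021011664,
           5.094722019, 4.943961060, 5.437872456, 5.422829590,
           4.806425351, 4.982513395]


def summarize(times):
    """Return (min, max, mean) of a list of run times."""
    return min(times), max(times), mean(times)


for name, times in (("8 streams", eight_streams), ("per-cpu", per_cpu)):
    lo, hi, avg = summarize(times)
    print(f"{name:>9}: min {lo:.3f}s  max {hi:.3f}s  "
          f"spread {hi - lo:.3f}s  mean {avg:.3f}s")
```

With only ten runs per configuration, the spread within each set is wide compared to the gap between the two means, so the comparison should be read with that in mind.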



fio
8 streams per-cpu
===========================================
#jobs1
READ: 2507.8MB/s 2526.4MB/s
READ: 2043.1MB/s 1970.6MB/s
WRITE: 127100KB/s 139160KB/s
WRITE: 724488KB/s 733440KB/s
READ: 534624KB/s 540967KB/s
WRITE: 534569KB/s 540912KB/s
READ: 471165KB/s 477459KB/s
WRITE: 471233KB/s 477527KB/s
#jobs2
READ: 8041.1MB/s 7866.9MB/s
READ: 6751.7MB/s 6692.9MB/s
WRITE: 268372KB/s 268269KB/s
WRITE: 1197.5MB/s 1331.3MB/s
READ: 997.75MB/s 1057.6MB/s
WRITE: 997.91MB/s 1057.7MB/s
READ: 934518KB/s 992852KB/s
WRITE: 932382KB/s 990582KB/s
#jobs3
READ: 13318MB/s 13454MB/s
READ: 11463MB/s 11491MB/s
WRITE: 449903KB/s 448791KB/s
WRITE: 1582.8MB/s 1782.3MB/s
READ: 1337.5MB/s 1449.1MB/s
WRITE: 1335.6MB/s 1447.2MB/s
READ: 1241.3MB/s 1345.7MB/s
WRITE: 1239.6MB/s 1343.6MB/s
#jobs4
READ: 19948MB/s 20013MB/s
READ: 17732MB/s 17479MB/s
WRITE: 630690KB/s 495078KB/s
WRITE: 1843.2MB/s 2226.9MB/s
READ: 1603.4MB/s 1846.8MB/s
WRITE: 1599.4MB/s 1842.2MB/s
READ: 1547.7MB/s 1740.7MB/s
WRITE: 1549.2MB/s 1742.4MB/s
#jobs5
READ: 18800MB/s 4792.6MB/s
READ: 16659MB/s 16898MB/s
WRITE: 777796KB/s 721363KB/s
WRITE: 1771.9MB/s 2138.7MB/s
READ: 1517.9MB/s 1837.8MB/s
WRITE: 1512.6MB/s 1831.5MB/s
READ: 1501.4MB/s 1784.1MB/s
WRITE: 1500.5MB/s 1783.9MB/s
#jobs6
READ: 20827MB/s 20571MB/s
READ: 19382MB/s 19505MB/s
WRITE: 850618KB/s 776148KB/s
WRITE: 1886.2MB/s 2127.7MB/s
READ: 1685.3MB/s 1864.8MB/s
WRITE: 1682.4MB/s 1860.8MB/s
READ: 1598.3MB/s 1727.9MB/s
WRITE: 1593.5MB/s 1722.6MB/s
#jobs7
READ: 21547MB/s 21000MB/s
READ: 18814MB/s 18715MB/s
WRITE: 1008.5MB/s 991.56MB/s
WRITE: 1922.5MB/s 2232.9MB/s
READ: 1640.3MB/s 1795.2MB/s
WRITE: 1641.3MB/s 1796.4MB/s
READ: 1578.2MB/s 1763.2MB/s
WRITE: 1569.5MB/s 1753.5MB/s
#jobs8
READ: 20277MB/s 20916MB/s
READ: 17952MB/s 18340MB/s
WRITE: 1186.1MB/s 1170.6MB/s
WRITE: 1955.8MB/s 2347.6MB/s
READ: 1686.5MB/s 1936.7MB/s
WRITE: 1688.3MB/s 1938.8MB/s
READ: 1610.3MB/s 1894.7MB/s
WRITE: 1606.6MB/s 1890.4MB/s
#jobs9
READ: 20108MB/s 19361MB/s
READ: 18012MB/s 18177MB/s
WRITE: 1355.1MB/s 1325.7MB/s
WRITE: 1948.5MB/s 2305.6MB/s
READ: 1662.4MB/s 1892.6MB/s
WRITE: 1661.9MB/s 1891.5MB/s
READ: 1605.5MB/s 1812.4MB/s
WRITE: 1604.2MB/s 1810.7MB/s
#jobs10
READ: 20039MB/s 19455MB/s
READ: 18028MB/s 17716MB/s
WRITE: 1465.3MB/s 1486.7MB/s
WRITE: 2007.2MB/s 2317.5MB/s
READ: 1755.3MB/s 2005.9MB/s
WRITE: 1754.3MB/s 2003.1MB/s
READ: 1691.2MB/s 1874.8MB/s
WRITE: 1694.7MB/s 1877.7MB/s
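
The fio columns mix KB/s and MB/s, so a relative comparison is easier after normalizing the units. A minimal Python sketch, assuming 1 MB = 1024 KB (the report does not state the base, and it cancels out whenever both columns use the same unit). `to_kb_per_s` and `pct_change` are hypothetical helpers, and the sample rows are three of the #jobs4 lines above:

```python
import re

# Assumption: fio's MB is 1024 KB here; the report does not say.
_UNIT_TO_KB = {"KB/s": 1.0, "MB/s": 1024.0}


def to_kb_per_s(value: str) -> float:
    """Convert a string such as '1843.2MB/s' to KB/s."""
    m = re.fullmatch(r"([\d.]+)(KB/s|MB/s)", value)
    if m is None:
        raise ValueError(f"unrecognized bandwidth value: {value!r}")
    return float(m.group(1)) * _UNIT_TO_KB[m.group(2)]


def pct_change(base: str, new: str) -> float:
    """Relative change of `new` against `base`, in percent."""
    b, n = to_kb_per_s(base), to_kb_per_s(new)
    return (n - b) / b * 100.0


# (operation, 8 streams, per-cpu), copied from the #jobs4 block above.
rows = [
    ("READ", "19948MB/s", "20013MB/s"),
    ("WRITE", "630690KB/s", "495078KB/s"),
    ("WRITE", "1843.2MB/s", "2226.9MB/s"),
]

for op, base, new in rows:
    print(f"{op:<6} {base:>12} -> {new:>12}  {pct_change(base, new):+6.2f}%")
```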


perf stat

8 streams per-cpu
====================================================================================
jobs1
stalled-cycles-frontend 56,052,305,338 ( 55.05%) 58,803,628,418 ( 55.33%)
stalled-cycles-backend 24,355,709,967 ( 23.92%) 25,413,107,301 ( 23.91%)
instructions 96,175,143,640 ( 0.94) 100,364,109,185 ( 0.94)
branches 18,009,998,853 ( 559.201) 18,513,273,860 ( 550.828)
branch-misses 111,500,123 ( 0.62%) 106,011,616 ( 0.57%)
jobs2
stalled-cycles-frontend 126,012,750,354 ( 59.31%) 123,465,054,991 ( 57.62%)
stalled-cycles-backend 61,866,277,568 ( 29.12%) 58,959,838,855 ( 27.52%)
instructions 183,736,567,135 ( 0.86) 193,832,846,843 ( 0.90)
branches 34,879,279,141 ( 506.137) 36,700,776,757 ( 530.136)
branch-misses 175,122,665 ( 0.50%) 165,057,491 ( 0.45%)
jobs3
stalled-cycles-frontend 175,428,933,301 ( 60.40%) 160,530,802,385 ( 58.59%)
stalled-cycles-backend 87,409,032,068 ( 30.10%) 76,093,143,994 ( 27.77%)
instructions 231,949,985,071 ( 0.80) 229,875,149,073 ( 0.84)
branches 44,034,175,160 ( 423.325) 43,578,595,543 ( 443.250)
branch-misses 237,974,300 ( 0.54%) 216,380,848 ( 0.50%)
jobs4
stalled-cycles-frontend 265,519,049,536 ( 64.46%) 221,049,841,649 ( 61.81%)
stalled-cycles-backend 146,538,881,296 ( 35.57%) 113,774,053,039 ( 31.82%)
instructions 298,241,854,695 ( 0.72) 278,000,866,874 ( 0.78)
branches 59,531,800,053 ( 400.919) 55,096,944,109 ( 427.816)
branch-misses 285,108,083 ( 0.48%) 260,972,185 ( 0.47%)
jobs5
stalled-cycles-frontend 290,281,266,141 ( 64.97%) 260,946,337,232 ( 61.99%)
stalled-cycles-backend 161,884,390,707 ( 36.23%) 137,154,776,973 ( 32.58%)
instructions 326,306,594,233 ( 0.73) 334,011,271,525 ( 0.79)
branches 63,904,071,806 ( 398.348) 65,664,365,815 ( 434.393)
branch-misses 293,793,049 ( 0.46%) 279,794,621 ( 0.43%)
jobs6
stalled-cycles-frontend 344,523,942,841 ( 65.53%) 287,955,119,151 ( 61.88%)
stalled-cycles-backend 193,660,445,380 ( 36.84%) 150,866,799,639 ( 32.42%)
instructions 381,794,200,792 ( 0.73) 375,547,185,965 ( 0.81)
branches 74,623,783,129 ( 394.258) 73,649,248,349 ( 441.644)
branch-misses 358,005,680 ( 0.48%) 306,143,187 ( 0.42%)
jobs7
stalled-cycles-frontend 369,290,213,422 ( 64.74%) 319,117,228,349 ( 61.21%)
stalled-cycles-backend 206,236,039,426 ( 36.16%) 168,934,948,019 ( 32.40%)
instructions 427,549,938,405 ( 0.75) 429,752,151,831 ( 0.82)
branches 82,174,130,236 ( 400.051) 83,110,458,913 ( 442.306)
branch-misses 354,517,174 ( 0.43%) 332,430,584 ( 0.40%)
jobs8
stalled-cycles-frontend 409,541,894,683 ( 66.76%) 349,581,824,315 ( 62.51%)
stalled-cycles-backend 229,256,571,129 ( 37.37%) 181,622,273,772 ( 32.47%)
instructions 437,816,833,182 ( 0.71) 450,389,502,564 ( 0.81)
branches 84,525,812,473 ( 382.128) 87,501,121,276 ( 434.210)
branch-misses 372,309,759 ( 0.44%) 349,523,647 ( 0.40%)
jobs9
stalled-cycles-frontend 442,628,204,560 ( 65.90%) 380,475,695,919 ( 61.52%)
stalled-cycles-backend 251,927,332,399 ( 37.51%) 199,303,426,179 ( 32.22%)
instructions 491,437,868,336 ( 0.73) 511,514,729,028 ( 0.83)
branches 93,730,386,271 ( 386.978) 98,645,937,110 ( 442.304)
branch-misses 401,101,757 ( 0.43%) 368,882,924 ( 0.37%)
jobs10
stalled-cycles-frontend 478,576,939,331 ( 67.41%) 408,498,109,428 ( 63.43%)
stalled-cycles-backend 274,043,625,756 ( 38.60%) 219,162,314,972 ( 34.03%)
instructions 495,607,125,031 ( 0.70) 505,149,872,644 ( 0.78)
branches 95,885,616,294 ( 374.483) 99,094,367,930 ( 426.624)
branch-misses 418,267,387 ( 0.44%) 392,516,508 ( 0.40%)
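
The raw counters are easier to compare as relative changes. A minimal Python sketch using the jobs10 values above (the `count` and `pct_change` helpers are hypothetical names, not part of perf):

```python
def count(text: str) -> int:
    """Parse a perf counter printed with thousands separators."""
    return int(text.replace(",", ""))


def pct_change(base: int, new: int) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (8 streams, per-cpu) counters copied from the jobs10 block above.
jobs10 = {
    "stalled-cycles-frontend": ("478,576,939,331", "408,498,109,428"),
    "stalled-cycles-backend": ("274,043,625,756", "219,162,314,972"),
    "instructions": ("495,607,125,031", "505,149,872,644"),
    "branch-misses": ("418,267,387", "392,516,508"),
}

changes = {
    event: pct_change(count(base), count(new))
    for event, (base, new) in jobs10.items()
}

for event, change in changes.items():
    print(f"{event:<24} {change:+6.2f}%")
```

A negative value means the per-cpu run recorded fewer events of that kind than the 8 streams run for the same job count.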


perf reported execution time

8 streams per-cpu
====================================================
seconds elapsed 51.128322985 47.600230868
seconds elapsed 49.309895468 48.291538090
seconds elapsed 47.075673742 46.068406557
seconds elapsed 47.816933840 52.966896478
seconds elapsed 54.345548549 50.918853799
seconds elapsed 58.613938093 58.130571913
seconds elapsed 62.799745992 60.086779664
seconds elapsed 65.664854260 61.414515686
seconds elapsed 71.340920175 67.717224950
seconds elapsed 74.169664807 69.485210016
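
A short Python sketch to compare the elapsed times row by row. It assumes the ten rows correspond to jobs1 through jobs10 in order, which the table does not state explicitly:

```python
# (8 streams, per-cpu) elapsed seconds, copied from the table above.
# Assumption: row i is the run with i fio jobs.
elapsed = [
    (51.128322985, 47.600230868),
    (49.309895468, 48.291538090),
    (47.075673742, 46.068406557),
    (47.816933840, 52.966896478),
    (54.345548549, 50.918853799),
    (58.613938093, 58.130571913),
    (62.799745992, 60.086779664),
    (65.664854260, 61.414515686),
    (71.340920175, 67.717224950),
    (74.169664807, 69.485210016),
]

# Rows (1-based) where the per-cpu run took longer than 8 streams.
slower_rows = [i for i, (base, new) in enumerate(elapsed, start=1) if new > base]
faster_count = len(elapsed) - len(slower_rows)

for i, (base, new) in enumerate(elapsed, start=1):
    delta = (new - base) / base * 100.0
    print(f"jobs{i:<2} {base:8.3f}s -> {new:8.3f}s  {delta:+6.2f}%")

print(f"per-cpu faster in {faster_count} of {len(elapsed)} rows; "
      f"slower in rows {slower_rows}")
```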


-ss