Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory

From: Gregory Price
Date: Fri Apr 05 2024 - 12:56:36 EST


On Fri, Apr 05, 2024 at 03:08:49PM +0900, Honggyu Kim wrote:
> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> 1. YCSB zipfian distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
>              |    cold memory occupied by mmap and memset     |
>              |   0G  440G  450G  460G  470G  480G  490G  500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values                 |  GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only    | 1.00     -     -     -     -     -     -     - |     1.00
> CXL-only     | 1.22     -     -     -     -     -     -     - |     1.22
> default      |    -  1.12  1.13  1.14  1.16  1.19  1.21  1.21 |     1.17
> DAMON tiered |    -  1.04  1.03  1.04  1.06  1.05  1.05  1.05 |     1.05
> =============+================================================+=========
> CXL usage of redis-server in GB                               |  AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only    |  0.0     -     -     -     -     -     -     - |      0.0
> CXL-only     | 52.6     -     -     -     -     -     -     - |     52.6
> default      |    -  20.4  27.0  33.1  39.5  45.6  50.5  50.3 |     38.1
> DAMON tiered |    -   0.1   0.3   0.8   0.6   0.7   1.3   0.9 |      0.7
> =============+================================================+=========
>
> Each test result is based on one of the following execution environments.
>
> DRAM-only   : redis-server uses only local DRAM memory.
> CXL-only    : redis-server uses only CXL memory.
> default     : default memory policy (MPOL_DEFAULT).
>               NUMA balancing disabled.
> DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
>               DAMOS_MIGRATE_HOT for CXL nodes.
>
> The above result shows that the "default" execution time goes up as the
> size of cold memory increases from 440G to 500G: the more cold memory is
> used, the more CXL memory is used by the target redis workload, which
> increases the execution time.
>
> However, the "DAMON tiered" result shows less slowdown because the
> DAMOS_MIGRATE_COLD action at the DRAM node proactively demotes
> pre-allocated cold memory to the CXL node, and the space this frees in
> DRAM increases the chance that hot or warm pages of redis-server are
> allocated on the fast DRAM node. Moreover, the DAMOS_MIGRATE_HOT action
> at the CXL node actively promotes hot pages of redis-server to the DRAM
> node.
>
> As a result, more of redis-server's memory stays in the DRAM node
> compared to the "default" memory policy, which improves performance.
>
> The following results for the latest distribution workload show a
> similar trend.
>
> 2. YCSB latest distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
>              |    cold memory occupied by mmap and memset     |
>              |   0G  440G  450G  460G  470G  480G  490G  500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values                 |  GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only    | 1.00     -     -     -     -     -     -     - |     1.00
> CXL-only     | 1.18     -     -     -     -     -     -     - |     1.18
> default      |    -  1.18  1.19  1.18  1.18  1.17  1.19  1.18 |     1.18
> DAMON tiered |    -  1.04  1.04  1.04  1.05  1.04  1.05  1.05 |     1.04
> =============+================================================+=========
> CXL usage of redis-server in GB                               |  AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only    |  0.0     -     -     -     -     -     -     - |      0.0
> CXL-only     | 52.6     -     -     -     -     -     -     - |     52.6
> default      |    -  20.5  27.1  33.2  39.5  45.5  50.4  50.5 |     38.1
> DAMON tiered |    -   0.2   0.4   0.7   1.6   1.2   1.1   3.4 |      1.2
> =============+================================================+=========
>
> In summary, both results show that "DAMON tiered" memory management
> reduces the slowdown relative to DRAM-only from 17~18% with the
> "default" memory policy to 4~5% when the system runs under high memory
> pressure on its fast-tier DRAM nodes.
>
> Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> tiered memory systems run more efficiently under high memory pressure.
>
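
The GEOMEAN and AVERAGE columns, and the 17~18% vs. 4~5% slowdown in the quoted summary, can be cross-checked directly from the per-size values in the two tables. A minimal Python sketch, assuming GEOMEAN is the geometric mean and AVERAGE the arithmetic mean of the seven 440G..500G entries; the `results` layout and the helper names are illustrative only:

```python
from math import exp, log

# Per-size values copied from the two quoted tables (cold memory 440G..500G).
# "time" is execution time normalized to DRAM-only; "cxl_gb" is CXL usage in GB.
results = {
    "zipfian": {
        "default":      {"time": [1.12, 1.13, 1.14, 1.16, 1.19, 1.21, 1.21],
                         "cxl_gb": [20.4, 27.0, 33.1, 39.5, 45.6, 50.5, 50.3]},
        "DAMON tiered": {"time": [1.04, 1.03, 1.04, 1.06, 1.05, 1.05, 1.05],
                         "cxl_gb": [0.1, 0.3, 0.8, 0.6, 0.7, 1.3, 0.9]},
    },
    "latest": {
        "default":      {"time": [1.18, 1.19, 1.18, 1.18, 1.17, 1.19, 1.18],
                         "cxl_gb": [20.5, 27.1, 33.2, 39.5, 45.5, 50.4, 50.5]},
        "DAMON tiered": {"time": [1.04, 1.04, 1.04, 1.05, 1.04, 1.05, 1.05],
                         "cxl_gb": [0.2, 0.4, 0.7, 1.6, 1.2, 1.1, 3.4]},
    },
}


def geomean(xs):
    """Geometric mean: exp of the arithmetic mean of the logs."""
    return exp(sum(log(x) for x in xs) / len(xs))


def mean(xs):
    """Plain arithmetic mean."""
    return sum(xs) / len(xs)


def summarize(results):
    """Map (workload, config) -> (GEOMEAN time, AVERAGE CXL GB, slowdown %)."""
    summary = {}
    for workload, configs in results.items():
        for config, data in configs.items():
            g = round(geomean(data["time"]), 2)
            avg = round(mean(data["cxl_gb"]), 1)
            # Slowdown vs. DRAM-only (normalized 1.00), from the rounded GEOMEAN.
            summary[(workload, config)] = (g, avg, round((g - 1.0) * 100))
    return summary


for (workload, config), (g, avg, pct) in summarize(results).items():
    print(f"{workload:8} {config:13} GEOMEAN={g:.2f} "
          f"AVG_CXL={avg:.1f}GB slowdown={pct}%")
```

With the quoted values this reproduces the summary columns (GEOMEAN 1.17/1.05 for zipfian and 1.18/1.04 for latest; average CXL usage 38.1/0.7 GB and 38.1/1.2 GB), so the 17~18% and 4~5% figures are simply each GEOMEAN minus 1.00.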

Hi,

It's hard to determine from your results whether the performance
mitigation is caused primarily by MIGRATE_COLD freeing up space for new
allocations, or by some combination of HOT/COLD actions occurring during
execution but after the database has already been warmed up.

Do you have test results which enable only DAMOS_MIGRATE_COLD actions
but not DAMOS_MIGRATE_HOT actions? (and vice versa)

The question I have is exactly how often MIGRATE_HOT is actually being
utilized, and how much data is being moved. Testing MIGRATE_COLD only
would at least give a rough approximation of that.


Additionally, do you have any data on workloads that exceed the capacity
of the DRAM tier? Here you say you have 512GB of local DRAM, but you only
test a workload that caps out at 500G. Have you run a test of, say,
550GB to see the effect of DAMON HOT/COLD migration actions when DRAM
capacity is exceeded?

Can you also provide the DRAM-only results for each test? Presumably,
as the workload size increases from 440G to 500G, the system starts
using some amount of swap/zswap/whatever. It would be good to know how
this system compares to swapping small amounts of overflow.

~Gregory