[PATCH 0/6 v2] PM / Hibernate: Memory bitmap scalability improvements

From: Joerg Roedel
Date: Mon Jul 21 2014 - 06:28:30 EST


Changes v1->v2:

* Rebased to v3.16-rc6
* Fixed the style issues in Patch 1 mentioned by Rafael

Hi,

here is the revised patch set to improve the scalability of
the memory bitmap implementation used for hibernation. The
current implementation does not scale well to machines with
several TB of memory. A resume on those machines may cause
soft lockups to be reported.

These patches improve the data structure by adding a radix
tree to the linked list structure, which improves random
access performance from O(n) to O(log_b(n)), where b depends
on the architecture (b=512 on amd64, b=1024 on i386).
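To illustrate the lookup, here is a simplified sketch (the
names are illustrative only, not the exact code from the
patches). Every tree node occupies one page, so an inner node
holds PAGE_SIZE / sizeof(unsigned long) child pointers (512 on
amd64, 1024 on i386) and a leaf node holds PAGE_SIZE * 8 bits.
A pfn is first converted to a leaf block number and the tree
is then walked down from the root:

        #define LEVEL_SHIFT     (PAGE_SHIFT - 3) /* log2(512) on amd64 */
        #define LEVEL_MASK      ((1UL << LEVEL_SHIFT) - 1)

        struct rtree_node {
                struct list_head list;  /* kept for the linear walkers */
                unsigned long *data;    /* one page: child pointers or bits */
        };

        static unsigned long *rtree_find_block(struct rtree_node *root,
                                               int levels,
                                               unsigned long block_nr)
        {
                struct rtree_node *node = root;
                int level;

                /* Consume LEVEL_SHIFT index bits per level, root first */
                for (level = levels; level > 0; level--) {
                        int index;

                        index = block_nr >> ((level - 1) * LEVEL_SHIFT);
                        index &= LEVEL_MASK;
                        node = (struct rtree_node *)node->data[index];
                }

                return node->data;      /* the leaf's bit block */
        }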

A test on a 12TB machine showed an improvement in resume
time from 76s with the old implementation to 2.4s with the
radix tree and the improved swsusp_free function. See below
for details of this test.

Patches 1-3 add the radix tree while keeping the existing
memory bitmap implementation in place, and add code to compare
the results of both implementations. This was used during
development to make sure both data structures return the same
results.
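As a sketch of that cross-checking (hypothetical helper names,
the actual patches wire this up differently), every lookup is
answered by both data structures and any disagreement is
reported:

        static int memory_bm_test_bit(struct memory_bitmap *bm,
                                      unsigned long pfn)
        {
                int list_bit, rtree_bit;

                list_bit  = memory_bm_test_bit_list(bm, pfn);  /* old list walk */
                rtree_bit = memory_bm_test_bit_rtree(bm, pfn); /* new tree walk */

                /* Both implementations must agree on every bit */
                WARN_ON_ONCE(list_bit != rtree_bit);

                return list_bit;
        }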

Patch 4 re-implements the swsusp_free() function so that it
iterates only over the bits set in the bitmaps instead of over
all pfns. This turned out to scale better on large memory
machines.
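The idea, in a simplified sketch (free_one_page() is a
hypothetical helper and the real patch differs in detail), is
to treat the two bitmaps as sorted streams of set pfns and
intersect them, so the cost depends on the number of set bits
rather than on the total number of pfns:

        unsigned long fb_pfn, fr_pfn;

        memory_bm_position_reset(forbidden_pages_map);
        memory_bm_position_reset(free_pages_map);

        fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
        fr_pfn = memory_bm_next_pfn(free_pages_map);

        while (fb_pfn != BM_END_OF_MAP && fr_pfn != BM_END_OF_MAP) {
                if (fb_pfn == fr_pfn) {
                        /* Set in both maps: page was allocated by swsusp */
                        free_one_page(fb_pfn);
                        fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
                        fr_pfn = memory_bm_next_pfn(free_pages_map);
                } else if (fb_pfn < fr_pfn) {
                        fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
                } else {
                        fr_pfn = memory_bm_next_pfn(free_pages_map);
                }
        }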

Patch 5 removes the old memory bitmap implementation now
that the radix tree is in place and working correctly.

The last patch makes rtree_next_node() touch the soft lockup
watchdog. This is necessary because the worst-case performance
(all bits set in the forbidden_pages_map and free_pages_map)
is the same as with the old implementation and may still cause
soft lockups; patch 6 avoids this.
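Condensed sketch of that change (the real function also moves
on to the next zone once the current zone's leaves are
exhausted):

        static bool rtree_next_node(struct memory_bitmap *bm)
        {
                bm->cur.node = list_entry(bm->cur.node->list.next,
                                          struct rtree_node, list);
                if (&bm->cur.node->list != &bm->cur.zone->leaves) {
                        bm->cur.node_pfn += BM_BITS_PER_BLOCK;
                        bm->cur.node_bit = 0;
                        touch_softlockup_watchdog();    /* the new call */
                        return true;
                }

                return false;   /* end of this zone's leaves */
        }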

The code was tested on 32- and 64-bit x86 and showed no
issues there.

Below is an example test that shows the performance
improvement on a 12TB machine. First the test with the old
memory bitmap:

# time perf record /usr/sbin/resume $sdev
resume: libgcrypt version: 1.5.0
[ perf record: Woken up 12 times to write data ]
[ perf record: Captured and wrote 2.882 MB perf.data (~125898 samples) ]

real 1m16.043s
user 0m0.016s
sys 0m0.312s
# perf report --stdio |head -50
# Events: 75K cycles
#
# Overhead Command Shared Object        Symbol
# ........ ....... .................... ........................................
#
56.16% resume [kernel.kallsyms] [k] memory_bm_test_bit
19.35% resume [kernel.kallsyms] [k] swsusp_free
14.90% resume [kernel.kallsyms] [k] memory_bm_find_bit
7.28% resume [kernel.kallsyms] [k] swsusp_page_is_forbidden

And here is the same test on the same machine with these
patches applied:

# time perf record /usr/sbin/resume $sdev
resume: libgcrypt version: 1.5.0
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.039 MB perf.data (~1716 samples) ]

real 0m2.376s
user 0m0.020s
sys 0m0.408s

# perf report --stdio |head -50
# Events: 762 cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .........................
#
34.78% resume [kernel.kallsyms] [k] find_next_bit
27.03% resume [kernel.kallsyms] [k] clear_page_c_e
9.70% resume [kernel.kallsyms] [k] mark_nosave_pages
3.92% resume [kernel.kallsyms] [k] alloc_rtree_node
2.38% resume [kernel.kallsyms] [k] get_image_page

As can be seen from these results, these patches improve
scalability significantly. Please review; any comments are
appreciated.

Thanks,

Joerg

Joerg Roedel (6):
PM / Hibernate: Create a Radix-Tree to store memory bitmap
PM / Hibernate: Add memory_rtree_find_bit function
PM / Hibernate: Implement position keeping in radix tree
PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()
PM / Hibernate: Remove the old memory-bitmap implementation
PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node

kernel/power/snapshot.c | 494 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 367 insertions(+), 127 deletions(-)

--
1.9.1
