Re: [PATCH 3/4] zram: implement deduplication in zram

From: Minchan Kim
Date: Thu Mar 23 2017 - 20:54:02 EST


On Thu, Mar 23, 2017 at 12:04:23PM +0900, Joonsoo Kim wrote:
> On Wed, Mar 22, 2017 at 08:41:21AM +0900, Minchan Kim wrote:
> > Hi Joonsoo,
> >
> > On Thu, Mar 16, 2017 at 11:46:37AM +0900, js1304@xxxxxxxxx wrote:
> > > From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> > >
> > > This patch implements deduplication feature in zram. The purpose
> > > of this work is naturally to save amount of memory usage by zram.
> > >
> > > Android is one of the biggest users to use zram as swap and it's
> > > really important to save amount of memory usage. There is a paper
> > > that reports that duplication ratio of Android's memory content is
> > > rather high [1]. And, there is a similar work on zswap that also
> > > reports that experiments has shown that around 10-15% of pages
> > > stored in zswp are duplicates and deduplicate them provides some
> > > benefits [2].
> > >
> > > Also, there is a different kind of workload that uses zram as blockdev
> > > and store build outputs into it to reduce wear-out problem of real
> > > blockdev. In this workload, deduplication hit is very high
> > > although I don't know exact reason about it.
> >
> > Hmm, Isn't it due to binary composed by obj files so that many of
> > part between binary and object are sharable?
> >
> > I'd like to clear it out because dedup was not useful for swap workload
> > for the testing in android unlike papers you mentioned.
> > Of course, it depends on the workload so someone might find it's
> > huge useful for his swap workload. However, I want to focus clear
> > winning case scenario rather than "might be better".
> >
> > That's why I want to know clear reason the saving. If my assumption
> > is right(i.e., object file vs. binary file), it would be enough
> > justfication to merge this feature because user can turn on the feature
> > with reasonable expectation.
>
> Okay. I checked the reason of benefit on the kernel build now. There are
> some cases that deduplication happens.
>
> 1) *.cmd
> Build command is usually similar in one directory. So, content of
> these file are very similar. Please check fs/ext4/.namei.o.cmd and
> fs/ext4/.inode.o.cmd. In my system, more than 789 lines are the same
> in 944 and 939 line of the file, respectively.
>
> 2) object file
> built-in.o and temporal object file have the similar content. More
> than 50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o. I
> think that we can optimize in this case. ext4.o is temporal object
> file and it isn't necessarily needed. We can change following
> fs/ext4/Makefile to optimized one.
>
> orig>
> obj-$(CONFIG_EXT4_FS) += ext4.o
> ext4-y := XXX YYY ZZZ
>
> optimized>
> obj-$(CONFIG_EXT4_FS) += XXX YYY ZZZ
>
> 3) vmlinux
> .tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boot/compressed/vmlinux.bin
> have the similar content.
>
> I did similar checking for Android and it also has similar case.
> Some of object files (.class and .so) are similar with another object
> files. (./host/linux-x86/lib/libartd.so and
> ./host/linux-x86/lib/libartd-compiler.so)

Great.

Please include your analysis in the patch's description.
With that, we can recommend to turn on dedup feature to users who want to use
zram to build output directory. Please guide it in Kconfig.
One thing I forgot when the review is "add document about use_dedup on
zram.txt"

>
> >
> > >
> > > Anyway, if we can detect duplicated content and avoid to store duplicated
> > > content at different memory space, we can save memory. This patch
> > > tries to do that.
> > >
> > > Implementation is almost simple and intuitive but I should note
> > > one thing about implementation detail.
> > >
> > > To check duplication, this patch uses checksum of the page and
> > > collision of this checksum could be possible. There would be
> > > many choices to handle this situation but this patch chooses
> > > to allow entry with duplicated checksum to be added to the hash,
> > > but, not to compare all entries with duplicated checksum
> > > when checking duplication. I guess that checksum collision is quite
> >
> > Hmm, if there are many duplicated checksum, what a specific checksum
> > is compared among them?
>
> I implemented it to check just first one.
>
> > > rare event and we don't need to pay any attention to such a case.
> >
> > If it's rare event, why can't we iterate all of entries?
>
> It's possible. If you prefer it, I can do it.

Yes, please do. I want to give expected behavior to the user unless it
gives big overhead/make the code mess.

>
> > > Therefore, I decided the most simplest way to implement the feature.
> > > If there is a different opinion, I can accept and go that way.
> > >
> > > Following is the result of this patch.
> > >
> > > Test result #1 (Swap):
> > > Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18
> > >
> > > orig_data_size: 145297408
> > > compr_data_size: 32408125
> > > mem_used_total: 32276480
> > > dup_data_size: 3188134
> > > meta_data_size: 1444272
> > >
> > > Last two metrics added to mm_stat are related to this work.
> > > First one, dup_data_size, is amount of saved memory by avoiding
> > > to store duplicated page. Later one, meta_data_size, is the amount of
> > > data structure to support deduplication. If dup > meta, we can judge
> > > that the patch improves memory usage.
> > >
> > > In Adnroid, we can save 5% of memory usage by this work.
> > >
> > > Test result #2 (Blockdev):
> > > build the kernel and store output to ext4 FS on zram
> > >
> > > <no-dedup>
> > > Elapsed time: 249 s
> > > mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0
> > >
> > > <dedup>
> > > Elapsed time: 250 s
> > > mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038 3945792
> > >
> > > There is no performance degradation and save 23% memory.
> > >
> > > Test result #3 (Blockdev):
> > > copy android build output dir(out/host) to ext4 FS on zram
> > >
> > > <no-dedup>
> > > Elapsed time: out/host: 88 s
> > > mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0
> > >
> > > <dedup>
> > > Elapsed time: out/host: 100 s
> > > mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336
> > >
> > > It shows performance degradation roughly 13% and save 24% memory. Maybe,
> > > it is due to overhead of calculating checksum and comparison.
> > >
> > > Test result #4 (Blockdev):
> > > copy android build output dir(out/target/common) to ext4 FS on zram
> > >
> > > <no-dedup>
> > > Elapsed time: out/host: 203 s
> > > mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0
> > >
> > > <dedup>
> > > Elapsed time: out/host: 201 s
> > > mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336
> > >
> > > Memory is saved by 42% and performance is the same. Even if there is overhead
> > > of calculating checksum and comparison, large hit ratio compensate it since
> > > hit leads to less compression attempt.
> >
> > Cool! It would help a lot for users have used zram to build output directory.
> >
> > >
> > > Anyway, benefit seems to be largely dependent on the workload so
> > > following patch will make this feature optional. However, this feature
> > > can help some usecases so is deserved to be merged.
> > >
> > > [1]: MemScope: Analyzing Memory Duplication on Android Systems,
> > > dl.acm.org/citation.cfm?id=2797023
> > > [2]: zswap: Optimize compressed pool memory utilization,
> > > lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2
> >
> > Overall, it looks good. Thanks!
> > However, I want to change function naming/structuring into my preference style
> > to maintain because it's non-trival feature. So, please look below.
>
> Okay. I understand all you commented. I will fix them and I won't reply to each one.

Thanks for the nice work!