Re: [PATCH v4 2/4] zram: implement deduplication in zram
From: Joonsoo Kim
Date: Tue May 02 2017 - 01:31:37 EST
On Thu, Apr 27, 2017 at 04:46:22PM +0900, Sergey Senozhatsky wrote:
> On (04/27/17 15:57), Joonsoo Kim wrote:
> > I tested with your benchmark and found that contention happens
> > since the data page is perfectly the same. All the written data (2GB)
> > is de-duplicated.
> yes, a statically filled buffer to guarantee that
> compression/decompression numbers/impact will be stable.
> otherwise the test results are "apples vs oranges" :)
Yes, but we can maintain a set of buffers, and using it in the test will
ensure an "apples vs apples" comparison while being less artificial.
Also, the effect of the compression algorithm cannot be measured with a
statically filled buffer.
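To illustrate the buffer-set idea, here is a rough Python sketch. It is not
part of the patch set or of the benchmark discussed here: make_buffer_set,
the 4096-byte page size, and the dup_ratio/fill_ratio knobs are all
assumptions made for the example. A fixed seed keeps the set identical
across runs, so results stay comparable without every page being the same.

```python
import random
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def make_buffer_set(n_pages, dup_ratio, fill_ratio, seed=0):
    """Return n_pages page-sized buffers, reproducible for a given seed.

    dup_ratio: probability that a page is a copy of an earlier page.
    fill_ratio: fraction of each fresh page that holds random bytes; the
    rest is zero-filled, so lower values compress better.
    """
    rng = random.Random(seed)
    n_random = int(PAGE_SIZE * fill_ratio)
    pages = []
    for _ in range(n_pages):
        if pages and rng.random() < dup_ratio:
            # Reuse an earlier page so that dedup has something to find.
            pages.append(rng.choice(pages))
        else:
            body = bytes(rng.getrandbits(8) for _ in range(n_random))
            pages.append(body + bytes(PAGE_SIZE - n_random))
    return pages


pages = make_buffer_set(n_pages=256, dup_ratio=0.25, fill_ratio=0.5, seed=1)
unique = len(set(pages))
compressed = sum(len(zlib.compress(p)) for p in set(pages))
print(f"{unique} unique pages of {len(pages)}, "
      f"compressed/raw = {compressed / (unique * PAGE_SIZE):.2f}")
```

The same set can then be written by every test run, and the two knobs can
be varied independently to see dedup and compression effects separately.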
> > I tried to optimize it with read-write lock but I failed since
> > there is another contention, which cannot be fixed simply. That is
> > zsmalloc. We need to map the object and compare the content of the
> > compressed page to check de-duplication. Zsmalloc pins the object
> > by using bit spinlock when mapping. So, parallel readers to the same
> > object contend here.
> > I think that this case is so artificial and, in practice, there
> > would be no case that the same data page is repeatedly and parallel
> > written as like this. So, I'd like to keep current code. How do you
> > think about it, Sergey?
> I agree. thanks for taking a look!
> I see no blockers for the patch set.
> <off topic>
> ok, in general, seems that (correct me if I'm wrong)
> a) the higher the dedup ratio the slower zram _can_ perform.
> because dedup can create parallel access scenarios where they previously
> never existed: different offset writes now can compete for the same dedupped
> zsmalloc object.
However, a higher dedup ratio doesn't necessarily mean that all the
data is the same. The important thing is the distribution of the data
rather than the dedup ratio itself. Parallelism also matters.
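A small Python sketch of the distribution point, using made-up workloads
rather than data from the tests discussed here. dedup_stats and the
"largest share" figure are assumptions for the example; the latter is only
a rough proxy for how many writers could end up on one dedupped object.

```python
from collections import Counter


def dedup_stats(pages):
    """Return (dedup_ratio, largest_share) for a list of page contents.

    dedup_ratio: fraction of writes that did not need a new object.
    largest_share: fraction of all writes that land on the single most
    shared object.
    """
    counts = Counter(pages)
    dedup_ratio = 1 - len(counts) / len(pages)
    largest_share = max(counts.values()) / len(pages)
    return dedup_ratio, largest_share


# Workload A: 50 distinct pages, each written twice.
spread = [bytes([i]) for i in range(50)] * 2
# Workload B: 49 distinct pages written once, one page written 51 times.
skewed = [bytes([i]) for i in range(49)] + [bytes([255])] * 51

print(dedup_stats(spread))
print(dedup_stats(skewed))
```

Both workloads report a dedup ratio of 0.5, but the most shared object
receives 2% of the writes in the first and 51% in the second, so only the
second is likely to show contention under parallel writers.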
> and... tricky and probably over exaggerated
> b) the lower the dedup ratio the slower zram _can_ perform.
> think of almost full zram device with dedup ratio of just 3-5%. tree lookups
> are serialized by the hash->lock. a balanced tree gives us slow lookup
> complexity growth, it's still there but can live with it. at the same time
> low dedup ratio means that we have wasted CPU cycles on checksum calculation
> (potentially for millions of pages if zram device in question is X gigabytes
> in size), this can't go unnoticed.
> it's just I was slightly confused by the performance numbers that you
> have observed. some tests were
> : It shows performance degradation roughly 13% and save 24% memory. Maybe,
> : it is due to overhead of calculating checksum and comparison.
> while others were
> : There is no performance degradation and save 23% memory.
As I said above, the dedup ratio by itself doesn't tell the whole story.
Also, these are quite different tests (kernel build vs file copy,
multi-threaded vs single-threaded), so I cannot conclude anything from
the fact that their saving ratios are similar. :)
> I understand that you didn't perform direct io, flush, fsync, etc. and
> there is a whole bunch of factors that could have affected your tests,
> e.g. write back, etc. etc. but the numbers are still very unstable.
> may be now we will have a bit better understanding :)
I also hope so.