Re: [PATCH v4 2/4] zram: implement deduplication in zram

From: Joonsoo Kim
Date: Thu Apr 27 2017 - 02:58:01 EST


On Wed, Apr 26, 2017 at 03:21:04PM +0900, Sergey Senozhatsky wrote:
> On (04/26/17 15:04), Joonsoo Kim wrote:
> > On Wed, Apr 26, 2017 at 01:02:43PM +0900, Sergey Senozhatsky wrote:
> > > On (04/26/17 09:52), js1304@xxxxxxxxx wrote:
> > > [..]
> > > > <no-dedup>
> > > > Elapsed time: out/host: 88 s
> > > > mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0
> > > >
> > > > <dedup>
> > > > Elapsed time: out/host: 100 s
> > > > mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336
> > > >
> > > > It shows performance degradation roughly 13% and save 24% memory. Maybe,
> > > > it is due to overhead of calculating checksum and comparison.
> > >
> > > I like the patch set, and it makes sense. the benefit is, obviously,
> > > case-by-case. on my system I've managed to save just 60MB on a 2.7G
> > > data set, which is far less than I was hoping to save :)
> > >
> > >
> > > I usually do DIRECT IO fio performance test. JFYI, the results
> > > were as follows:
> >
> > Could you share your fio test setting? I will try to re-generate the
> > result and analyze it.
>
> sure.
>
> I think I used this one: https://github.com/sergey-senozhatsky/zram-perf-test
>
> // hm... may be slightly modified on my box.
>
> I'll run more tests.
>
>

Hello,

I tested with your benchmark and found that the contention arises
because every data page is exactly identical. All of the written
data (2GB) is de-duplicated.
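
To illustrate why identical pages all land on one entry, here is a
minimal user-space sketch of a checksum-then-compare lookup. It is not
the code from the patch: the names (dedup_table, dedup_store,
page_checksum), the FNV-1a checksum, the bucket count, and storing the
page uncompressed are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ  4096  /* assumed page size */
#define NBUCKETS 64    /* arbitrary bucket count for the sketch */

/* Hypothetical entry: one stored copy of a page plus a reference count. */
struct dedup_entry {
	struct dedup_entry *next;
	uint32_t checksum;
	unsigned long refcount;
	unsigned char data[PAGE_SZ];
};

/* Hypothetical table; zero-initialize it before first use. */
struct dedup_table {
	struct dedup_entry *bucket[NBUCKETS];
	unsigned long nr_entries;
};

/* FNV-1a, used here only as a placeholder checksum. */
static uint32_t page_checksum(const unsigned char *page)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < PAGE_SZ; i++) {
		h ^= page[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Store a page. If an entry with the same checksum and the same
 * content exists, bump its refcount and return it; otherwise add a
 * new entry. Returns NULL only if allocation fails.
 */
static struct dedup_entry *dedup_store(struct dedup_table *t,
				       const unsigned char *page)
{
	uint32_t sum = page_checksum(page);
	struct dedup_entry **head = &t->bucket[sum % NBUCKETS];
	struct dedup_entry *e;

	for (e = *head; e; e = e->next) {
		/* A checksum match is only a hint; confirm by full compare. */
		if (e->checksum == sum &&
		    memcmp(e->data, page, PAGE_SZ) == 0) {
			e->refcount++;
			return e;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	e->checksum = sum;
	e->refcount = 1;
	memcpy(e->data, page, PAGE_SZ);
	e->next = *head;
	*head = e;
	t->nr_entries++;
	return e;
}
```

With a workload where every page is identical, every call walks to the
same entry and updates the same refcount, so any lock protecting that
entry is taken by every writer.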

I tried to optimize it with a read-write lock, but that failed because
there is another source of contention that cannot be fixed easily:
zsmalloc. To check for a duplicate, we need to map the object and
compare the content of the compressed page. Zsmalloc pins the object
with a bit spinlock when mapping it, so parallel readers of the same
object contend there.
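
As a rough user-space analogy of that pinning, the sketch below uses
C11 atomics to treat one bit of a handle word as a spinlock. The names
(struct handle, PIN_BIT, handle_pin and friends) are hypothetical and
this is not zsmalloc's code; it only shows that a pin of this kind is
exclusive, so even read-only users of the same object serialize.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the lowest bit of the handle word acts as the pin. */
#define PIN_BIT ((uintptr_t)1)

/* Hypothetical stand-in for an allocator handle word. */
struct handle {
	_Atomic uintptr_t word;
};

/* Try once to take the pin; true means we now hold it. */
static bool handle_trypin(struct handle *h)
{
	uintptr_t old = atomic_fetch_or_explicit(&h->word, PIN_BIT,
						 memory_order_acquire);
	return (old & PIN_BIT) == 0;
}

/* Spin until the pin is ours. There is no shared (reader) mode. */
static void handle_pin(struct handle *h)
{
	while (!handle_trypin(h))
		; /* busy-wait, as a spinlock does */
}

/* Drop the pin; the caller must currently hold it. */
static void handle_unpin(struct handle *h)
{
	uintptr_t old = atomic_fetch_and_explicit(&h->word, ~PIN_BIT,
						  memory_order_release);
	assert(old & PIN_BIT);
}
```

A second handle_trypin() on a pinned handle fails even if both callers
only want to read, which is why replacing the outer lock with a
read-write lock does not remove the contention on a single hot object.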

I think this case is quite artificial; in practice, it is unlikely
that the same data page would be written repeatedly and in parallel
like this. So, I'd like to keep the current code. What do you think,
Sergey?

Just a note: if we do parallel reads (direct-io) to the same offset,
zsmalloc contention would happen regardless of the deduplication
feature. It seems to be a fundamental issue in zsmalloc.

Thanks.