Re: [PATCH] zram: export the number of available comp streams

From: Minchan Kim
Date: Thu Mar 17 2016 - 20:31:46 EST


Hi Sergey,

I forgot this patch until now. Sorry about that.

On Mon, Feb 01, 2016 at 10:02:48AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
>
> On (01/29/16 16:28), Minchan Kim wrote:
> > Hello Sergey,
> >
> > Sorry for the late response. These days, I'm really busy with
> > personal stuff.
>
> sure, no worries :)
>
> > On Tue, Jan 26, 2016 at 09:03:59PM +0900, Sergey Senozhatsky wrote:
> > > I've been asked several very simple questions:
> > > a) How can I ensure that zram uses (or used) several compression
> > > streams?
> >
> > Why does he want to ensure several compression streams?
> > As you know well, zram handles it dynamically.
> >
> > If zram cannot allocate more streams, it means the system is
> > heavily fragmented or under memory pressure at that time, so
> > there is no point in adding more streams, I think.
> >
> > Could you elaborate more on why he wants to know it and what
> > he expects from it?
>
> good questions. I believe mostly it's about fine-tuning on a
> per-device basis, which is getting especially tricky when zram
> devices are used as a sort of in-memory tmp storage for various
> applications (black boxen).
>
> > > b) What is the current number of comp streams (how much memory
> > > does zram *actually* use for compression streams, if there is
> > > more than one stream)?
> >
> > Hmm, in the kernel, there are lots of subsystems where we
> > cannot know the exact memory usage. Why does the user want
> > to know the exact memory usage of zram? What is his concern?
>
> certainly true. probably some of those sub-systems/drivers have some
> sort of LRU, or shrinker callbacks, to release unneeded memory back.
> zram only allocates streams, and it's basically hard to tell how many:
> up to max_comp_streams, which can be larger than the number of CPUs
> on the system, because we keep preemption enabled (I didn't realize
> that until I played with the patch) around
> zcomp_strm_find()/zcomp_strm_release():
>
> zram_bvec_write()
> {
>         ...
>         zstrm = zcomp_strm_find(zram->comp);
>         >> can preempt
>         user_mem = kmap_atomic(page);
>         >> now atomic
>         zcomp_compress()
>         ...
>         kunmap_atomic()
>         >> can preempt
>         zcomp_strm_release()
>         ...
> }
>
> so how many streams can I have on my old 4-CPU x86_64 box?
>
> 10?
> yes.
>
> # cat /sys/block/zram0/mm_stat
> 630484992 9288707 13103104 0 13103104 16240 0 10
>
> 16?
> yes.
>
> # cat /sys/block/zram0/mm_stat
> 1893117952 25296718 31354880 0 31354880 15342 0 16
>
> 21?
> yes.
>
> # cat /sys/block/zram0/mm_stat
> 1893167104 28499936 46616576 0 46616576 15330 0 21
>
> do I need 21? maybe not. do I need 18? if 18 streams are needed only 10%
> of the time (I can figure that out by repeatedly doing cat zramX/mm_stat),
> then I can set max_comp_streams to a value that keeps 90% of applications
> happy, e.g. max_comp_streams = 10, and save some memory.
>

Okay. Let's go back to the original zcomp design decision. As you
remember, the reason we separated the single- and multi-stream code was
the performance of the locking scheme (i.e., mutex_lock in the
single-stream model was much faster than the sleep/wakeup model in
multi-stream). If we could have overcome that problem back then, we
would have made the multi-stream code the default.

How about using *per-cpu* streams?
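
Something like the below is what I have in mind (just a rough sketch,
not tested; zcomp_stream_get()/zcomp_stream_put() are made-up names):

struct zcomp {
        struct zcomp_strm * __percpu *stream;
        /* ... backend, etc. ... */
};

/*
 * get_cpu_ptr() disables preemption, so the stream stays bound
 * to this CPU until the matching put.
 */
static struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
        return *get_cpu_ptr(comp->stream);
}

static void zcomp_stream_put(struct zcomp *comp)
{
        put_cpu_ptr(comp->stream);
}

Then zram_bvec_write() would call zcomp_stream_get()/zcomp_stream_put()
instead of zcomp_strm_find()/zcomp_strm_release(), with no mutex and no
waiting for a free stream.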

I remember you wanted to create the maximum number of comp streams
statically. I didn't want that at the time, but I have changed my mind.

Let's allocate comp streams statically but remove the max_comp_streams
knob. Instead, by default, zram allocates a number of streams equal to
the number of online CPUs.
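
Allocation would then happen once, at zcomp creation time, roughly like
this (again only a sketch; CPU hotplug handling is left out, and
zcomp_strm_alloc() stands for whatever stream constructor we end up
with):

int cpu;

comp->stream = alloc_percpu(struct zcomp_strm *);
if (!comp->stream)
        return -ENOMEM;

for_each_online_cpu(cpu) {
        struct zcomp_strm *zstrm = zcomp_strm_alloc(comp);

        if (!zstrm)
                goto cleanup;
        /* one pre-allocated stream per online CPU */
        *per_cpu_ptr(comp->stream, cpu) = zstrm;
}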

That way, I think we can solve the locking issue of the single-stream
model and guarantee the level of parallelism, as well as improve
performance with no locking.

The downside of this approach is that we reserve memory unnecessarily,
even though zram might be used only 1% of the system's running time.
But we should accept that for the other benefits (i.e., simpler code,
removing the max_comp_streams knob, no need for this stat of yours, a
guaranteed level of parallelism, and a known amount of memory
consumed).

What do you think about it?