Re: [f2fs-dev] [PATCH] f2fs: change virtual mapping way for compression pages

From: Daeho Jeong
Date: Tue Aug 11 2020 - 07:21:42 EST


Sure, I'll add the test conditions to the commit message as you suggested.
FYI, the test was done with a 16KB chunk on a Pixel 3 (arm64) device.

Thanks,

On Tue, Aug 11, 2020 at 7:18 PM, Gao Xiang <hsiangkao@xxxxxxxxxx> wrote:
>
> On Tue, Aug 11, 2020 at 06:33:26PM +0900, Daeho Jeong wrote:
> > Plus, vmap() normally completes about as quickly as vm_map_ram(),
> > but occasionally it stalls for a very long time.
> >
> > On Tue, Aug 11, 2020 at 6:28 PM, Daeho Jeong <daeho43@xxxxxxxxx> wrote:
> > >
> > > Actually, as you can see, the test file consists entirely of zeroed
> > > data blocks, which maximizes the effect of changing the virtual mapping.
> > > When I use normal files which can be compressed by about 70% from the
> > > original file,
> > > the vm_map_ram() version is about 2x faster than the vmap() version.
>
> What f2fs does is quite similar to btrfs compression. Even if these
> blocks are all zeroed, the maximum compression ratio is in principle
> fixed (a cluster's worth of blocks goes into one compressed block, e.g.
> a 16k cluster into one compressed block).
>
> So it'd be better to describe your configured cluster size (16k or
> 128k) and your hardware information in the commit message as well.
>
> Actually, I also tried this patch on my x86 laptop just now with
> FIO (I didn't use zeroed blocks though), and I didn't notice much
> difference with turbo boost off and maxfreq.
>
> I'm not arguing against this commit, just noting something about the
> commit message.
> > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
>
> IMHO, the above numbers look much like decompression running on the
> arm64 little cores.
>
> Thanks,
> Gao Xiang
>
>
> > >
> > > On Tue, Aug 11, 2020 at 4:55 PM, Chao Yu <yuchao0@xxxxxxxxxx> wrote:
> > > >
> > > > On 2020/8/11 15:15, Gao Xiang wrote:
> > > > > On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> > > > >> From: Daeho Jeong <daehojeong@xxxxxxxxxx>
> > > > >>
> > > > > >> By profiling f2fs compression, I've found that vmap() calls are
> > > > > >> a bottleneck in the f2fs decompression path. Replacing them with
> > > > > >> vm_map_ram() improves f2fs decompression speed considerably.
> > > > >>
> > > > >> [Verification]
> > > > >> dd if=/dev/zero of=dummy bs=1m count=1000
> > > > >> echo 3 > /proc/sys/vm/drop_caches
> > > > >> dd if=dummy of=/dev/zero bs=512k
> > > > >>
> > > > >> - w/o compression -
> > > > >> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
> > > > >>
> > > > >> - before patch -
> > > > >> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
> > > > >>
> > > > >> - after patch -
> > > > >> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> > > > >> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s
> > > > >
> > > > > > Indeed, the vmap() approach has some impact on the whole
> > > > > > workflow. But I don't think the gap is that significant;
> > > > > > maybe it relates to unlocked cpufreq (and the big.LITTLE
> > > > > > core difference if it's on some arm64 board).
> > > >
> > > > Agreed,
> > > >
> > > > I guess there is some other reason for the large performance
> > > > gap: scheduling, frequency, or something else.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Linux-f2fs-devel mailing list
> > > > > Linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > > > > .
> > > > >
> >
>
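
The dd figures quoted in the patch message can be cross-checked with a few
lines of arithmetic. The sketch below is illustrative only: the timings are
copied from the quoted output, while the dictionary labels, the helper name
mean_throughput_mib_s, and the choice to average the three runs are
assumptions made for this example, not part of the patch.

```python
# Timings copied from the dd output quoted in the patch message.
BYTES_COPIED = 1_048_576_000  # 1000 blocks of 1 MiB, as reported by dd

# Labels are illustrative; the patch replaces vmap() with vm_map_ram().
elapsed_seconds = {
    "w/o compression": [1.999384, 2.035988, 2.039457],
    "before patch": [9.146217, 9.997542, 10.109727],
    "after patch": [2.253441, 2.739764, 2.185649],
}


def mean_throughput_mib_s(times):
    """Throughput in MiB/s based on the mean elapsed time of the runs."""
    mean_time = sum(times) / len(times)
    return BYTES_COPIED / mean_time / (1024 * 1024)


throughput = {
    name: mean_throughput_mib_s(times)
    for name, times in elapsed_seconds.items()
}

# Ratio of mean throughput after the patch to mean throughput before it.
speedup = throughput["after patch"] / throughput["before patch"]

for name, value in throughput.items():
    print(f"{name}: {value:.1f} MiB/s")
print(f"after/before speedup: {speedup:.2f}x")
```

On these numbers the mean read throughput after the patch is roughly four
times the pre-patch figure. The test file is all zeroes, so this is the best
case, which is why the thread asks for the cluster size and hardware to be
stated alongside it.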