Re: [PATCH 2/2] f2fs: support data compression

From: Jaegeuk Kim
Date: Wed Oct 23 2019 - 13:28:11 EST


On 10/22, Eric Biggers wrote:
> On Tue, Oct 22, 2019 at 10:16:02AM -0700, Jaegeuk Kim wrote:
> > From: Chao Yu <yuchao0@xxxxxxxxxx>
> >
> > This patch tries to support compression in f2fs.
> >
> > - New term named cluster is defined as basic unit of compression, file can
> > be divided into multiple clusters logically. One cluster includes 4 << n
> > (n >= 0) logical pages, compression size is also cluster size, each of
> > cluster can be compressed or not.
> >
> > - In cluster metadata layout, one special flag is used to indicate cluster
> > is compressed one or normal one, for compressed cluster, following metadata
> > maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
> > data including compress header and compressed data.
> >
> > - In order to eliminate write amplification during overwrite, F2FS only
> > support compression on write-once file, data can be compressed only when
> > all logical blocks in file are valid and cluster compress ratio is lower
> > than specified threshold.
> >
> > - To enable compression on regular inode, there are three ways:
> > * chattr +c file
> > * chattr +c dir; touch dir/file
> > * mount w/ -o compress_extension=ext; touch file.ext
> >
> > Compress metadata layout:
> > [Dnode Structure]
> > +-----------------------------------------------+
> > | cluster 1 | cluster 2 | ......... | cluster N |
> > +-----------------------------------------------+
> > . . . .
> > . . . .
> > . Compressed Cluster . . Normal Cluster .
> > +----------+---------+---------+---------+ +---------+---------+---------+---------+
> > |compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
> > +----------+---------+---------+---------+ +---------+---------+---------+---------+
> > . .
> > . .
> > . .
> > +-------------+-------------+----------+----------------------------+
> > | data length | data chksum | reserved | compressed data |
> > +-------------+-------------+----------+----------------------------+
> >
> > Changelog:
> >
> > 20190326:
> > - fix error handling of read_end_io().
> > - remove unneeded comments in f2fs_encrypt_one_page().
> >
> > 20190327:
> > - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
> > - don't jump into loop directly to avoid uninitialized variables.
> > - add TODO tag in error path of f2fs_write_cache_pages().
> >
> > 20190328:
> > - fix wrong merge condition in f2fs_read_multi_pages().
> > - check compressed file in f2fs_post_read_required().
> >
> > 20190401
> > - allow overwrite on non-compressed cluster.
> > - check cluster meta before writing compressed data.
> >
> > 20190402
> > - don't preallocate blocks for compressed file.
> >
> > - add lz4 compress algorithm
> > - process multiple post read works in one workqueue
> > Now f2fs supports processing post read work in multiple workqueue,
> > it shows low performance due to schedule overhead of multiple
> > workqueue executing orderly.
> >
> > - compress: support buffered overwrite
> > C: compress cluster flag
> > V: valid block address
> > N: NEW_ADDR
> >
> > One cluster contain 4 blocks
> >
> > before overwrite after overwrite
> >
> > - VVVV -> CVNN
> > - CVNN -> VVVV
> >
> > - CVNN -> CVNN
> > - CVNN -> CVVV
> >
> > - CVVV -> CVNN
> > - CVVV -> CVVV
> >
> > [Jaegeuk Kim]
> > - add tracepoint for f2fs_{,de}compress_pages()
> > - fix many bugs and add some compression stats
> >
> > Signed-off-by: Chao Yu <yuchao0@xxxxxxxxxx>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
>
> How was this tested? Shouldn't there a mount option analogous to
> test_dummy_encryption that causes all files to be auto-compressed, so that a
> full run of xfstests can be done with compression? I see "compress_extension",
> but apparently it's only for a file extension? Also, since reads can involve
> any combination of decryption, compression, and verity, it's important to test
> as many combinations as possible, including all at once. Has that been done?

This patch should be RFC which requires as many tests as possible. I posted it
quite early in order to get some reviews and feedback as well.

What I've done so far would look like:
- mkfs.f2fs -f -O encrypt -O quota -O compression -O extra_attr /dev/sdb1
- mount -t f2fs /dev/sdb1 /mnt/test
- mkdir /mnt/test/comp_dir
- f2fs_io setflags compression /mnt/test/comp_dir
- cd /mnt/test/comp_dir
- git clone kernel.git
- compile kernel
- or, fsstress on top of it

>
> I also tried running the fs-verity xfstests on this with
> 'kvm-xfstests -c f2fs -g verity', but the kernel immediately crashes:

I didn't check verity yet. I'll take a look at this soon.

>
> BUG: kernel NULL pointer dereference, address: 0000000000000182
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-rc1-00119-g60f351f4c50f #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014
> RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
> Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
> RSP: 0018:ffffc900000a8db0 EFLAGS: 00010046
> RAX: ffff88807d94e340 RBX: 0000000000000246 RCX: 0000000000000000
> RDX: ffff88807d9e0be8 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: ffffc900000a8df0 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff888075f2bc68 R11: 0000000000000000 R12: ffff88807d9e0be8
> R13: 0000000000000000 R14: 0000000000000030 R15: ffff88807c2c6780
> FS: 0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000182 CR3: 00000000757e3000 CR4: 00000000003406e0
> Call Trace:
> <IRQ>
> queue_work_on+0x67/0x70 kernel/workqueue.c:1518
> queue_work include/linux/workqueue.h:494 [inline]
> f2fs_enqueue_post_read_work fs/f2fs/data.c:166 [inline]
> bio_post_read_processing fs/f2fs/data.c:173 [inline]
> f2fs_read_end_io+0xcb/0xe0 fs/f2fs/data.c:195
> bio_endio+0xa4/0x1a0 block/bio.c:1818
> req_bio_endio block/blk-core.c:242 [inline]
> blk_update_request+0xf6/0x310 block/blk-core.c:1462
> blk_mq_end_request+0x1c/0x130 block/blk-mq.c:568
> virtblk_request_done+0x32/0x80 drivers/block/virtio_blk.c:226
> blk_done_softirq+0x98/0xc0 block/blk-softirq.c:37
> __do_softirq+0xc1/0x40d kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:373 [inline]
> irq_exit+0xb3/0xc0 kernel/softirq.c:413
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> do_IRQ+0x5b/0x110 arch/x86/kernel/irq.c:263
> common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:607
> </IRQ>
> RIP: 0010:native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
> RIP: 0010:arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
> RIP: 0010:default_idle+0x29/0x160 arch/x86/kernel/process.c:580
> Code: 90 55 48 89 e5 41 55 41 54 65 44 8b 25 70 64 76 7e 53 0f 1f 44 00 00 e8 95 13 88 ff e9 07 00 00 00 0f 00 2d 8b c0 b
> RSP: 0018:ffffc90000073e78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdc
> RAX: ffff88807d94e340 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffff88807d94e340
> RBP: ffffc90000073e90 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> R13: ffff88807d94e340 R14: 0000000000000000 R15: 0000000000000000
> arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
> default_idle_call+0x1e/0x30 kernel/sched/idle.c:94
> cpuidle_idle_call kernel/sched/idle.c:154 [inline]
> do_idle+0x1e4/0x210 kernel/sched/idle.c:263
> cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
> start_secondary+0x151/0x1a0 arch/x86/kernel/smpboot.c:264
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
> CR2: 0000000000000182
> ---[ end trace 86328090a3179142 ]---
> RIP: 0010:__queue_work+0x3e/0x5f0 kernel/workqueue.c:1409
> Code: d4 53 48 83 ec 18 89 7d d4 8b 3d c1 bf 2a 01 85 ff 74 17 65 48 8b 04 25 80 5d 01 00 8b b0 0c 07 00 00 85 f6 0f 84 1
> RSP: 0018:ffffc900000a8db0 EFLAGS: 00010046
> RAX: ffff88807d94e340 RBX: 0000000000000246 RCX: 0000000000000000
> RDX: ffff88807d9e0be8 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: ffffc900000a8df0 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff888075f2bc68 R11: 0000000000000000 R12: ffff88807d9e0be8
> R13: 0000000000000000 R14: 0000000000000030 R15: ffff88807c2c6780
> FS: 0000000000000000(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000182 CR3: 00000000757e3000 CR4: 00000000003406e0
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> Rebooting in 5 seconds..