Re: [PATCH -next] ext4: fix bug_on in ext4_writepages

From: Jan Kara
Date: Wed May 11 2022 - 06:43:55 EST


On Tue 10-05-22 17:48:46, yebin wrote:
> On 2022/5/9 21:01, Jan Kara wrote:
> > On Thu 05-05-22 21:57:08, Ye Bin wrote:
> > > We got the following issue:
> > > EXT4-fs error (device loop0): ext4_mb_generate_buddy:1141: group 0, block bitmap and bg descriptor inconsistent: 25 vs 31513 free cls
> > > ------------[ cut here ]------------
> > > kernel BUG at fs/ext4/inode.c:2708!
> > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
> > > CPU: 2 PID: 2147 Comm: rep Not tainted 5.18.0-rc2-next-20220413+ #155
> > > RIP: 0010:ext4_writepages+0x1977/0x1c10
> > > RSP: 0018:ffff88811d3e7880 EFLAGS: 00010246
> > > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88811c098000
> > > RDX: 0000000000000000 RSI: ffff88811c098000 RDI: 0000000000000002
> > > RBP: ffff888128140f50 R08: ffffffffb1ff6387 R09: 0000000000000000
> > > R10: 0000000000000007 R11: ffffed10250281ea R12: 0000000000000001
> > > R13: 00000000000000a4 R14: ffff88811d3e7bb8 R15: ffff888128141028
> > > FS: 00007f443aed9740(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000020007200 CR3: 000000011c2a4000 CR4: 00000000000006e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > <TASK>
> > > do_writepages+0x130/0x3a0
> > > filemap_fdatawrite_wbc+0x83/0xa0
> > > filemap_flush+0xab/0xe0
> > > ext4_alloc_da_blocks+0x51/0x120
> > > __ext4_ioctl+0x1534/0x3210
> > > __x64_sys_ioctl+0x12c/0x170
> > > do_syscall_64+0x3b/0x90
> > >
> > > It may happen as follows:
> > > 1. write inline_data inode
> > > vfs_write
> > > new_sync_write
> > > ext4_file_write_iter
> > > ext4_buffered_write_iter
> > > generic_perform_write
> > > ext4_da_write_begin
> > > ext4_da_write_inline_data_begin -> if the inline area is too
> > > small to hold the write, a block is allocated, so the mapping
> > > now has a dirty page
> > > ext4_da_convert_inline_data_to_extent -> clears EXT4_STATE_MAY_INLINE_DATA
> > > 2. fallocate
> > > do_vfs_ioctl
> > > ioctl_preallocate
> > > vfs_fallocate
> > > ext4_fallocate
> > > ext4_convert_inline_data
> > > ext4_convert_inline_data_nolock
> > > ext4_map_blocks -> on failure, falls back to restoring the inline data
> > > ext4_restore_inline_data
> > > ext4_create_inline_data
> > > ext4_write_inline_data
> > > ext4_set_inode_state -> set inode EXT4_STATE_MAY_INLINE_DATA
> > > 3. writepages
> > > __ext4_ioctl
> > > ext4_alloc_da_blocks
> > > filemap_flush
> > > filemap_fdatawrite_wbc
> > > do_writepages
> > > ext4_writepages
> > > if (ext4_has_inline_data(inode))
> > > BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
> > >
> > > To solve this issue, record the original 'EXT4_STATE_MAY_INLINE_DATA'
> > > flag and pass it to 'ext4_restore_inline_data', which then decides
> > > whether to restore the 'EXT4_STATE_MAY_INLINE_DATA' flag based on that
> > > parameter.
> > >
> > > Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> > Thanks for the patch. I agree it will fix the crash you have spotted but
> > I'm somewhat wondering whether it would not be simpler to just move the
> > call to ext4_destroy_inline_data_nolock() in
> > ext4_convert_inline_data_nolock() later, after we have done writing
> > data_bh. That way we can completely remove ext4_restore_inline_data() and
> > as a consequence avoid problems. What do you think?
> >
> > Honza
>
> It may be a good idea, but I don't know how to handle the case where
> ext4_destroy_inline_data_nolock() fails. It may leave
> 'ei->i_reserved_data_blocks' incorrect and can also lead to data loss.
> I have another idea which I will include in the v2 patch.

Well, that call failing means something is seriously wrong with the
filesystem (IO errors, metadata corruption), so we don't care much what
happens then. Also, with the current approach you have the problem that the
restoration of inline data can itself fail, so I don't think there's really
a tangible difference.
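Roughly what I have in mind (an untested sketch, with journalling, locking
and buffer handling elided; only the reordering is the point):

```c
/* Untested sketch of ext4_convert_inline_data_nolock() with the
 * destroy call moved after the data write, so a mapping failure
 * leaves the inline data untouched and nothing needs restoring. */
static int ext4_convert_inline_data_nolock(handle_t *handle,
					   struct inode *inode,
					   struct ext4_iloc *iloc)
{
	int error;

	/* ... read the inline data, set up the map ... */
	error = ext4_map_blocks(handle, inode, &map, 0);
	if (error < 0)
		goto out;	/* inline data still intact, no restore path */

	/* ... copy the data into data_bh and mark it dirty ... */

	/*
	 * Only now tear down the inline data; if this fails the
	 * filesystem is seriously broken anyway.
	 */
	error = ext4_destroy_inline_data_nolock(handle, inode);
out:
	return error;
}
```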

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR