Re: [PATCH v4 05/11] mm: do not split a folio if it has minimum folio order requirement

From: Pankaj Raghav (Samsung)
Date: Fri Apr 26 2024 - 11:49:42 EST


On Thu, Apr 25, 2024 at 09:10:16PM +0100, Matthew Wilcox wrote:
> On Thu, Apr 25, 2024 at 01:37:40PM +0200, Pankaj Raghav (Samsung) wrote:
> > From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> >
> > Splitting a larger folio down to a base order is supported using the
> > split_huge_page_to_list_to_order() API. However, using that API for
> > LBS results in a NULL ptr dereference in the writeback path [1].
> >
> > Refuse to split a folio if it has a minimum folio order requirement
> > until we can start using the split_huge_page_to_list_to_order() API.
> > Splitting the folio can be added as a later optimization.
> >
> > [1] https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
>
> Obviously this has to be tracked down and fixed before this patchset can
> be merged ... I think I have some ideas. Let me look a bit. How
> would I go about reproducing this?

I am able to reproduce it in a VM with 4G RAM by running generic/447
(sometimes it has to be run twice) on a 16K block size (BS) filesystem
on a 4K page size (PS) system.

I have a suspicion about this series: https://lore.kernel.org/linux-fsdevel/20240215063649.2164017-1-hch@xxxxxx/
but I am still unsure why this happens when we split with LBS
configurations.

If you have kdevops installed, go with Luis's suggestion; otherwise,
this is my local config.

This is the diff I applied instead of this patch:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..63ee7b6ed03d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3041,6 +3041,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
{
struct folio *folio = page_folio(page);
struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+ unsigned int mapping_min_order = mapping_min_folio_order(folio->mapping);
+
+ if (!folio_test_anon(folio))
+ new_order = max_t(unsigned int, mapping_min_order, new_order);
/* reset xarray order to new order after split */
XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
struct anon_vma *anon_vma = NULL;
@@ -3117,6 +3121,8 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
goto out;
}

+ // XXX: Remove it later
+ VM_WARN_ON_FOLIO((new_order < mapping_min_order), folio);
gfp = current_gfp_context(mapping_gfp_mask(mapping) &
GFP_RECLAIM_MASK);

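For contrast, the refusal approach described in the patch commit message
(as opposed to the clamping diff above) would look roughly like the
sketch below. This is only an illustration: can_split_mapping_folio() is
a made-up name for this sketch, while folio_test_anon() and
mapping_min_folio_order() are the helpers already used in the diff above.

/*
 * Sketch only, not from the patch: refuse to split a file-backed folio
 * below the minimum order required by its mapping.
 */
static bool can_split_mapping_folio(struct folio *folio, unsigned int new_order)
{
	struct address_space *mapping = folio->mapping;

	/* Anonymous folios have no mapping-imposed minimum order. */
	if (folio_test_anon(folio))
		return true;

	/* Refuse any split that would drop below the mapping's min order. */
	return new_order >= mapping_min_folio_order(mapping);
}

A caller like split_huge_page_to_list_to_order() could then bail out
early when this returns false, instead of clamping new_order the way the
diff above does.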

xfstests is based on https://github.com/kdave/xfstests/tree/v2024.04.14

xfstests config:

[default]
FSTYP=xfs
RESULT_BASE=/root/results/
DUMP_CORRUPT_FS=1
CANON_DEVS=yes
RECREATE_TEST_DEV=true
TEST_DEV=/dev/nvme0n1
TEST_DIR=/media/test
SCRATCH_DEV=/dev/vdb
SCRATCH_MNT=/media/scratch
LOGWRITES_DEV=/dev/vdc

[16k_4ks]
MKFS_OPTIONS='-f -m reflink=1,rmapbt=1, -i sparse=1, -b size=16k, -s size=4k'
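
For context, the 16k_4ks section is what exercises the minimum folio
order path on this 4K page size machine: a 16K block size implies a
minimum page-cache folio order of 2. A quick back-of-the-envelope in
plain C (nothing below is from the series, just arithmetic):

#include <stdio.h>

int main(void)
{
	unsigned int page_size = 4096;		/* 4K PS system */
	unsigned int block_size = 16384;	/* -b size=16k */
	unsigned int min_order = 0;

	/* Smallest folio order whose size covers one filesystem block. */
	while ((page_size << min_order) < block_size)
		min_order++;

	printf("minimum folio order = %u (%u pages per folio)\n",
	       min_order, 1u << min_order);	/* prints 2 and 4 */
	return 0;
}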

[nix-shell:~]# lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vdb     254:16   0  32G  0 disk /media/scratch
vdc     254:32   0  32G  0 disk
nvme0n1 259:0    0  32G  0 disk /media/test

$ ./check -s 16k_4ks generic/447

Backtrace:
[ 74.170698] BUG: KASAN: null-ptr-deref in filemap_get_folios_tag+0x14b/0x510
[ 74.170938] Write of size 4 at addr 0000000000000036 by task kworker/u16:6/284
[ 74.170938]
[ 74.170938] CPU: 0 PID: 284 Comm: kworker/u16:6 Not tainted 6.9.0-rc4-00011-g4676d00b6f6f #7
[ 74.170938] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 74.170938] Workqueue: writeback wb_workfn (flush-254:16)
[ 74.170938] Call Trace:
[ 74.170938] <TASK>
[ 74.170938] dump_stack_lvl+0x51/0x70
[ 74.170938] kasan_report+0xab/0xe0
[ 74.170938] ? filemap_get_folios_tag+0x14b/0x510
[ 74.170938] kasan_check_range+0x35/0x1b0
[ 74.170938] filemap_get_folios_tag+0x14b/0x510
[ 74.170938] ? __pfx_filemap_get_folios_tag+0x10/0x10
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] writeback_iter+0x508/0xcc0
[ 74.170938] ? __pfx_iomap_do_writepage+0x10/0x10
[ 74.170938] write_cache_pages+0x80/0x100
[ 74.170938] ? __pfx_write_cache_pages+0x10/0x10
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? _raw_spin_lock+0x87/0xe0
[ 74.170938] iomap_writepages+0x85/0xe0
[ 74.170938] xfs_vm_writepages+0xe3/0x140 [xfs]
[ 74.170938] ? __pfx_xfs_vm_writepages+0x10/0x10 [xfs]
[ 74.170938] ? kasan_save_track+0x10/0x30
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? __kasan_kmalloc+0x7b/0x90
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? virtqueue_add_split+0x605/0x1b00
[ 74.170938] do_writepages+0x176/0x740
[ 74.170938] ? __pfx_do_writepages+0x10/0x10
[ 74.170938] ? __pfx_virtqueue_add_split+0x10/0x10
[ 74.170938] ? __pfx_update_sd_lb_stats.constprop.0+0x10/0x10
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? virtqueue_add_sgs+0xfe/0x130
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? virtblk_add_req+0x15c/0x280
[ 74.170938] __writeback_single_inode+0x9f/0x840
[ 74.170938] ? wbc_attach_and_unlock_inode+0x345/0x5d0
[ 74.170938] writeback_sb_inodes+0x491/0xce0
[ 74.170938] ? __pfx_wb_calc_thresh+0x10/0x10
[ 74.170938] ? __pfx_writeback_sb_inodes+0x10/0x10
[ 74.170938] ? __wb_calc_thresh+0x1a0/0x3c0
[ 74.170938] ? __pfx_down_read_trylock+0x10/0x10
[ 74.170938] ? wb_over_bg_thresh+0x16b/0x5e0
[ 74.170938] ? __pfx_move_expired_inodes+0x10/0x10
[ 74.170938] __writeback_inodes_wb+0xb7/0x200
[ 74.170938] wb_writeback+0x2c4/0x660
[ 74.170938] ? __pfx_wb_writeback+0x10/0x10
[ 74.170938] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 74.170938] wb_workfn+0x54e/0xaf0
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? __pfx_wb_workfn+0x10/0x10
[ 74.170938] ? __pfx___schedule+0x10/0x10
[ 74.170938] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 74.170938] process_one_work+0x622/0x1020
[ 74.170938] worker_thread+0x844/0x10e0
[ 74.170938] ? srso_return_thunk+0x5/0x5f
[ 74.170938] ? __kthread_parkme+0x82/0x150
[ 74.170938] ? __pfx_worker_thread+0x10/0x10
[ 74.170938] kthread+0x2b4/0x380
[ 74.170938] ? __pfx_kthread+0x10/0x10
[ 74.170938] ret_from_fork+0x30/0x70
[ 74.170938] ? __pfx_kthread+0x10/0x10
[ 74.170938] ret_from_fork_asm+0x1a/0x30
[ 74.170938] </TASK>