Re: [PATCH] f2fs: fix potential deadlock in f2fs_balance_fs()

From: Chao Yu

Date: Mon Apr 27 2026 - 03:47:43 EST

On 4/26/26 17:30, Ruipeng Qi wrote:

On 2026/4/20 15:35, Chao Yu wrote:

Hi Ruipeng,

Sorry, I missed your patch.

On 3/25/2026 9:37 PM, ruipengqi wrote:

From: Ruipeng Qi <ruipengqi3@xxxxxxxxx>

When the f2fs filesystem space is nearly exhausted, we encounter deadlock
issues as below:

INFO: task A:1890 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:A    state:D stack:0     pid:1890 tgid:1626 ppid:1153 flags:0x00000204
Call trace:
__switch_to+0xf4/0x158
__schedule+0x27c/0x908
schedule+0x3c/0x118
io_schedule+0x44/0x68
folio_wait_bit_common+0x174/0x370
folio_wait_bit+0x20/0x38
folio_wait_writeback+0x54/0xc8
truncate_inode_partial_folio+0x70/0x1e0
truncate_inode_pages_range+0x1b0/0x450
truncate_pagecache+0x54/0x88
f2fs_file_write_iter+0x3e8/0xb80
do_iter_readv_writev+0xf0/0x1e0
vfs_writev+0x138/0x2c8
do_writev+0x88/0x130
__arm64_sys_writev+0x28/0x40
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x30/0xf8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198

INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
__switch_to+0xf4/0x158
__schedule+0x27c/0x908
schedule+0x3c/0x118
io_schedule+0x44/0x68
folio_wait_bit_common+0x174/0x370
__filemap_get_folio+0x214/0x348
pagecache_get_page+0x20/0x70
f2fs_get_read_data_page+0x150/0x3e8
f2fs_get_lock_data_page+0x2c/0x160
move_data_page+0x50/0x478
do_garbage_collect+0xd38/0x1528
f2fs_gc+0x240/0x7e0
f2fs_balance_fs+0x1a0/0x208
f2fs_write_single_data_page+0x6e4/0x730 //0xfffffe0d6ca08300
f2fs_write_cache_pages+0x378/0x9b0
f2fs_write_data_pages+0x2e4/0x388
do_writepages+0x8c/0x2c8
__writeback_single_inode+0x4c/0x498
writeback_sb_inodes+0x234/0x4a8
__writeback_inodes_wb+0x58/0x118
wb_writeback+0x2f8/0x3c0
wb_workfn+0x2c4/0x508
process_one_work+0x180/0x408
worker_thread+0x258/0x368
kthread+0x118/0x128
ret_from_fork+0x10/0x20

INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
__switch_to+0xf4/0x158
__schedule+0x27c/0x908
rt_mutex_schedule+0x30/0x60
__rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
rwbase_write_lock+0x24c/0x378
down_write+0x1c/0x30
f2fs_balance_fs+0x184/0x208
f2fs_write_inode+0xf4/0x328
__writeback_single_inode+0x370/0x498
writeback_sb_inodes+0x234/0x4a8
__writeback_inodes_wb+0x58/0x118
wb_writeback+0x2f8/0x3c0
wb_workfn+0x2c4/0x508
process_one_work+0x180/0x408
worker_thread+0x258/0x368
kthread+0x118/0x128
ret_from_fork+0x10/0x20

INFO: task B:1902 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:B     state:D stack:0     pid:1902 tgid:1626 ppid:1153 flags:0x0000020c
Call trace:
__switch_to+0xf4/0x158
__schedule+0x27c/0x908
rt_mutex_schedule+0x30/0x60
__rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
rwbase_write_lock+0x24c/0x378
down_write+0x1c/0x30
f2fs_balance_fs+0x184/0x208
f2fs_map_blocks+0x94c/0x1110
f2fs_file_write_iter+0x228/0xb80
do_iter_readv_writev+0xf0/0x1e0
vfs_writev+0x138/0x2c8
do_writev+0x88/0x130
__arm64_sys_writev+0x28/0x40
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x30/0xf8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198

INFO: task sync:2769849 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync            state:D stack:0     pid:2769849 tgid:2769849 ppid:736    flags:0x0000020c
Call trace:
__switch_to+0xf4/0x158
__schedule+0x27c/0x908
schedule+0x3c/0x118
wb_wait_for_completion+0xb0/0xe8
sync_inodes_sb+0xc8/0x2b0
sync_inodes_one_sb+0x24/0x38
iterate_supers+0xa8/0x138
ksys_sync+0x54/0xc8
__arm64_sys_sync+0x18/0x30
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x30/0xf8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198

The root cause is a potential deadlock between the following tasks:

kworker/u8:11                Thread A
- f2fs_write_single_data_page
- f2fs_do_write_data_page
   - folio_start_writeback(X)
   - f2fs_outplace_write_data
    - bio_add_folio(X)
- folio_unlock(X)
                    - truncate_inode_pages_range
                     - __filemap_get_folio(X, FGP_LOCK)
                     - truncate_inode_partial_folio(X)
                      - folio_wait_writeback(X)
- f2fs_balance_fs
   - f2fs_gc
    - do_garbage_collect
     - move_data_page
      - f2fs_get_lock_data_page
       - __filemap_get_folio(X, FGP_LOCK)

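Both threads need folio X. Thread A holds the folio lock and waits for
writeback to finish, but X sits in a cached bio that has not been
submitted yet, so PG_writeback is never cleared. The kworker that cached
the bio is in turn blocked in GC waiting for the folio lock, so neither
thread can make progress.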

Other threads also enter D state, waiting for locks such as gc_lock and
writepages.

To avoid this potential deadlock, always call f2fs_submit_merged_write
before triggering f2fs_gc in f2fs_balance_fs.
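
The ordering argument above can be checked with a small userspace model.
This is only a sketch under simplifying assumptions: every toy_* name is
made up for illustration and is not a kernel API, and bio completion is
modelled as instantaneous once the cached bio is submitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for folio X; not the kernel's struct folio. */
struct toy_folio {
    bool locked;        /* folio lock held by "Thread A" */
    bool writeback;     /* models PG_writeback */
    bool in_cached_bio; /* queued in a merged bio that is not submitted */
};

/* Models flushing the cached bio: completion then clears writeback. */
static void toy_submit_merged_write(struct toy_folio *f)
{
    if (f->in_cached_bio) {
        f->in_cached_bio = false;
        f->writeback = false; /* stands in for the bio completion callback */
    }
}

/* Thread A: holds the lock and may only drop it once writeback is clear. */
static bool toy_truncate_step(struct toy_folio *f)
{
    assert(f->locked);
    if (f->writeback)
        return false; /* would sleep in the writeback wait */
    f->locked = false;
    return true;
}

/* GC in the kworker: needs the folio lock to move the data page. */
static bool toy_gc_step(const struct toy_folio *f)
{
    return !f->locked; /* false means it would sleep on the folio lock */
}

/* Returns true if both sides finish, false if they are stuck. */
bool toy_scenario(bool flush_before_gc)
{
    /* State after the kworker cached X and Thread A took the lock. */
    struct toy_folio x = {
        .locked = true, .writeback = true, .in_cached_bio = true,
    };

    if (flush_before_gc)
        toy_submit_merged_write(&x);

    bool a_done = toy_truncate_step(&x);
    bool gc_done = toy_gc_step(&x);
    return a_done && gc_done;
}
```

In this model toy_scenario(false) reports that both sides are stuck, and
toy_scenario(true) lets Thread A drop the lock so that GC can take it.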

Signed-off-by: Ruipeng Qi <ruipengqi3@xxxxxxxxx>
---
fs/f2fs/segment.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6a97fe76712b..b58299e49c23 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -454,6 +454,20 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
          io_schedule();
          finish_wait(&sbi->gc_thread->fggc_wq, &wait);
      } else {
+
+        /*
+         * Before triggering foreground GC, submit all cached DATA
+         * write bios. During writeback, pages may be added to
+         * write_io[DATA].bio with PG_writeback set but the bio not
+         * yet submitted. If GC's move_data_page() blocks on
+         * __folio_lock() for such a folio, and the lock holder waits
+         * for PG_writeback to clear via VFS folio_wait_writeback(),
+         * neither thread can make progress. Flushing here ensures
+         * the bio completion callback can clear PG_writeback.
+         */
+
+        f2fs_submit_merged_write(sbi, DATA);

Do we need to call f2fs_submit_merged_ipu_write(sbi, bio, NULL) to commit
cached IPU folios as well?

I'm not sure whether this race condition can also happen for node folios.

Thanks,

Hi, Chao

Thanks for your suggestion. After deeper analysis, this race condition
applies to IPU folios but not node folios. Node folios are unlikely to
have this flow.

Ruipeng,

I agree. I don't see any flow in which truncate_inode_pages_range(node_inode)
would race with writepage -> balance_fs.

I will send a corrected version shortly.
v2:
- Commit cached OPU and IPU folios, not just OPU folios as in v1.

BTW, do you think it is possible to add an optional ->wait_folio_writeback()
callback to address_space_operations? When it is provided,
truncate_inode_partial_folio() would call f2fs_wait_on_page_writeback()
instead of the generic folio_wait_writeback(), which would also fix this
race condition.

Yes, I think that would be better, as it could fix all potential
wait_writeback bugs. I guess we can give it a try.
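
To make the proposal concrete, here is a rough userspace sketch of the
dispatch being discussed. It is not kernel code: the model_* names and the
hook signature are assumptions made for illustration, and the fs-specific
wait only mimics the idea of submitting the cached bio before waiting, as
f2fs_wait_on_page_writeback() is understood to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy folio state; not the kernel's struct folio. */
struct model_folio {
    bool writeback;     /* models PG_writeback */
    bool in_cached_bio; /* bio holding the folio is not submitted yet */
};

/* Hypothetical ops table with the optional hook from the discussion. */
struct model_aops {
    bool (*wait_folio_writeback)(struct model_folio *f);
};

/* Generic wait: returns false if it would sleep forever in this model. */
bool model_generic_wait(struct model_folio *f)
{
    return !(f->writeback && f->in_cached_bio);
}

/* Filesystem wait: flush the cached bio first, then the wait can finish. */
bool model_fs_wait(struct model_folio *f)
{
    f->in_cached_bio = false; /* models submitting the merged bio */
    f->writeback = false;     /* models completion clearing writeback */
    return true;
}

/* Truncate path: use the hook when provided, else the generic wait. */
bool model_truncate_wait(const struct model_aops *ops, struct model_folio *f)
{
    assert(f != NULL);
    if (ops && ops->wait_folio_writeback)
        return ops->wait_folio_writeback(f);
    return model_generic_wait(f);
}
```

With no hook set, the model reports that the truncate path would be stuck
on a folio whose bio is still cached. With model_fs_wait installed, the
wait finishes and writeback is cleared.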

Thanks,

Thanks,

+
          struct f2fs_gc_control gc_control = {
              .victim_segno = NULL_SEGNO,
              .init_gc_type = f2fs_sb_has_blkzoned(sbi) ?