Re: [PATCH] f2fs: fix potential deadlock in f2fs_balance_fs()

From: Chao Yu

Date: Mon Apr 27 2026 - 03:47:43 EST


On 4/26/26 17:30, Ruipeng Qi wrote:

On 2026/4/20 15:35, Chao Yu wrote:
Hi Ruipeng,

Sorry, I missed your patch.

On 3/25/2026 9:37 PM, ruipengqi wrote:
From: Ruipeng Qi <ruipengqi3@xxxxxxxxx>

When the f2fs filesystem space is nearly exhausted, we encounter deadlock
issues as below:

INFO: task A:1890 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:A    state:D stack:0     pid:1890  tgid:1626  ppid:1153 flags:0x00000204
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  schedule+0x3c/0x118
  io_schedule+0x44/0x68
  folio_wait_bit_common+0x174/0x370
  folio_wait_bit+0x20/0x38
  folio_wait_writeback+0x54/0xc8
  truncate_inode_partial_folio+0x70/0x1e0
  truncate_inode_pages_range+0x1b0/0x450
  truncate_pagecache+0x54/0x88
  f2fs_file_write_iter+0x3e8/0xb80
  do_iter_readv_writev+0xf0/0x1e0
  vfs_writev+0x138/0x2c8
  do_writev+0x88/0x130
  __arm64_sys_writev+0x28/0x40
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x30/0xf8
  el0t_64_sync_handler+0x120/0x130
  el0t_64_sync+0x190/0x198

INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  schedule+0x3c/0x118
  io_schedule+0x44/0x68
  folio_wait_bit_common+0x174/0x370
  __filemap_get_folio+0x214/0x348
  pagecache_get_page+0x20/0x70
  f2fs_get_read_data_page+0x150/0x3e8
  f2fs_get_lock_data_page+0x2c/0x160
  move_data_page+0x50/0x478
  do_garbage_collect+0xd38/0x1528
  f2fs_gc+0x240/0x7e0
  f2fs_balance_fs+0x1a0/0x208
  f2fs_write_single_data_page+0x6e4/0x730  //0xfffffe0d6ca08300
  f2fs_write_cache_pages+0x378/0x9b0
  f2fs_write_data_pages+0x2e4/0x388
  do_writepages+0x8c/0x2c8
  __writeback_single_inode+0x4c/0x498
  writeback_sb_inodes+0x234/0x4a8
  __writeback_inodes_wb+0x58/0x118
  wb_writeback+0x2f8/0x3c0
  wb_workfn+0x2c4/0x508
  process_one_work+0x180/0x408
  worker_thread+0x258/0x368
  kthread+0x118/0x128
  ret_from_fork+0x10/0x20

INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  rt_mutex_schedule+0x30/0x60
  __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
  rwbase_write_lock+0x24c/0x378
  down_write+0x1c/0x30
  f2fs_balance_fs+0x184/0x208
  f2fs_write_inode+0xf4/0x328
  __writeback_single_inode+0x370/0x498
  writeback_sb_inodes+0x234/0x4a8
  __writeback_inodes_wb+0x58/0x118
  wb_writeback+0x2f8/0x3c0
  wb_workfn+0x2c4/0x508
  process_one_work+0x180/0x408
  worker_thread+0x258/0x368
  kthread+0x118/0x128
  ret_from_fork+0x10/0x20

INFO: task B:1902 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:B     state:D stack:0     pid:1902  tgid:1626  ppid:1153 flags:0x0000020c
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  rt_mutex_schedule+0x30/0x60
  __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
  rwbase_write_lock+0x24c/0x378
  down_write+0x1c/0x30
  f2fs_balance_fs+0x184/0x208
  f2fs_map_blocks+0x94c/0x1110
  f2fs_file_write_iter+0x228/0xb80
  do_iter_readv_writev+0xf0/0x1e0
  vfs_writev+0x138/0x2c8
  do_writev+0x88/0x130
  __arm64_sys_writev+0x28/0x40
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x30/0xf8
  el0t_64_sync_handler+0x120/0x130
  el0t_64_sync+0x190/0x198

INFO: task sync:2769849 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync            state:D stack:0     pid:2769849 tgid:2769849 ppid:736    flags:0x0000020c
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  schedule+0x3c/0x118
  wb_wait_for_completion+0xb0/0xe8
  sync_inodes_sb+0xc8/0x2b0
  sync_inodes_one_sb+0x24/0x38
  iterate_supers+0xa8/0x138
  ksys_sync+0x54/0xc8
  __arm64_sys_sync+0x18/0x30
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x30/0xf8
  el0t_64_sync_handler+0x120/0x130
  el0t_64_sync+0x190/0x198

The root cause is a potential deadlock between the following tasks:

kworker/u8:11                Thread A
- f2fs_write_single_data_page
  - f2fs_do_write_data_page
   - folio_start_writeback(X)
   - f2fs_outplace_write_data
    - bio_add_folio(X)
  - folio_unlock(X)
                    - truncate_inode_pages_range
                     - __filemap_get_folio(X, FGP_LOCK)
                     - truncate_inode_partial_folio(X)
                      - folio_wait_writeback(X)
  - f2fs_balance_fs
   - f2fs_gc
    - do_garbage_collect
     - move_data_page
      - f2fs_get_lock_data_page
       - __filemap_get_folio(X, FGP_LOCK)

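Both threads try to access folio X. Thread A holds the lock on X and waits
for its writeback to finish, but the bio containing X is still cached in
write_io[DATA] and has not been submitted, so PG_writeback never clears.
Meanwhile the kworker, which would eventually submit that bio, is blocked
in GC waiting for the lock on X. Neither can make progress: a deadlock.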

Other tasks then also enter D state behind them, waiting on locks such as
gc_lock (down_write() in f2fs_balance_fs()) and writepages, or on writeback
completion (sync).

To avoid this potential deadlock, always call f2fs_submit_merged_write()
before triggering foreground GC via f2fs_gc() in f2fs_balance_fs().
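A toy user-space model of the circular wait described above, offered only
as a sketch: struct toy_folio and the toy_* helpers are hypothetical and
merely mimic PG_locked, PG_writeback and a not-yet-submitted
write_io[DATA].bio; none of them is f2fs code. Setting flush_first models
the proposed f2fs_submit_merged_write() call before GC.

```c
#include <stdbool.h>

/* Hypothetical toy state for folio X; not the real struct folio. */
struct toy_folio {
    bool locked;        /* PG_locked, held by truncating thread A */
    bool writeback;     /* PG_writeback, set by folio_start_writeback() */
    bool in_cached_bio; /* X sits in an unsubmitted write_io[DATA].bio */
};

/* Models submitting the cached bio: its end_io clears PG_writeback. */
static void toy_submit_merged_write(struct toy_folio *f)
{
    if (f->in_cached_bio) {
        f->in_cached_bio = false;
        f->writeback = false;
    }
}

/*
 * A waits for writeback; writeback cannot clear while the bio is cached;
 * the cached bio is only submitted by the kworker, which waits for A's lock.
 */
static bool toy_is_deadlocked(const struct toy_folio *f)
{
    return f->locked && f->writeback && f->in_cached_bio;
}

/* Kworker side of f2fs_balance_fs(); returns true if GC can make progress. */
static bool toy_balance_fs(struct toy_folio *f, bool flush_first)
{
    if (flush_first)
        toy_submit_merged_write(f); /* models the patch */
    return !toy_is_deadlocked(f);
}
```

Without the flush the model reports a deadlock; with it, writeback
completes, thread A can finish truncation and release the lock, and GC
can then take it.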

Signed-off-by: Ruipeng Qi <ruipengqi3@xxxxxxxxx>
---
  fs/f2fs/segment.c | 14 ++++++++++++++
  1 file changed, 14 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6a97fe76712b..b58299e49c23 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -454,6 +454,20 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
          io_schedule();
          finish_wait(&sbi->gc_thread->fggc_wq, &wait);
      } else {
+
+        /*
+         * Before triggering foreground GC, submit all cached DATA
+         * write bios. During writeback, pages may be added to
+         * write_io[DATA].bio with PG_writeback set but the bio not
+         * yet submitted. If GC's move_data_page() blocks on
+         * __folio_lock() for such a folio, and the lock holder waits
+         * for PG_writeback to clear via VFS folio_wait_writeback(),
+         * neither thread can make progress. Flushing here ensures
+         * the bio completion callback can clear PG_writeback.
+         */
+
+        f2fs_submit_merged_write(sbi, DATA);

Do we need to call f2fs_submit_merged_ipu_write(sbi, bio, NULL) to commit
cached IPU folios as well?

Not sure whether this race condition can also happen for node folios.

Thanks,

Hi, Chao

Thanks for your suggestion. After deeper analysis, this race condition
applies to IPU folios but not to node folios, since node folios are
unlikely to go through this flow.

Ruipeng,

I agree; I don't see any flow where truncate_inode_pages_range(node_inode)
would race with writepage -> balance_fs.


I will send a corrected version shortly.
v2:
- Commit cached OPU and IPU folios, not just OPU folios as in v1.
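To illustrate why v2 also commits cached IPU folios, a similarly
hypothetical toy: the bio holding a folio under writeback may be cached
either on the OPU path (write_io[DATA].bio) or on the IPU bio list, and
flushing only the OPU cache, as v1 did, leaves the IPU case stuck.
enum toy_cache and the toy_* functions are illustrative assumptions, not
f2fs APIs.

```c
#include <stdbool.h>

/* Hypothetical: where the folio's unsubmitted bio is cached, if anywhere. */
enum toy_cache { TOY_NONE, TOY_OPU, TOY_IPU };

struct toy_folio {
    bool writeback;           /* PG_writeback */
    enum toy_cache cached_in; /* which cache holds the unsubmitted bio */
};

/* Models submitting one cache; bio completion clears PG_writeback. */
static void toy_flush(struct toy_folio *f, enum toy_cache which)
{
    if (f->cached_in == which) {
        f->cached_in = TOY_NONE;
        f->writeback = false;
    }
}

/* v1 flushes only the OPU cache; v2 (flush_ipu = true) also flushes IPU. */
static bool toy_gc_can_progress(struct toy_folio *f, bool flush_ipu)
{
    toy_flush(f, TOY_OPU);
    if (flush_ipu)
        toy_flush(f, TOY_IPU);
    /* Still stuck if the folio is under writeback in an unsubmitted bio. */
    return !(f->writeback && f->cached_in != TOY_NONE);
}
```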

BTW, do you think it would be possible to add an optional
->wait_folio_writeback() callback to address_space_operations? When it is
provided, truncate_inode_partial_folio() would call it (f2fs would implement
it with f2fs_wait_on_page_writeback()) instead of the generic
folio_wait_writeback(), which would also fix this race condition.

Yes, I think that would be better, as it could fix all potential
wait_writeback bugs of this kind. I guess we can give it a try.
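A sketch of the proposed optional hook, in the same toy style: struct
toy_aops stands in for address_space_operations, and its
wait_folio_writeback member is hypothetical, since no such callback exists
today. The generic fallback can only wait, whereas an f2fs-style
implementation first submits any cached bio holding the folio (as
f2fs_wait_on_page_writeback() does) and only then waits.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_folio;

/* Hypothetical stand-in for address_space_operations with the new hook. */
struct toy_aops {
    void (*wait_folio_writeback)(struct toy_folio *folio); /* optional */
};

struct toy_folio {
    const struct toy_aops *a_ops;
    bool writeback;     /* PG_writeback */
    bool in_cached_bio; /* folio sits in an unsubmitted fs-private bio */
    int generic_waits;  /* counts calls into the generic path */
};

/* Generic path: it cannot flush fs-private bios. In this toy it only
 * records the call; in the real race it would block forever. */
static void toy_generic_wait(struct toy_folio *folio)
{
    folio->generic_waits++;
}

/* f2fs-style hook: submit the cached bio first; completion then clears
 * writeback, so the subsequent wait returns. */
static void toy_f2fs_wait(struct toy_folio *folio)
{
    if (folio->in_cached_bio) {
        folio->in_cached_bio = false;
        folio->writeback = false;
    }
}

/* Models the call site in truncate_inode_partial_folio(). */
static void toy_wait_writeback(struct toy_folio *folio)
{
    if (folio->a_ops && folio->a_ops->wait_folio_writeback)
        folio->a_ops->wait_folio_writeback(folio);
    else
        toy_generic_wait(folio);
}
```

Keeping the hook optional means filesystems that do not cache bios keep
the existing folio_wait_writeback() behavior unchanged.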

Thanks,


Thanks,

+
          struct f2fs_gc_control gc_control = {
              .victim_segno = NULL_SEGNO,
              .init_gc_type = f2fs_sb_has_blkzoned(sbi) ?