Re: [PATCH v2] f2fs: fix potential deadlock in f2fs_balance_fs()

From: Ruipeng Qi

Date: Sat May 02 2026 - 08:46:08 EST



On 2026/4/29 15:59, Chao Yu wrote:
On 4/29/26 11:39, Ruipeng Qi wrote:

On 2026/4/27 16:38, Chao Yu wrote:
On 4/26/26 17:32, ruipengqi wrote:
From: Ruipeng Qi <ruipengqi3@xxxxxxxxx>

When the f2fs filesystem is nearly out of space, we encounter the
following deadlock:

INFO: task A:1890 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:A    state:D stack:0     pid:1890  tgid:1626  ppid:1153 flags:0x00000204
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  schedule+0x3c/0x118
  io_schedule+0x44/0x68
  folio_wait_bit_common+0x174/0x370
  folio_wait_bit+0x20/0x38
  folio_wait_writeback+0x54/0xc8
  truncate_inode_partial_folio+0x70/0x1e0
  truncate_inode_pages_range+0x1b0/0x450
  truncate_pagecache+0x54/0x88
  f2fs_file_write_iter+0x3e8/0xb80
  do_iter_readv_writev+0xf0/0x1e0
  vfs_writev+0x138/0x2c8
  do_writev+0x88/0x130
  __arm64_sys_writev+0x28/0x40
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x30/0xf8
  el0t_64_sync_handler+0x120/0x130
  el0t_64_sync+0x190/0x198

INFO: task kworker/u8:11:2680853 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:11   state:D stack:0     pid:2680853 tgid:2680853 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  schedule+0x3c/0x118
  io_schedule+0x44/0x68
  folio_wait_bit_common+0x174/0x370
  __filemap_get_folio+0x214/0x348
  pagecache_get_page+0x20/0x70
  f2fs_get_read_data_page+0x150/0x3e8
  f2fs_get_lock_data_page+0x2c/0x160
  move_data_page+0x50/0x478
  do_garbage_collect+0xd38/0x1528
  f2fs_gc+0x240/0x7e0
  f2fs_balance_fs+0x1a0/0x208
  f2fs_write_single_data_page+0x6e4/0x730 //0xfffffe0d6ca08300
  f2fs_write_cache_pages+0x378/0x9b0
  f2fs_write_data_pages+0x2e4/0x388
  do_writepages+0x8c/0x2c8
  __writeback_single_inode+0x4c/0x498
  writeback_sb_inodes+0x234/0x4a8
  __writeback_inodes_wb+0x58/0x118
  wb_writeback+0x2f8/0x3c0
  wb_workfn+0x2c4/0x508
  process_one_work+0x180/0x408
  worker_thread+0x258/0x368
  kthread+0x118/0x128
  ret_from_fork+0x10/0x200

INFO: task kworker/u8:8:2641297 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:8    state:D stack:0     pid:2641297 tgid:2641297 ppid:2      flags:0x00000208
Workqueue: writeback wb_workfn (flush-254:0)
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  rt_mutex_schedule+0x30/0x60
  __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
  rwbase_write_lock+0x24c/0x378
  down_write+0x1c/0x30
  f2fs_balance_fs+0x184/0x208
  f2fs_write_inode+0xf4/0x328
  __writeback_single_inode+0x370/0x498
  writeback_sb_inodes+0x234/0x4a8
  __writeback_inodes_wb+0x58/0x118
  wb_writeback+0x2f8/0x3c0
  wb_workfn+0x2c4/0x508
  process_one_work+0x180/0x408
  worker_thread+0x258/0x368
  kthread+0x118/0x128
  ret_from_fork+0x10/0x20

INFO: task B:1902 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:B     state:D stack:0     pid:1902  tgid:1626 ppid:1153 flags:0x0000020c
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  rt_mutex_schedule+0x30/0x60
  __rt_mutex_slowlock_locked.constprop.0+0x460/0x8a8
  rwbase_write_lock+0x24c/0x378
  down_write+0x1c/0x30
  f2fs_balance_fs+0x184/0x208
  f2fs_map_blocks+0x94c/0x1110
  f2fs_file_write_iter+0x228/0xb80
  do_iter_readv_writev+0xf0/0x1e0
  vfs_writev+0x138/0x2c8
  do_writev+0x88/0x130
  __arm64_sys_writev+0x28/0x40
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x30/0xf8
  el0t_64_sync_handler+0x120/0x130
  el0t_64_sync+0x190/0x198

INFO: task sync:2769849 blocked for more than 120 seconds.
       Tainted: G           O       6.12.41-g3fe07ddf05ab #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync            state:D stack:0     pid:2769849 tgid:2769849 ppid:736    flags:0x0000020c
Call trace:
  __switch_to+0xf4/0x158
  __schedule+0x27c/0x908
  schedule+0x3c/0x118
  wb_wait_for_completion+0xb0/0xe8
  sync_inodes_sb+0xc8/0x2b0
  sync_inodes_one_sb+0x24/0x38
  iterate_supers+0xa8/0x138
  ksys_sync+0x54/0xc8
  __arm64_sys_sync+0x18/0x30
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x30/0xf8
  el0t_64_sync_handler+0x120/0x130
  el0t_64_sync+0x190/0x198

The root cause is a potential deadlock between the following tasks:

kworker/u8:11                Thread A
- f2fs_write_single_data_page
  - f2fs_do_write_data_page
   - folio_start_writeback(X)
   - f2fs_outplace_write_data
    - bio_add_folio(X)
  - folio_unlock(X)
                    - truncate_inode_pages_range
                     - __filemap_get_folio(X, FGP_LOCK)
                     - truncate_inode_partial_folio(X)
                      - folio_wait_writeback(X)
  - f2fs_balance_fs
   - f2fs_gc
    - do_garbage_collect
     - move_data_page
      - f2fs_get_lock_data_page
       - __filemap_get_folio(X, FGP_LOCK)

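Both threads access folio X. Thread A holds the lock on X and waits for
its writeback to complete. That writeback cannot complete, because the bio
containing X is still cached by the kworker and has not been submitted.
Meanwhile, the kworker enters foreground GC and waits for the lock on X,
so neither task can make progress.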

As a result, other tasks also enter D state while waiting for locks such
as gc_lock and writepages.

Both OPU and IPU DATA folios are affected by this issue. To avoid such
potential deadlocks, always submit these cached folios before
triggering f2fs_gc() in f2fs_balance_fs().

v2:
- Submit cached OPU and IPU folios, not just OPU folios as in v1.

Suggested-by: Chao <chao@xxxxxxxxxx>
Signed-off-by: Ruipeng Qi <ruipengqi3@xxxxxxxxx>
---
  fs/f2fs/data.c    | 26 ++++++++++++++++++++++++++
  fs/f2fs/f2fs.h    |  1 +
  fs/f2fs/segment.c |  9 +++++++++
  3 files changed, 36 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 338df7a2aea6..fd03366b3228 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -939,6 +939,32 @@ void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
      }
  }
  +void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi)
+{
+    struct bio_entry *be, *tmp;
+    struct f2fs_bio_info *io;
+    enum temp_type temp;
+    LIST_HEAD(list);
+
+    for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
+        io = sbi->write_io[DATA] + temp;
+
+        if (list_empty(&io->bio_list))
+            continue;

Needs to be covered w/ bio_list_lock to avoid race condition.

Hi, Chao

The lockless list_empty() here is intentional and acceptable.


If list_empty() returns true but the list becomes non-empty
afterwards (due to a race), the newly added bio will be submitted
by the subsequent write path, so no bio will be lost.

Ah, right, we only need to submit the folios cached by the local thread.



Similar patterns exist in the kernel, e.g.:
   net/rfkill/core.c: rfkill_fop_read()
     /* since we re-check and it just compares pointers,
      * using !list_empty() without locking isn't a problem
      */
   fs/f2fs/data.c: f2fs_submit_merged_ipu_write()
     list_empty() is also used without holding bio_list_lock
     as a lockless pre-check


If you'd prefer, we can add a comment to make the intent clear:

     /* list_empty() without lock is safe here - READ_ONCE()
      * ensures pointer read atomicity. A false negative is
      * acceptable since any bio added concurrently will be
      * submitted by the next write path.
      */
     if (list_empty(&io->bio_list))
         continue;

+
+        f2fs_down_write(&io->bio_list_lock);
+        list_splice_init(&io->bio_list, &list);
+        f2fs_up_write(&io->bio_list_lock);
+
+        list_for_each_entry_safe(be, tmp, &list, list) {
+            f2fs_submit_write_bio(sbi, be->bio, DATA);
+            del_bio_entry(be);
+        }
+

Unnecessary blank line.

Thanks,

Thanks for the correction. Will fix in v3.
     v3:
     - Fix minor grammatical issues
     - Add a comment on the lockless list_empty() to explain why it is
       safe without holding bio_list_lock

Seems fine.



Thanks,


+    }
+
+}
+
  int f2fs_merge_page_bio(struct f2fs_io_info *fio)
  {
      struct bio *bio = *fio->bio;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index bb34e864d0ef..e9038ab1b2bd 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4148,6 +4148,7 @@ void f2fs_submit_merged_write_folio(struct f2fs_sb_info *sbi,
                  struct folio *folio, enum page_type type);
  void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
                      struct bio **bio, struct folio *folio);
+void f2fs_submit_all_merged_ipu_writes(struct f2fs_sb_info *sbi);
  void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
  int f2fs_submit_page_bio(struct f2fs_io_info *fio);
  int f2fs_merge_page_bio(struct f2fs_io_info *fio);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6a97fe76712b..856ffe91b94f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -454,6 +454,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
          io_schedule();
          finish_wait(&sbi->gc_thread->fggc_wq, &wait);
      } else {
+
+        /*
+         * Submit all cached OPU/IPU DATA bios before triggering
+         * foreground GC to avoid potential deadlocks.
+         */
+
+        f2fs_submit_merged_write(sbi, DATA);
+        f2fs_submit_all_merged_ipu_writes(sbi);

Can we relocate above code to below the variable definitions?

Thanks,

Hi, Chao

Sure, will fix it in v3.

BTW, to avoid potential deadlocks, this patch submits the cached OPU/IPU
folios before triggering f2fs_gc() in f2fs_balance_fs(), which changes the
existing IPU/OPU bio lifecycle.

For OPU, io->io_rwsem provides the necessary synchronization.
For IPU, io->bio_list_lock ensures race-free submission.
In both cases, new BIOs will be allocated as needed after submission.

I may have missed something in the current implementation.
Your professional review would be much appreciated.

Thanks,

+
          struct f2fs_gc_control gc_control = {
              .victim_segno = NULL_SEGNO,
              .init_gc_type = f2fs_sb_has_blkzoned(sbi) ?