[PATCH] kernfs: support kernfs notify in memory reclaim context
From: Junxiao Bi
Date: Tue Nov 14 2023 - 14:00:13 EST
kernfs notify is used in the md write path (md_write_start) to wake up a
userspace daemon such as "mdmon", which updates the md superblock of an
IMSM RAID array. The md write waits for that update to finish before it
is issued. If the write is part of memory reclaim, the system may hang
because the kernfs notify work cannot run: it is executed by "system_wq",
which has no rescuer thread, and a new kworker thread may not be created
under memory pressure. The userspace daemon is then never woken up and
the md write hangs.
According to Tejun, this can't be fixed by adding WQ_MEM_RECLAIM to
"system_wq", because that workqueue is shared and someone else might
occupy the rescuer thread. Fixing this on the md side would mean
replacing kernfs notify with another way to communicate with the
userspace daemon, which would break the userspace interface. So use a
separate workqueue for kernfs notify to allow it to be used in memory
reclaim context.
Link: https://lore.kernel.org/all/a131af22-0a5b-4be1-b77e-8716c63e8883@xxxxxxxxxx/T/
Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
---
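Note: the forward-progress argument in the commit message can be
illustrated with a toy userspace model. This is only a sketch and not
kernel code. struct toy_wq, toy_work_can_run() and the three toy_*_wq
instances are hypothetical names made up for illustration, and the model
assumes a rescuer is either free or permanently held by another user.

```c
#include <stdbool.h>

/*
 * Toy userspace model of workqueue forward progress. NOT kernel code:
 * struct toy_wq and toy_work_can_run() are hypothetical names.
 */
struct toy_wq {
	bool has_rescuer;	/* models a queue created with WQ_MEM_RECLAIM */
	bool rescuer_busy;	/* rescuer held by another user of a shared queue */
};

/* Returns true if a queued work item is able to make forward progress. */
static bool toy_work_can_run(const struct toy_wq *wq, bool memory_pressure)
{
	if (!memory_pressure)
		return true;	/* a regular kworker can be created */

	/* Under pressure, only a free rescuer can run the item. */
	return wq->has_rescuer && !wq->rescuer_busy;
}

/* "system_wq" as it is today: no rescuer at all. */
static const struct toy_wq toy_system_wq = {
	.has_rescuer = false, .rescuer_busy = false,
};

/* A shared queue with a rescuer that someone else currently occupies. */
static const struct toy_wq toy_shared_reclaim_wq = {
	.has_rescuer = true, .rescuer_busy = true,
};

/* A dedicated queue whose rescuer only serves kernfs notify work. */
static const struct toy_wq toy_kernfs_wq = {
	.has_rescuer = true, .rescuer_busy = false,
};
```

In this model only the dedicated queue can run its work item under
memory pressure, which is the case this patch relies on.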
fs/kernfs/file.c | 2 +-
fs/kernfs/kernfs-internal.h | 1 +
fs/kernfs/mount.c | 3 +++
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index f0cb729e9a97..726bfd40a912 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -974,7 +974,7 @@ void kernfs_notify(struct kernfs_node *kn)
kernfs_get(kn);
kn->attr.notify_next = kernfs_notify_list;
kernfs_notify_list = kn;
- schedule_work(&kernfs_notify_work);
+ queue_work(kernfs_wq, &kernfs_notify_work);
}
spin_unlock_irqrestore(&kernfs_notify_lock, flags);
}
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 237f2764b941..beae5d328342 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -123,6 +123,7 @@ static inline bool kernfs_dir_changed(struct kernfs_node *parent,
extern const struct super_operations kernfs_sops;
extern struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
+extern struct workqueue_struct *kernfs_wq;
/*
* inode.c
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 4628edde2e7e..7346ec49a621 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -24,6 +24,7 @@
struct kmem_cache *kernfs_node_cache __ro_after_init;
struct kmem_cache *kernfs_iattrs_cache __ro_after_init;
struct kernfs_global_locks *kernfs_locks __ro_after_init;
+struct workqueue_struct *kernfs_wq __ro_after_init;
static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
{
@@ -432,4 +433,6 @@ void __init kernfs_init(void)
0, SLAB_PANIC, NULL);
kernfs_lock_init();
+
+ kernfs_wq = alloc_workqueue("kernfs", WQ_MEM_RECLAIM, 0);
}
--
2.39.3 (Apple Git-145)