Hello,
On Tue, Nov 14, 2023 at 12:09:19PM -0800, junxiao.bi@xxxxxxxxxx wrote:
On 11/14/23 11:06 AM, Tejun Heo wrote:I see. That sounds very fragile and I'm not quite sure that can ever be made
On Tue, Nov 14, 2023 at 10:59:47AM -0800, Junxiao Bi wrote:For imsm raid, userspace daemon "mdmon" is responsible for updating raid
kernfs notify is used in write path of md (md_write_start) to wake upHow is forward progress guarnateed for that userspace daemon? This sounds
userspace daemon, like "mdmon" for updating md superblock of imsm raid,
md write will wait for that update done before issuing the write, if this
like a really fragile setup.
metadata, kernel will use kernfs_notify to wake up the daemon anywhere
metadata update is required. If the daemon can't move forward, write may
hung, but that will be a bug in the daemon?
to work reliably. While memlocking everything needed and being really
judicious about which syscalls to make will probably get you pretty far,
there are things like task_work which gets scheduled and executed when a
task is about to return to userspace. Those things are allowed to grab e.g.
mutexes in the kernel and allocate memory and so on, and can deadlock under
memory pressure.
Even just looking at kernfs_notify, it down_reads()
root->kernfs_supers_rwsem which might already be write-locked by somebody
who's waiting on memory allocation.
The patch you're proposing removes one link in the dependency chain but
there are many on there. I'm not sure this is fixable. Nobody writes kernel
code thinking that userspace code can be on the memory reclaim dependency
chain.
Thanks.