Re: fs: uninterruptible hang in handle_userfault

From: Linus Torvalds
Date: Tue Mar 01 2016 - 14:56:30 EST


On Tue, Mar 1, 2016 at 3:29 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> The following program creates an unkillable process in D state:

It seems to be userfaultfd that *tries* to handle signals, but there's
one special fault case where signals won't make it through: when we're
exiting and doing the final child pid clearing access.

We could do this two ways:

(a) special-case the PF_EXITING case for userfaultfd, something like

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 50311703135b..66cdb44616d5 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -287,6 +287,12 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
 		goto out;
 
 	/*
+	 * We don't do userfault handling for the final child pid update.
+	 */
+	if (current->flags & PF_EXITING)
+		goto out;
+
+	/*
 	 * Check that we can return VM_FAULT_RETRY.
 	 *
 	 * NOTE: it should become possible to return VM_FAULT_RETRY

or (b) always consider the exiting case to be "fatal signal pending"

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a94cc3..5adf9f001df3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2924,7 +2924,7 @@ static inline int __fatal_signal_pending(struct task_struct *p)
 
 static inline int fatal_signal_pending(struct task_struct *p)
 {
-	return signal_pending(p) && __fatal_signal_pending(p);
+	return (p->flags & PF_EXITING) || (signal_pending(p) && __fatal_signal_pending(p));
 }
 
 static inline int signal_pending_state(long state, struct task_struct *p)

either of which feels a bit hacky to me.

That general "consider the final exit always as if we have a fatal
signal pending" feels like a more generic fix, but it makes me think
that it will fail on NFS-backed mmap's too. That could be seen as a
good thing (avoiding hangs when the NFS server dies), but it also
means that the patch clearly changes *other* semantics too, not just
the userfaultfd case.

So (a) is more targeted, and might be safer.
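
To make the difference in scope concrete, here is a minimal user-space
sketch. It is a toy model, not kernel code: struct task_model,
MODEL_PF_EXITING and all the function names are hypothetical stand-ins,
and "would block" only approximates a wait that a fatal signal can
break. It assumes the hang is what I described above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_EXITING flag. */
#define MODEL_PF_EXITING 0x4u

/* Toy model of the task state that matters here (not struct task_struct). */
struct task_model {
	unsigned int flags;        /* MODEL_PF_EXITING set once exit has begun */
	bool fatal_signal_queued;  /* models signal_pending() && __fatal_signal_pending() */
};

/* Current behaviour: only a queued fatal signal counts. */
static bool fatal_pending_current(const struct task_model *p)
{
	return p->fatal_signal_queued;
}

/* Option (b): an exiting task always counts as "fatal signal pending". */
static bool fatal_pending_option_b(const struct task_model *p)
{
	return (p->flags & MODEL_PF_EXITING) || p->fatal_signal_queued;
}

/* Option (a): only the userfault path gives up early for an exiting task. */
static bool userfault_would_block_option_a(const struct task_model *p)
{
	if (p->flags & MODEL_PF_EXITING)
		return false;
	return !fatal_pending_current(p);
}

/* Any other killable wait (say, a fault on an NFS-backed mapping). */
static bool killable_wait_would_block(const struct task_model *p,
				      bool (*fatal_pending)(const struct task_model *))
{
	return !fatal_pending(p);
}
```

In this model an exiting task with no queued signal stops blocking in
the userfault path under either option, but only (b) also changes the
answer for every other killable wait.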

Does anybody have any other suggestions?

(The above patches are entirely untested; I may have misread the reason
it's hanging, and something else could be going on.)

Linus