[PATCH v4 1/2] livepatch: send a fake signal to all blocking tasks

From: Miroslav Benes
Date: Wed Nov 15 2017 - 08:50:41 EST


Live patching consistency model is of LEAVE_PATCHED_SET and
SWITCH_THREAD. This means that all tasks in the system have to be marked
one by one as safe to call a new patched function. Safe means when a
task is not (sleeping) in a set of patched functions. That is, no
patched function is on the task's stack. Another clearly safe place is
the boundary between kernel and userspace. The patching waits for all
tasks to get outside of the patched set or to cross the boundary. The
transition is completed afterwards.

The problem is that a task can block the transition for quite a long
time, if not forever. It could sleep in a set of patched functions, for
example. Luckily we can force the task to leave the set by sending it a
fake signal, that is a signal with no data in signal pending structures
(no handler, no sign of proper signal delivered). Suspend/freezer use
this to freeze the tasks as well. The task gets TIF_SIGPENDING set and
is woken up (if it has been sleeping in the kernel before) or kicked by
rescheduling IPI (if it was running on other CPU). This causes the task
to go to kernel/userspace boundary where the signal would be handled and
the task would be marked as safe in terms of live patching.

There are tasks which are not affected by this technique though. The
fake signal is not sent to kthreads. They should be handled differently.
They can be woken up so they leave the patched set and their
TIF_PATCH_PENDING can be cleared thanks to stack checking.

For the sake of completeness, if the task is in TASK_RUNNING state but
not currently running on some CPU it doesn't get the IPI, but it would
eventually handle the signal anyway. Second, if the task runs in the
kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not
handled on return from the interrupt. It would be handled on return to
the userspace in the future when the fake signal is sent again. Stack
checking deals with these cases in a better way.

If the task was sleeping in a syscall it would be woken by our fake
signal, it would check if TIF_SIGPENDING is set (by calling
signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with
ERESTART* return values are restarted in case of the fake signal (see
do_signal()). EINTR is propagated back to the userspace program. This
could disturb the program, but...

* each process dealing with signals should react accordingly to EINTR
return values.
* syscalls returning EINTR happen to be quite common situation in the
system even if no fake signal is sent.
* freezer sends the fake signal and does not deal with EINTR anyhow.
Thus EINTR values are returned when the system is resumed.

The very safe marking is done in architectures' "entry" on syscall and
interrupt/exception exit paths, and in a stack checking functions of
livepatch. TIF_PATCH_PENDING is cleared and the next
recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also
call klp_update_patch_state() before do_signal(), so that
recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING
immediately and thus prevent a double call of do_signal().

Note that the fake signal is not sent to stopped/traced tasks. Such task
prevents the patching to finish till it continues again (is not traced
anymore).

Last, sending the fake signal is not automatic. It is done only when
admin requests it by writing 1 to signal sysfs attribute in livepatch
sysfs directory.

Signed-off-by: Miroslav Benes <mbenes@xxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
Cc: x86@xxxxxxxxxx
---
Documentation/ABI/testing/sysfs-kernel-livepatch | 12 +++++++
Documentation/livepatch/livepatch.txt | 11 +++++--
arch/powerpc/kernel/signal.c | 6 ++--
arch/x86/entry/common.c | 6 ++--
kernel/livepatch/core.c | 30 +++++++++++++++++
kernel/livepatch/transition.c | 41 ++++++++++++++++++++++++
kernel/livepatch/transition.h | 1 +
kernel/signal.c | 4 ++-
8 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-livepatch b/Documentation/ABI/testing/sysfs-kernel-livepatch
index d5d39748382f..3bb9d5bc1ce3 100644
--- a/Documentation/ABI/testing/sysfs-kernel-livepatch
+++ b/Documentation/ABI/testing/sysfs-kernel-livepatch
@@ -33,6 +33,18 @@ Contact: live-patching@xxxxxxxxxxxxxxx
An attribute which indicates whether the patch is currently in
transition.

+What: /sys/kernel/livepatch/<patch>/signal
+Date: Nov 2017
+KernelVersion: 4.15.0
+Contact: live-patching@xxxxxxxxxxxxxxx
+Description:
+ A writable attribute that allows administrator to affect the
+ course of an existing transition. Writing 1 sends a fake
+ signal to all remaining blocking tasks. The fake signal
+ means that no proper signal is delivered (there is no data in
+ signal pending structures). Tasks are interrupted or woken up,
+ and forced to change their patched state.
+
What: /sys/kernel/livepatch/<patch>/<object>
Date: Nov 2014
KernelVersion: 3.19.0
diff --git a/Documentation/livepatch/livepatch.txt b/Documentation/livepatch/livepatch.txt
index ecdb18104ab0..9bcdef277a36 100644
--- a/Documentation/livepatch/livepatch.txt
+++ b/Documentation/livepatch/livepatch.txt
@@ -176,8 +176,12 @@ If a patch is in transition, this file shows 0 to indicate the task is
unpatched and 1 to indicate it's patched. Otherwise, if no patch is in
transition, it shows -1. Any tasks which are blocking the transition
can be signaled with SIGSTOP and SIGCONT to force them to change their
-patched state.
-
+patched state. This may be harmful to the system though.
+/sys/kernel/livepatch/<patch>/signal attribute provides a better alternative.
+Writing 1 to the attribute sends a fake signal to all remaining blocking
+tasks. No proper signal is actually delivered (there is no data in signal
+pending structures). Tasks are interrupted or woken up, and forced to change
+their patched state.

3.1 Adding consistency model support to new architectures
---------------------------------------------------------
@@ -435,6 +439,9 @@ Information about the registered patches can be found under
/sys/kernel/livepatch. The patches could be enabled and disabled
by writing there.

+/sys/kernel/livepatch/<patch>/signal attribute allows administrator to affect a
+patching operation.
+
See Documentation/ABI/testing/sysfs-kernel-livepatch for more details.


diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e9436c5e1e09..bf9c4e7792d1 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -153,6 +153,9 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
if (thread_info_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);

+ if (thread_info_flags & _TIF_PATCH_PENDING)
+ klp_update_patch_state(current);
+
if (thread_info_flags & _TIF_SIGPENDING) {
BUG_ON(regs != current->thread.regs);
do_signal(current);
@@ -163,9 +166,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
tracehook_notify_resume(regs);
}

- if (thread_info_flags & _TIF_PATCH_PENDING)
- klp_update_patch_state(current);
-
user_enter();
}

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 03505ffbe1b6..fb90bdb3602f 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -153,6 +153,9 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
if (cached_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);

+ if (cached_flags & _TIF_PATCH_PENDING)
+ klp_update_patch_state(current);
+
/* deal with pending signal delivery */
if (cached_flags & _TIF_SIGPENDING)
do_signal(regs);
@@ -165,9 +168,6 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
if (cached_flags & _TIF_USER_RETURN_NOTIFY)
fire_user_return_notifiers();

- if (cached_flags & _TIF_PATCH_PENDING)
- klp_update_patch_state(current);
-
/* Disable IRQs and retry */
local_irq_disable();

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index bf8c8fd72589..e772be452462 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -440,6 +440,7 @@ EXPORT_SYMBOL_GPL(klp_enable_patch);
* /sys/kernel/livepatch/<patch>
* /sys/kernel/livepatch/<patch>/enabled
* /sys/kernel/livepatch/<patch>/transition
+ * /sys/kernel/livepatch/<patch>/signal
* /sys/kernel/livepatch/<patch>/<object>
* /sys/kernel/livepatch/<patch>/<object>/<function,sympos>
*/
@@ -514,11 +515,40 @@ static ssize_t transition_show(struct kobject *kobj,
patch == klp_transition_patch);
}

+static ssize_t signal_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct klp_patch *patch;
+ int ret;
+ bool val;
+
+ patch = container_of(kobj, struct klp_patch, kobj);
+
+ /*
+ * klp_mutex lock is not grabbed here intentionally. It is not really
+ * needed. The race window is harmless and grabbing the lock would only
+ * hold the action back.
+ */
+ if (patch != klp_transition_patch)
+ return -EINVAL;
+
+ ret = kstrtobool(buf, &val);
+ if (ret)
+ return ret;
+
+ if (val)
+ klp_send_signals();
+
+ return count;
+}
+
static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
static struct kobj_attribute transition_kobj_attr = __ATTR_RO(transition);
+static struct kobj_attribute signal_kobj_attr = __ATTR_WO(signal);
static struct attribute *klp_patch_attrs[] = {
&enabled_kobj_attr.attr,
&transition_kobj_attr.attr,
+ &signal_kobj_attr.attr,
NULL
};

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index b004a1fb6032..095aa8d864f5 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -577,3 +577,44 @@ void klp_copy_process(struct task_struct *child)

/* TIF_PATCH_PENDING gets copied in setup_thread_stack() */
}
+
+/*
+ * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
+ * Kthreads with TIF_PATCH_PENDING set are woken up. Only admin can request this
+ * action currently.
+ */
+void klp_send_signals(void)
+{
+ struct task_struct *g, *task;
+
+ pr_notice("signaling remaining tasks\n");
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, task) {
+ if (!klp_patch_pending(task))
+ continue;
+
+ /*
+ * There is a small race here. We could see TIF_PATCH_PENDING
+ * set and decide to wake up a kthread or send a fake signal.
+ * Meanwhile the task could migrate itself and the action
+ * would be meaningless. It is not serious though.
+ */
+ if (task->flags & PF_KTHREAD) {
+ /*
+ * Wake up a kthread which sleeps interruptedly and
+ * still has not been migrated.
+ */
+ wake_up_state(task, TASK_INTERRUPTIBLE);
+ } else {
+ /*
+ * Send fake signal to all non-kthread tasks which are
+ * still not migrated.
+ */
+ spin_lock_irq(&task->sighand->siglock);
+ signal_wake_up(task, 0);
+ spin_unlock_irq(&task->sighand->siglock);
+ }
+ }
+ read_unlock(&tasklist_lock);
+}
diff --git a/kernel/livepatch/transition.h b/kernel/livepatch/transition.h
index ce09b326546c..47c335f7c04c 100644
--- a/kernel/livepatch/transition.h
+++ b/kernel/livepatch/transition.h
@@ -10,5 +10,6 @@ void klp_cancel_transition(void);
void klp_start_transition(void);
void klp_try_complete_transition(void);
void klp_reverse_transition(void);
+void klp_send_signals(void);

#endif /* _LIVEPATCH_TRANSITION_H */
diff --git a/kernel/signal.c b/kernel/signal.c
index 800a18f77732..9009b8e77bc3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -40,6 +40,7 @@
#include <linux/cn_proc.h>
#include <linux/compiler.h>
#include <linux/posix-timers.h>
+#include <linux/livepatch.h>

#define CREATE_TRACE_POINTS
#include <trace/events/signal.h>
@@ -163,7 +164,8 @@ void recalc_sigpending_and_wake(struct task_struct *t)

void recalc_sigpending(void)
{
- if (!recalc_sigpending_tsk(current) && !freezing(current))
+ if (!recalc_sigpending_tsk(current) && !freezing(current) &&
+ !klp_patch_pending(current))
clear_thread_flag(TIF_SIGPENDING);

}
--
2.15.0