Re: [PATCH RFC] signalfd: add support for SFD_TASK

From: Jann Horn
Date: Fri Nov 29 2019 - 17:31:07 EST


On Thu, Nov 28, 2019 at 8:18 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> On Thu, Nov 28, 2019 at 11:07 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Thu, Nov 28, 2019 at 10:02 AM Rasmus Villemoes
> > <linux@xxxxxxxxxxxxxxxxxx> wrote:
> > > On 28/11/2019 00.27, Jann Horn wrote:
> > >
> > > > One more thing, though: We'll have to figure out some way to
> > > > invalidate the fd when the target goes through execve(), in particular
> > > > if it's a setuid execution. Otherwise we'll be able to just steal
> > > > signals that were intended for the other task, that's probably not
> > > > good.
> > > >
> > > > So we should:
> > > > a) prevent using ->wait() on an old signalfd once the task has gone
> > > > through execve()
> > > > b) kick off all existing waiters
> > > > c) most importantly, prevent ->read() on an old signalfd once the
> > > > task has gone through execve()
> > > >
> > > > We probably want to avoid using the cred_guard_mutex here, since it is
> > > > quite broad and has some deadlocking issues; it might make sense to
> > > > put the update of ->self_exec_id in fs/exec.c under something like the
> > > > siglock,
> > >
> > > What prevents one from exec'ing a trivial helper 2^32-1 times before
> > > exec'ing into the victim binary?
> >
> > Uh, yeah... that thing should probably become 64 bits wide, too.
>
> Actually, that'd still be wrong even with the existing kernel code for
> two reasons:
>
> - if you reparent to a subreaper, the existing exec_id comparison breaks
> - the new check here is going to break if a non-leader thread goes
> through execve(), because of the weird magic where the thread going
> through execve steals the thread id (PID) of the leader
>
> I'm gone for the day, but will try to dust off the years-old patch for
> this that I have lying around somewhere tomorrow. I should probably
> send it through akpm's tree with cc stable, given that this is already
> kinda broken in existing releases...

I'm taking that back, given that I was wrong when writing this mail.
But I've attached the old patch, in case you want to reuse it. That
cpu-plus-64-bits scheme was Andy Lutomirski's idea.
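
To illustrate why the (cpu, 64-bit counter) pair avoids the wraparound problem raised above, here is a rough userspace model, not kernel code. Everything named model_* or MODEL_* is made up for illustration, and the "CPU" is just an array index passed in by the caller; the real patch uses per-CPU data with preemption disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical; stands in for real per-CPU data */

/* Userspace stand-in for the patch's struct luid. */
struct luid_model {
	uint64_t count;
	unsigned int cpu;
};

/* One counter per simulated CPU; starts at 1 because 0 is reserved for init. */
static uint64_t model_counters[MODEL_NR_CPUS] = { 1, 1, 1, 1 };

/* Hand out the next ID from the counter that belongs to @cpu. */
static void model_alloc_luid(struct luid_model *out, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	out->count = model_counters[cpu]++;
	out->cpu = cpu;
}

/* Two IDs match only if both the counter value and the CPU match. */
static bool model_luid_eq(const struct luid_model *a,
			  const struct luid_model *b)
{
	return a->count == b->count && a->cpu == b->cpu;
}

/* A u32 ID that is bumped 2^32 times ends up back at its start value. */
static bool u32_id_repeats_after_wrap(uint32_t start)
{
	uint64_t bumps = UINT64_C(1) << 32;

	return (uint32_t)(start + bumps) == start;
}

/* The same number of bumps does not bring a u64 ID back to its start. */
static bool u64_id_repeats_after_same_bumps(uint64_t start)
{
	uint64_t bumps = UINT64_C(1) << 32;

	return start + bumps == start;
}
```

Two IDs allocated on different simulated CPUs can carry the same count and still compare unequal, which is why no counter has to be shared between CPUs.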

If you use that, you'd have to take the cred_guard_mutex for ->poll
and ->read, but I guess that's probably fine for signalfd.

From a6fcfcc15dacaeb4cc120df447a719da3b9e0c9d Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh@xxxxxxxxxx>
Date: Fri, 29 Nov 2019 23:10:33 +0100
Subject: [PATCH] exec: generate unique per-execution, per-process identifiers

This adds a member privunit ("privilege unit") to task_struct.
privunit is only shared by threads and changes on execve().
It can be used to check whether two tasks are temporally and spatially
equal for privilege checking purposes.

The implementation of locally unique IDs is in sched.h and exec.c for now
because those are the only users so far - if anything else wants to use
them in the future, they can be moved elsewhere.

Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
---
 fs/exec.c             | 17 +++++++++++++++++
 include/linux/sched.h | 14 ++++++++++++++
 kernel/fork.c         |  1 +
 3 files changed, 32 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index c27231234764..4cea8acb95e5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1331,6 +1331,22 @@ void would_dump(struct linux_binprm *bprm, struct file *file)
 }
 EXPORT_SYMBOL(would_dump);
 
+/* value 0 is reserved for init */
+static DEFINE_PER_CPU(u64, luid_counters) = 1;
+
+/*
+ * Allocates a new LUID and writes the allocated LUID to @out.
+ * This function must not be called from IRQ context.
+ */
+void alloc_luid(struct luid *out)
+{
+	preempt_disable();
+	out->count = raw_cpu_read(luid_counters);
+	raw_cpu_add(luid_counters, 1);
+	out->cpu = smp_processor_id();
+	preempt_enable();
+}
+
 void setup_new_exec(struct linux_binprm * bprm)
 {
 	/*
@@ -1384,6 +1400,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
 	current->self_exec_id++;
+	alloc_luid(&current->privunit);
 	flush_signal_handlers(current, 0);
 }
 EXPORT_SYMBOL(setup_new_exec);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 07e68d9f5dc4..6a3c16b3b43d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -626,6 +626,19 @@ struct wake_q_node {
 	struct wake_q_node *next;
 };
 
+/* locally unique ID */
+struct luid {
+	u64 count;
+	unsigned int cpu;
+};
+
+void alloc_luid(struct luid *out);
+
+static inline bool luid_eq(const struct luid *a, const struct luid *b)
+{
+	return a->count == b->count && a->cpu == b->cpu;
+}
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -941,6 +954,7 @@ struct task_struct {
 	/* Thread group tracking: */
 	u32 parent_exec_id;
 	u32 self_exec_id;
+	struct luid privunit;
 
 	/* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed, mempolicy: */
 	spinlock_t alloc_lock;
diff --git a/kernel/fork.c b/kernel/fork.c
index 00b64f41c2b4..75784ff9c9f2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2152,6 +2152,7 @@ static __latent_entropy struct task_struct *copy_process(
 			p->exit_signal = args->exit_signal;
 		p->group_leader = p;
 		p->tgid = p->pid;
+		alloc_luid(&p->privunit);
 	}
 
 	p->nr_dirtied = 0;
--
2.24.0.393.g34dc348eaf-goog
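
For the ->read/->poll check discussed in the mail, a signalfd that targets another task could remember the target's privunit when the fd is created and refuse to operate once it no longer matches. The following is only a userspace sketch of that idea under my own assumptions: the *_model types and functions are hypothetical, nothing like them exists in the patch, and a real implementation would have to do the comparison under the cred_guard_mutex as noted in the mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the patch's struct luid. */
struct privunit_model {
	uint64_t count;
	unsigned int cpu;
};

/* Hypothetical stand-in for the one task_struct field we care about. */
struct task_model {
	struct privunit_model privunit;
};

/* Hypothetical per-fd state for a signalfd bound to another task. */
struct sfd_ctx_model {
	const struct task_model *task;
	struct privunit_model privunit_at_open;
};

/* Two simulated CPUs, each with its own counter (0 is reserved for init). */
static uint64_t exec_counters_model[2] = { 1, 1 };

/* Models execve(): the task gets a fresh privunit from @cpu's counter. */
static void task_model_exec(struct task_model *t, unsigned int cpu)
{
	t->privunit.count = exec_counters_model[cpu & 1]++;
	t->privunit.cpu = cpu & 1;
}

/* Models fd creation: snapshot the target's current privunit. */
static void sfd_model_open(struct sfd_ctx_model *ctx,
			   const struct task_model *t)
{
	ctx->task = t;
	ctx->privunit_at_open = t->privunit;
}

/* Models the check before ->read/->poll: stale once the target has exec'd. */
static bool sfd_model_may_read(const struct sfd_ctx_model *ctx)
{
	const struct privunit_model *now = &ctx->task->privunit;

	return now->count == ctx->privunit_at_open.count &&
	       now->cpu == ctx->privunit_at_open.cpu;
}
```

In this model the fd works until the target calls task_model_exec() again; after that sfd_model_may_read() returns false, which corresponds to points a) and c) in the quoted list. Kicking off existing waiters (point b) is not modelled here.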