[PATCH] exec: ensure file system accounting in check_unsafe_exec is correct

From: Colin King
Date: Wed May 10 2017 - 07:05:34 EST


From: Colin Ian King <colin.king@xxxxxxxxxxxxx>

There are two very race windows that need to be taken into consideration
when check_unsafe_exec performs the file system accounting comparing the
number of fs->users against the number threads that share the same fs.

The first race can occur when a pthread creates a new pthread and the
the fs->users count is incremented before the new pthread is associated
with the pthread performing the exec. When this occurs, the pthread
performing the exec has flags with bit PF_FORKNOEXEC set.

The second race can occur when a pthread is terminating and the fs->users
count has been decremented by the pthread is still associated with the
pthread that is performing the exec. When this occurs, the pthread
peforming the exec has flags with bit PF_EXITING set.

This fix keeps track of any pthreads that may be in the race window
(when PF_FORKNOEXEC or PF_EXITING) are set and if the fs count does
not match the expected count we retry the count as we may have hit
this small race windows. Tests on an 8 thread server with the
reproducer (see below) show that this retry occurs rarely, so the
overhead of the retry is very small.

Below is a reproducer of the race condition.

The bug manifests itself because the current check_unsafe_exec
hits this race and indicates it is not a safe exec, and the
exec'd suid program fails to setuid.

$ cat Makefile
ALL=a b
all: $(ALL)

a: LDFLAGS=-pthread

b: b.c
$(CC) b.c -o b
sudo chown root:root b
sudo chmod u+s b

test:
for I in $$(seq 1000); do echo $I; ./a ; done

clean:
rm -vf $(ALL)

$ cat a.c

void *nothing(void *p)
{
return NULL;
}

void *target(void *p) {
for (;;) {
pthread_t t;
if (pthread_create(&t, NULL, nothing, NULL) == 0)
pthread_join(t, NULL);
}
return NULL;
}

int main(void)
{
struct timespec tv;
int i;

for (i = 0; i < 10; i++) {
pthread_t t;
pthread_create(&t, NULL, target, NULL);
}
tv.tv_sec = 0;
tv.tv_nsec = 100000;
nanosleep(&tv, NULL);
if (execl("./b", "./b", NULL) < 0)
perror("execl");
return 0;
}

$ cat b.c

int main(void)
{
const uid_t euid = geteuid();
if (euid != 0) {
printf("Failed, got euid %d (expecting 0)\n", euid);
return 1;
}
return 0;
}

$ make
make
cc -pthread a.c -o a
cc b.c -o b
sudo chown root:root b
sudo chmod u+s b
$ for i in $(seq 1000); do ./a; done

Without the fix, one will see 'Failed, got euid xxxx (expecting 0)'
messages where xxxx is one's uid.

BugLink: http://bugs.launchpad.net/bugs/1672819
Signed-off-by: Colin Ian King <colin.king@xxxxxxxxxxxxx>
---
fs/exec.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 72934df..5117dc4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1431,6 +1431,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
{
struct task_struct *p = current, *t;
unsigned n_fs;
+ bool fs_recheck;

if (p->ptrace)
bprm->unsafe |= LSM_UNSAFE_PTRACE;
@@ -1442,6 +1443,8 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
if (task_no_new_privs(current))
bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;

+recheck:
+ fs_recheck = false;
t = p;
n_fs = 1;
spin_lock(&p->fs->lock);
@@ -1449,12 +1452,18 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
while_each_thread(p, t) {
if (t->fs == p->fs)
n_fs++;
+ if (t->flags & (PF_EXITING | PF_FORKNOEXEC))
+ fs_recheck = true;
}
rcu_read_unlock();

- if (p->fs->users > n_fs)
+ if (p->fs->users > n_fs) {
+ if (fs_recheck) {
+ spin_unlock(&p->fs->lock);
+ goto recheck;
+ }
bprm->unsafe |= LSM_UNSAFE_SHARE;
- else
+ } else
p->fs->in_exec = 1;
spin_unlock(&p->fs->lock);
}
--
2.7.4