Re: [PATCH] exec: Check stack space more strictly
From: Kees Cook
Date: Mon Jul 17 2017 - 18:51:30 EST
On Mon, Jul 17, 2017 at 3:22 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> We can currently blow past the stack rlimit and cause odd behavior
> if there are accounting bugs, rounding issues, or races. It's not
> clear that the odd behavior is actually a problem, but it's nicer to
> fail the exec instead of getting out of sync with stack limits.
>
> Improve the code to more carefully check for space and to abort if
> our stack mm gets too large in setup_arg_pages().
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
This is a nice check to have in case binfmts do insane things to the
vma before setup_arg_pages(). However, we've still got
create_elf_tables() which is a fixed size _except_ that it adds all
the argv/envp pointers. Those are being accounted for in the
get_arg_page() calls from copy_strings(), though.
Does anything use stack between copy_strings() and create_elf_tables()?
> ---
> fs/exec.c | 44 ++++++++++++++++++++++++++++++++++----------
> 1 file changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 62175cbcc801..0c60c0495269 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -764,23 +764,47 @@ int setup_arg_pages(struct linux_binprm *bprm,
> /* mprotect_fixup is overkill to remove the temporary stack flags */
> vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
>
> - stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
> - stack_size = vma->vm_end - vma->vm_start;
> /*
> * Align this down to a page boundary as expand_stack
> * will align it up.
> */
> rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> + stack_size = vma->vm_end - vma->vm_start;
> +
> + if (stack_size > rlim_stack) {
> + /*
> + * If we've already used too much space (due to accounting
> + * bugs, alignment, races, or any other cause), bail.
> + */
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> +
> + /*
> + * stack_expand is the amount of space beyond the space already used
> + * that we're going to pre-allocate in our stack. For historical
> + * reasons, it's 128kB, unless we have less space than that available
> + * in our rlimit.
> + *
> + * This particular historical wart is wrong-headed, though, since
> + * we haven't finished binfmt-specific setup, and the binfmt code
> + * is going to eat up some or all of this space.
> + */
> + stack_expand = min(rlim_stack - stack_size, 131072UL);
Yeah, this needs to account for argv/envp pointers, IMO. But it's
still a magic number not tied to the actual results of
create_elf_tables() (which is effectively fixed in stack usage except
for argv/envp pointers).
I think create_elf_tables() needs checking added, or the magic number
should be tied to the size of the AUX strings (platform,
base_platform, rand_bytes), and elf_info words.
Hmm, does anything use the stack _after_ create_elf_tables()? Could we
move create_elf_tables() right after setup_arg_pages() and then
perform the check?
> +
> #ifdef CONFIG_STACK_GROWSUP
> - if (stack_size + stack_expand > rlim_stack)
> - stack_base = vma->vm_start + rlim_stack;
> - else
> - stack_base = vma->vm_end + stack_expand;
> + if (TASK_SIZE_MAX - vma->vm_end < stack_expand) {
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> + stack_base = vma->vm_end + stack_expand;
> #else
> - if (stack_size + stack_expand > rlim_stack)
> - stack_base = vma->vm_end - rlim_stack;
> - else
> - stack_base = vma->vm_start - stack_expand;
> + if (vma->vm_start < mmap_min_addr ||
> + vma->vm_start - mmap_min_addr < stack_expand) {
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> + stack_base = vma->vm_start - stack_expand;
> #endif
> current->mm->start_stack = bprm->p;
> ret = expand_stack(vma, stack_base);
> --
> 2.9.4
>
FWIW, here are my current notes on the whole existing flow (though
note my other series which is trying to move the secureexec callback
earlier):
do_execvat_common:
check RLIMIT_NPROC
unshare_files() (restored on failure)
allocate bprm
prepare_bprm_creds() (allocate bprm->cred)
mutex_lock_interruptible(¤t->signal->cred_guard_mutex)
prepare_exec_creds()
prepare_creds()
memcpy(new, old, ...)
security_prepare_creds() (unimplemented by commoncap)
check_unsafe_exec()
do_open_execat()
sched_exec()
if closed fd, set BINPRM_FLAGS_PATH_INACCESSIBLE
bprm_mm_init() (allocate bprm->mm)
prepare_binprm()
bprm_fill_uid() (fill bprm->cred)
resets euid/egid to current euid/egid
sets euid/egid on bprm based on set*id
security_bprm_set_creds() (handles all euid etc logic)
kernel_read(bprm->file, ...)
copy_strings() (argv)
get_arg_page() (could fail with E2BIG for args size)
copy_strings() (envp)
get_arg_page() (could fail with E2BIG for args size)
would_dump() (examine inode perms for MAY_READ)
maybe set BINPRM_FLAGS_ENFORCE_NONDUMP
exec_binprm()
search_binary_handler() (may go recursive for script and misc)
security_bprm_check()
for each format
load_binary() (e.g. load_elf_binary)
load_elf_phdrs()
kernel_read(bprm->file, ..., elf_interpreter)
would_dump() (interpreter)
maybe set BINPRM_FLAGS_ENFORCE_NONDUMP
PT_GNU_STACK
arch_check_elf()
flush_old_exec()
de_thread()
exec_mmap()
current->mm = bprm->mm
bprm->mm = NULL; (failures after here will SEGV current)
flush_thread()
current->personality
setup_new_exec()
arch_pick_mmap_layout() (uses current RLIMIT_STACK)
dumpable (using current uid/euid?!)
arch_setup_new_exec()
perf_event_exec()
bprm->cred uid/euid
pdeath_signal = 0
dumpable (based on BINPRM_FLAGS_ENFORCE_NONDUMP)
current->self_exec_id++
flush_signal_handlers()
install_exec_creds()
security_bprm_committing_creds()
commit_creds()
bprm->cred = NULL
if !dumpable
perf_event_exit_task()
security_bprm_committed_creds()
mutex_unlock(¤t->signal->cred_guard_mutex)
setup_arg_pages()
uses RLIMIT_STACK
load ELF image (ET_EXEC vs ET_DYN, load_bias, etc)
set_brk()
load_elf_interp() (if elf_interpreter)
create_elf_tables() (eats fixed stack except argv/envp ptrs)
arch_randomize_brk()
start_thread() (everything is up and running now)
if (retval < 0 && !bprm->mm)
SIGSEGV (can we send signals to my setuid parent?)
audit_bprm()
trace_sched_process_exec()
ptrace_event(PTRACE_EVENT_EXEC)
proc_exec_connector()
clean up
-Kees
--
Kees Cook
Pixel Security