Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov
Date: Fri May 04 2018 - 21:37:25 EST
On Fri, May 04, 2018 at 07:56:43PM +0000, Luis R. Rodriguez wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.
sure.
> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> > Introduce helper:
> > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > struct umh_info {
> > struct file *pipe_to_umh;
> > struct file *pipe_from_umh;
> > pid_t pid;
> > };
> >
> > that GPLed kernel modules (signed or unsigned) can use it to execute part
> > of its own data as swappable user mode process.
> >
> > The kernel will do:
> > - mount "tmpfs"
>
> Actually its a *shared* vfsmount tmpfs for all umh blobs.
yep
> > - allocate a unique file in tmpfs
> > - populate that file with [data, data + len] bytes
> > - user-mode-helper code will do_execve that file and, before the process
> > starts, the kernel will create two unix pipes for bidirectional
> > communication between kernel module and umh
> > - close tmpfs file, effectively deleting it
> > - the fork_usermode_blob will return zero on success and populate
> > 'struct umh_info' with two unix pipes and the pid of the user process
>
> But since its using UMH_WAIT_EXEC, all we can guarantee currently is the
> inception point was intended, well though out, and will run, but the return
> value in no way reflects the success or not of the execution. More below.
yep
> > As the first step in the development of the bpfilter project
> > the fork_usermode_blob() helper is introduced to allow user mode code
> > to be invoked from a kernel module. The idea is that user mode code plus
> > normal kernel module code are built as part of the kernel build
> > and installed as traditional kernel module into distro specified location,
> > such that from a distribution point of view, there is
> > no difference between regular kernel modules and kernel modules + umh code.
> > Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> > by a kernel module doesn't make it any special from kernel and user space
> > tooling point of view.
> >
> > Such approach enables kernel to delegate functionality traditionally done
> > by the kernel modules into the user space processes (either root or !root) and
> > reduces security attack surface of the new code. The buggy umh code would crash
> > the user process, but not the kernel. Another advantage is that umh code
> > of the kernel module can be debugged and tested out of user space
> > (e.g. opening the possibility to run clang sanitizers, fuzzers or
> > user space test suites on the umh code).
> > In case of the bpfilter project such architecture allows complex control plane
> > to be done in the user space while bpf based data plane stays in the kernel.
> >
> > Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> > the kernel module that uses them (like bpfilter) needs to manage life
> > time of umh on its own via two unix pipes and the pid of umh.
> >
> > The exit code of such kernel module should kill the umh it started,
> > so that rmmod of the kernel module will cleanup the corresponding umh.
> > Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
> > fs/exec.c | 38 ++++++++---
> > include/linux/binfmts.h | 1 +
> > include/linux/umh.h | 12 ++++
> > kernel/umh.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++-
> > 4 files changed, 215 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 183059c427b9..30a36c2a39bf 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
> > /*
> > * sys_execve() executes a new program.
> > */
> > -static int do_execveat_common(int fd, struct filename *filename,
> > - struct user_arg_ptr argv,
> > - struct user_arg_ptr envp,
> > - int flags)
> > +static int __do_execve_file(int fd, struct filename *filename,
> > + struct user_arg_ptr argv,
> > + struct user_arg_ptr envp,
> > + int flags, struct file *file)
> > {
> > char *pathbuf = NULL;
> > struct linux_binprm *bprm;
> > - struct file *file;
> > struct files_struct *displaced;
> > int retval;
>
> Keeping in mind a fuzzer...
>
> Note, right below this, and not shown here in the hunk, is:
>
> if (IS_ERR(filename))
> return PTR_ERR(filename)
> >
> > @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> > check_unsafe_exec(bprm);
> > current->in_execve = 1;
> >
> > - file = do_open_execat(fd, filename, flags);
> > + if (!file)
> > + file = do_open_execat(fd, filename, flags);
>
>
> Here we now seem to allow !file and open the file with the passed fd as in
> the old days. This is an expected change.
>
> > retval = PTR_ERR(file);
> > if (IS_ERR(file))
> > goto out_unmark;
> > @@ -1760,7 +1760,9 @@ static int do_execveat_common(int fd, struct filename *filename,
> > sched_exec();
> >
> > bprm->file = file;
> > - if (fd == AT_FDCWD || filename->name[0] == '/') {
> > + if (!filename) {
>
> If anything shouldn't this be:
>
> if (IS_ERR(filename))
nope. it should be !filename as do_execve_file() passes NULL.
IS_ERR != IS_ERR_OR_NULL
> But, wouldn't the above first branch in the routine catch this?
>
> > + bprm->filename = "none";
>
> Given this seems like a desirable branch which was tested, wonder how this
> ever got set if the above branch in the first hunk I noted hit true?
>
> In any case, we seem to have two cases, can we rule out the exact requirements
> at the top so we can bail out with an error code if one or the other way to
> call this function does not align with expectations?
I think you're misreading the code or I don't understand the concern at all.
> > + } else if (fd == AT_FDCWD || filename->name[0] == '/') {
> > bprm->filename = filename->name;
> > } else {
> > if (filename->name[0] == '\0')
> > @@ -1826,7 +1828,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> > task_numa_free(current);
> > free_bprm(bprm);
> > kfree(pathbuf);
> > - putname(filename);
> > + if (filename)
> > + putname(filename);
> > if (displaced)
> > put_files_struct(displaced);
> > return retval;
> > @@ -1849,10 +1852,27 @@ static int do_execveat_common(int fd, struct filename *filename,
> > if (displaced)
> > reset_files_struct(displaced);
> > out_ret:
> > - putname(filename);
> > + if (filename)
> > + putname(filename);
> > return retval;
> > }
> >
> > +static int do_execveat_common(int fd, struct filename *filename,
>
> Further signs the filename is now optional. But I don't understand how these
> branches ever be true, but perhaps I'm missing something?
>
> > + struct user_arg_ptr argv,
> > + struct user_arg_ptr envp,
> > + int flags)
> > +{
> > + return __do_execve_file(fd, filename, argv, envp, flags, NULL);
> > +}
> > +
> > +int do_execve_file(struct file *file, void *__argv, void *__envp)
> > +{
> > + struct user_arg_ptr argv = { .ptr.native = __argv };
> > + struct user_arg_ptr envp = { .ptr.native = __envp };
> > +
> > + return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
> > +}
>
> Or maybe do the semantics expectations checks here, so we don't clutter
> do_execveat_common() with them?
specifically ?
> > +
> > int do_execve(struct filename *filename,
> > const char __user *const __user *__argv,
> > const char __user *const __user *__envp)
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 4955e0863b83..c05f24fac4f6 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -150,5 +150,6 @@ extern int do_execveat(int, struct filename *,
> > const char __user * const __user *,
> > const char __user * const __user *,
> > int);
> > +int do_execve_file(struct file *file, void *__argv, void *__envp);
> >
> > #endif /* _LINUX_BINFMTS_H */
> > diff --git a/include/linux/umh.h b/include/linux/umh.h
> > index 244aff638220..5c812acbb80a 100644
> > --- a/include/linux/umh.h
> > +++ b/include/linux/umh.h
> > @@ -22,8 +22,10 @@ struct subprocess_info {
> > const char *path;
> > char **argv;
> > char **envp;
> > + struct file *file;
> > int wait;
> > int retval;
> > + pid_t pid;
> > int (*init)(struct subprocess_info *info, struct cred *new);
> > void (*cleanup)(struct subprocess_info *info);
> > void *data;
>
> While at it, can you kdocify struct subprocess_info and add new docs for at
> least these two entires you are adding ?
Sorry 'while at it' doesn't sound as a good reason to
add kdoc now instead of later.
> > @@ -38,6 +40,16 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
> > int (*init)(struct subprocess_info *info, struct cred *new),
> > void (*cleanup)(struct subprocess_info *), void *data);
> >
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > + int (*init)(struct subprocess_info *info, struct cred *new),
> > + void (*cleanup)(struct subprocess_info *), void *data);
>
> Likewise but on the umc.c file.
>
> > +struct umh_info {
> > + struct file *pipe_to_umh;
> > + struct file *pipe_from_umh;
> > + pid_t pid;
> > +};
>
> Likewise.
what 'likewise' ? The kdoc ?
>
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
>
> Likewise but on the umc.c files.
>
> > +
> > extern int
> > call_usermodehelper_exec(struct subprocess_info *info, int wait);
> >
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index f76b3ff876cf..c3f418d7d51a 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -25,6 +25,8 @@
> > #include <linux/ptrace.h>
> > #include <linux/async.h>
> > #include <linux/uaccess.h>
> > +#include <linux/shmem_fs.h>
> > +#include <linux/pipe_fs_i.h>
> >
> > #include <trace/events/module.h>
> >
> > @@ -97,9 +99,13 @@ static int call_usermodehelper_exec_async(void *data)
> >
> > commit_creds(new);
> >
> > - retval = do_execve(getname_kernel(sub_info->path),
> > - (const char __user *const __user *)sub_info->argv,
> > - (const char __user *const __user *)sub_info->envp);
> > + if (sub_info->file)
> > + retval = do_execve_file(sub_info->file,
> > + sub_info->argv, sub_info->envp);
> > + else
> > + retval = do_execve(getname_kernel(sub_info->path),
> > + (const char __user *const __user *)sub_info->argv,
> > + (const char __user *const __user *)sub_info->envp);
> > out:
> > sub_info->retval = retval;
> > /*
> > @@ -185,6 +191,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
> > if (pid < 0) {
> > sub_info->retval = pid;
> > umh_complete(sub_info);
> > + } else {
> > + sub_info->pid = pid;
> > }
> > }
> > }
> > @@ -393,6 +401,168 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
> > }
> > EXPORT_SYMBOL(call_usermodehelper_setup);
> >
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > + int (*init)(struct subprocess_info *info, struct cred *new),
> > + void (*cleanup)(struct subprocess_info *info), void *data)
>
> Should be static, no other users outside of this file.
good catch. will change to static.
> Please use umh_setup_file().
sorry. makes no sense.
There is call_usermodehelper_setup() right above it.
call_usermodehelper_setup_file() just follows the naming convention.
If you prefer shorter names, both have to be renamed in the separate patch series.
> > +{
> > + struct subprocess_info *sub_info;
>
> Considering a possible fuzzer triggering random data we should probably
> return NULL early and avoid the kzalloc if:
I missing 'fuzzer' point here and earlier.
'fuzzer' cannot reach here. It's all internal api.
> if (!file || !init || !cleanup)
> return NULL;
sorry, nope. in kernel we don't do defensive programming like this.
> Is data optional? The kdoc could clarify this.
No. Should be obvious from this patch.
The only caller of call_usermodehelper_setup_file() is fork_usermode_blob()
and it passes 'struct umh_info *info'.
>
> > +
> > + sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
> > + if (!sub_info)
> > + return NULL;
> > +
> > + INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
> > + sub_info->path = "none";
> > + sub_info->file = file;
> > + sub_info->init = init;
> > + sub_info->cleanup = cleanup;
> > + sub_info->data = data;
> > + return sub_info;
> > +}
> > +
> > +static struct vfsmount *umh_fs;
> > +
> > +static int init_tmpfs(void)
>
> Please use umh_init_tmpfs().
ok
> Also see init/main.c do_basic_setup() which calls
> usermodehelper_enable() prior to do_initcalls(). Now, fortunately TMPFS is only
> bool, saving us from some races and we do call tmpfs's init first shmem_init():
>
> static void __init do_basic_setup(void)
> {
> cpuset_init_smp();
> shmem_init();
> driver_init();
> init_irq_proc();
> do_ctors();
> usermodehelper_enable();
> do_initcalls();
> }
>
> But it also means we're enabling your new call call fork_usermode_blob() on
> early init code even if we're not setup. Since this umh tmpfs vfsmount is
> shared I'd say just call this init right before usermodehelper_enable()
> on do_basic_setup().
Not following.
Why init_tmpfs() should be called by __init function?
Are you saying make 'static struct vfsmount *shm_mnt;'
global and use it here? so no init_tmpfs() necessary?
I think that can work, but feels that having two
tmpfs mounts (one for shmem and one for umh) is cleaner.
>
> > +{
> > + struct file_system_type *type;
> > +
> > + if (umh_fs)
> > + return 0;
> > + type = get_fs_type("tmpfs");
> > + if (!type)
> > + return -ENODEV;
> > + umh_fs = kern_mount(type);
> > + if (IS_ERR(umh_fs)) {
> > + int err = PTR_ERR(umh_fs);
> > +
> > + put_filesystem(type);
> > + umh_fs = NULL;
> > + return err;
> > + }
> > + return 0;
> > +}
> > +
> > +static int alloc_tmpfs_file(size_t size, struct file **filp)
>
> Please use umh_alloc_tmpfs_file()
ok
> > +{
> > + struct file *file;
> > + int err;
> > +
> > + err = init_tmpfs();
> > + if (err)
> > + return err;
> > + file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> > + if (IS_ERR(file))
> > + return PTR_ERR(file);
> > + *filp = file;
> > + return 0;
> > +}
> > +
> > +static int populate_file(struct file *file, const void *data, size_t size)
>
> Please use umh_populate_file()
ok
> > +{
> > + size_t offset = 0;
> > + int err;
> > +
> > + do {
> > + unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > + struct page *page;
> > + void *pgdata, *vaddr;
> > +
> > + err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > + 0, &page, &pgdata);
> > + if (err < 0)
> > + goto fail;
> > +
> > + vaddr = kmap(page);
> > + memcpy(vaddr, data, len);
> > + kunmap(page);
> > +
> > + err = pagecache_write_end(file, file->f_mapping, offset, len,
> > + len, page, pgdata);
> > + if (err < 0)
> > + goto fail;
> > +
> > + size -= len;
> > + data += len;
> > + offset += len;
> > + } while (size);
>
> Character for character, this looks like a wonderful copy and paste from
> i915_gem_object_create_from_data()'s own loop which does the same exact
> thing. Perhaps its time for a helper on mm/filemap.c with an export so
> if a bug is fixed in one place its fixed in both places.
yes, of course, but not right now.
Once it all lands that will be the time to create common helper.
It's not a good idea to mess with i915 in one patch set.
> > + return 0;
> > +fail:
> > + return err;
> > +}
> > +
> > +static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
>
> The function name umh_pipe_setup() is also used on fs/coredump.c, with the same
> prototype, perhaps rename that before we take this on, even if both are static.
hmm. why?
These are two static functions with the same name, so?
tags get confusing?
> > +{
> > + struct umh_info *umh_info = info->data;
> > + struct file *from_umh[2];
> > + struct file *to_umh[2];
> > + int err;
> > +
> > + /* create pipe to send data to umh */
> > + err = create_pipe_files(to_umh, 0);
> > + if (err)
> > + return err;
> > + err = replace_fd(0, to_umh[0], 0);
> > + fput(to_umh[0]);
> > + if (err < 0) {
> > + fput(to_umh[1]);
> > + return err;
> > + }
> > +
> > + /* create pipe to receive data from umh */
> > + err = create_pipe_files(from_umh, 0);
> > + if (err) {
> > + fput(to_umh[1]);
> > + replace_fd(0, NULL, 0);
> > + return err;
> > + }
> > + err = replace_fd(1, from_umh[1], 0);
> > + fput(from_umh[1]);
> > + if (err < 0) {
> > + fput(to_umh[1]);
> > + replace_fd(0, NULL, 0);
> > + fput(from_umh[0]);
> > + return err;
> > + }
> > +
> > + umh_info->pipe_to_umh = to_umh[1];
> > + umh_info->pipe_from_umh = from_umh[0];
> > + return 0;
> > +}
> > +
> > +static void umh_save_pid(struct subprocess_info *info)
> > +{
> > + struct umh_info *umh_info = info->data;
> > +
> > + umh_info->pid = info->pid;
> > +}
> > +
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
>
> Please use umh_fork_blob()
sorry, no. fork_usermode_blob() is much more descriptive name.
> > +{
> > + struct subprocess_info *sub_info;
> > + struct file *file = NULL;
> > + int err;
> > +
> > + err = alloc_tmpfs_file(len, &file);
> > + if (err)
> > + return err;
> > +
> > + err = populate_file(file, data, len);
> > + if (err)
> > + goto out;
> > +
> > + err = -ENOMEM;
> > + sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
> > + umh_save_pid, info);
> > + if (!sub_info)
> > + goto out;
> > +
> > + err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
>
> Alright, neat, so to be clear, we're just glad to try inception, we have no
> clue or idea what the real return value would be, its up to the caller to track
> the progress somehow?
yep.
> Can you add a kdoc entry for this and clarify requirements?
ok. I'll add a comment to this helper.
> Also, can you extend lib/test_kmod.c with a test case for this with its own
> demo and try to blow it up?
in what sense? bpfilter is the test and the driving component for it.
I'm expecting that folks who want to use this helper to do usb drivers
in user space may want to extend this helper further, but that's their job.
> I hadn't tried suspend/resume during a kmod test, but since we're using a
> kernel_thread() I wouldn't be surprised if we barf while stress testing the
> module loader. Its surely a corner case, but better mention that now than cry
> later if we get heavy umh modules and all of a sudden we start using this for
> whatever reason close to suspend.
folks that care about suspend/resume should do that.
I'm happy to gate this helper for !CONFIG_SUSPEND, since I have
no idea what issues can be uncovered, how to fix them and no desire to do so.
Thanks