Re: [RFC][PATCH 1/2] fs proc: make pagemap a privileged interface
From: Kees Cook
Date: Mon Mar 09 2015 - 17:31:56 EST
On Mon, Mar 9, 2015 at 1:43 PM, Dave Hansen <dave@xxxxxxxx> wrote:
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> Physical addresses are sensitive information. There are
> existing, known exploits that are made easier if physical
> information is available. Here is one example:
>
> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
>
> If you know the physical address of something you also know at
> which kernel virtual address you can find something (modulo
> highmem). It means that things that keep the kernel from
> accessing user mappings (like SMAP/SMEP) can be worked around
> because the _kernel_ mapping can get used instead.
>
> But, /proc/$pid/pagemap exposes the physical addresses of all
> pages accessible to userspace. This works against all of the
> efforts to keep kernel addresses out of places where unprivileged
> apps can find them.
>
> This patch introduces a "paranoid" option for /proc. It can be
> enabled like this:
>
> mount -o remount,paranoid /proc
>
> Or when /proc is mounted initially. When 'paranoid' mode is
> active, opens to /proc/$pid/pagemap will return -EPERM for users
> without CAP_SYS_RAWIO. It can be disabled like this:
>
> mount -o remount,notparanoid /proc
>
> The option is applied to the pid namespace, so an app that wanted
> a separate policy from the rest of the system could get run in
> its own pid namespace.
>
> I'm not really that stuck on the name. I'm not opposed to making
> it apply only to pagemap or to giving it a pagemap-specific
> name.
>
> pagemap is also the kind of feature that could be used to escalate
> privileged from root in to the kernel. It probably needs to be
> protected in the same way that /dev/mem or module loading is in
> cases where the kernel needs to be protected from root, thus the
> choice to use CAP_SYS_RAWIO.
>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Seems reasonable. I would note that even CAP_SYS_RAWIO isn't enough to
actually do anything with RAM in /dev/mem. That's entirely controlled
by CONFIG_STRICT_DEVMEM.
I think /proc/kpagecount and /proc/kpageflags should get filtered as
well, instead of them relying on the uid=0 check.
Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
-Kees
> ---
>
> b/fs/proc/root.c | 10 +++++++++-
> b/fs/proc/task_mmu.c | 11 +++++++++++
> b/include/linux/pid_namespace.h | 1 +
> 3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff -puN fs/proc/root.c~privileged-pagemap fs/proc/root.c
> --- a/fs/proc/root.c~privileged-pagemap 2015-03-09 13:33:12.104796793 -0700
> +++ b/fs/proc/root.c 2015-03-09 13:33:12.111797109 -0700
> @@ -39,10 +39,12 @@ static int proc_set_super(struct super_b
> }
>
> enum {
> - Opt_gid, Opt_hidepid, Opt_err,
> + Opt_gid, Opt_hidepid, Opt_paranoid, Opt_notparanoid, Opt_err,
> };
>
> static const match_table_t tokens = {
> + {Opt_paranoid, "paranoid"},
> + {Opt_notparanoid, "notparanoid"},
> {Opt_hidepid, "hidepid=%u"},
> {Opt_gid, "gid=%u"},
> {Opt_err, NULL},
> @@ -70,6 +72,12 @@ static int proc_parse_options(char *opti
> return 0;
> pid->pid_gid = make_kgid(current_user_ns(), option);
> break;
> + case Opt_paranoid:
> + pid->paranoid = 1;
> + break;
> + case Opt_notparanoid:
> + pid->paranoid = 0;
> + break;
> case Opt_hidepid:
> if (match_int(&args[0], &option))
> return 0;
> diff -puN fs/proc/task_mmu.c~privileged-pagemap fs/proc/task_mmu.c
> --- a/fs/proc/task_mmu.c~privileged-pagemap 2015-03-09 13:33:12.106796883 -0700
> +++ b/fs/proc/task_mmu.c 2015-03-09 13:33:12.112797154 -0700
> @@ -1322,9 +1322,20 @@ out:
>
> static int pagemap_open(struct inode *inode, struct file *file)
> {
> + struct pid_namespace *ns = inode->i_sb->s_fs_info;
> +
> pr_warn_once("Bits 55-60 of /proc/PID/pagemap entries are about "
> "to stop being page-shift some time soon. See the "
> "linux/Documentation/vm/pagemap.txt for details.\n");
> +
> + /*
> + * Use the RAWIO capability bit. If you can not go open
> + * /dev/mem, then you also have no business knowing the
> + * physical addresses of things.
> + */
> + if (ns->paranoid && !capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
> return 0;
> }
>
> diff -puN include/linux/pid_namespace.h~privileged-pagemap include/linux/pid_namespace.h
> --- a/include/linux/pid_namespace.h~privileged-pagemap 2015-03-09 13:33:12.108796973 -0700
> +++ b/include/linux/pid_namespace.h 2015-03-09 13:33:12.112797154 -0700
> @@ -43,6 +43,7 @@ struct pid_namespace {
> struct work_struct proc_work;
> kgid_t pid_gid;
> int hide_pid;
> + int paranoid;
> int reboot; /* group exit code if this pidns was rebooted */
> struct ns_common ns;
> };
> _
--
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/