Re: [RFC][PATCH 1/2] fs proc: make pagemap a privileged interface

From: Kees Cook
Date: Mon Mar 09 2015 - 19:40:39 EST


On Mon, Mar 9, 2015 at 4:08 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> Kees Cook <keescook@xxxxxxxxxxxx> writes:
>
>> On Mon, Mar 9, 2015 at 3:13 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>> Dave Hansen <dave@xxxxxxxx> writes:
>>>
>>>> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>>>>
>>>> Physical addresses are sensitive information. There are
>>>> existing, known exploits that are made easier if physical
>>>> address information is available. Here is one example:
>>>>
>>>> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf
>>>>
>>>> If you know the physical address of something you also know at
>>>> which kernel virtual address you can find something (modulo
>>>> highmem). It means that things that keep the kernel from
>>>> accessing user mappings (like SMAP/SMEP) can be worked around
>>>> because the _kernel_ mapping can get used instead.
>>>>
>>>> But, /proc/$pid/pagemap exposes the physical addresses of all
>>>> pages accessible to userspace. This works against all of the
>>>> efforts to keep kernel addresses out of places where unprivileged
>>>> apps can find them.
>>>>
>>>> This patch introduces a "paranoid" option for /proc. It can be
>>>> enabled like this:
>>>>
>>>> mount -o remount,paranoid /proc
>>>>
>>>> Or when /proc is mounted initially. When 'paranoid' mode is
>>>> active, opens to /proc/$pid/pagemap will return -EPERM for users
>>>> without CAP_SYS_RAWIO. It can be disabled like this:
>>>>
>>>> mount -o remount,notparanoid /proc
>>>>
>>>> The option is applied to the pid namespace, so an app that wanted
>>>> a separate policy from the rest of the system could get run in
>>>> its own pid namespace.
>>>>
>>>> I'm not really that stuck on the name. I'm not opposed to making
>>>> it apply only to pagemap or to giving it a pagemap-specific
>>>> name.
>>>>
>>>> pagemap is also the kind of feature that could be used to escalate
>>>> privileges from root into the kernel. It probably needs to be
>>>> protected in the same way that /dev/mem or module loading is in
>>>> cases where the kernel needs to be protected from root, thus the
>>>> choice to use CAP_SYS_RAWIO.
>>>
>>>
>>> There is already a way to make pagemap go away. It is called
>>> CONFIG_PROC_PAGE_MONITOR.
>>>
>>> I suspect the right answer here is if you enable kernel address
>>> randomization you disable CONFIG_PROC_PAGE_MONITOR. Aka you make the
>>> two options conflict with each other.
>>
>> It's not a good idea to make CONFIG options conflict with each other
>> like this as it puts distros in a tricky spot to decide which to use.
>> Allowing both and having a runtime flag of some kind tends to be the
>> better option (e.g. kASLR vs Hibernation).
>
> But there is a fundamental conflict. As such it might as well be
> expressed in Kconfig.

Hm? I was using kASLR vs Hibernation as an example of two features that,
even though they are currently at odds with each other, are handled with
a runtime-selectable option (putting "kaslr" on the command line enables
kASLR and disables hibernation, rather than forcing a CONFIG choice to
pick one or the other).

>
>>> That is a lot less code and a lot less to maintain.
>>>
>>> On the other hand if this is truly a valuable interface that you can't
>>> part with we need an alternative to pagemaps that does the same job
>>> without the exploit potential. And I don't know how to do that.
>>>
>>> Arguing in favor of just making the options conflict is the fact that
>>> kernel address randomization is pretty much snake oil. At least on
>>> x86_64 the address pool is so small it can be trivially brute forced. I
>>> think there are maybe 10 bits you can randomize within.
>>>
>>> As for a way to disable this I expect it would do better with something
>>> like a set-once flag that prevents a process and all of its children
>>> from accessing this file.
>>>
>>> *Blink* *Blink* Did you say you are worried about escalating privileges
>>> from root into the kernel space? That is nonsense. We give root the
>>> power to shoot themselves in the foot and any proc option will be
>>> something that root will be able to get around.
>>>
>>> The pieces of the patch description don't add up.
>>
>> No, that's an entirely valid use-case. You can trust the kernel but
>> not root. This is the point of the "trusted_kernel" patch series that
>> disables all sorts of dangerous interfaces that allow root to get at
>> physical memory.
>>
>> This situation is more a memory leak than a direct compromise, so it
>> seems like providing at least some runtime control of it (separate
>> from potential future "trusted_kernel" stuff) makes sense.
>
> I am too tired to argue about the kASLR snake-oil.

No problem. :)

>
> I do not think a proc mount option is at all appropriate for controlling
> the behavior of the pagemap file. And "paranoid" is entirely too
> generic of a string to have any meaning.
>
> Either just tighten the permissions when kASLR is enabled, or have the
> file go away entirely.
>
> If you want run-time knobs there are all kinds of run-time knobs you can
> use.
>
> If the concern is to protect against root getting into the kernel (the
> "trusted_kernel" snake-oil), just compile out the pagemap file. Nothing
> else is remotely interesting from a maintenance point of view.

Distros cannot opt to compile out the pagemap file. They want to
provide end users with one kernel that can do both, selectable at
runtime. If I want to make it harder for things that need physical
page maps to attack my system, I'd like to be able to turn that restriction
on in my distro. And since I can remove CAP_SYS_RAWIO from init during my
initramfs, I would love to have this flag.
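
For anyone following along, here is a rough userspace sketch of what an
attacker gets out of pagemap today. It is an illustration only, not code
from the patch. The entry layout (one 64-bit entry per virtual page,
bit 63 = present, bits 0-54 = PFN) is the one described in the kernel's
pagemap documentation. Everything else is an assumption made for the
example: the helper names are made up, and ASSUMED_PAGE_OFFSET is the
x86_64 direct-map base on a kernel that does not randomize it.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Entry layout per the kernel's pagemap documentation. */
#define PM_PRESENT  (1ULL << 63)          /* page is present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: page frame number */

/* ASSUMPTION: x86_64 direct-map base, no randomization of the base. */
#define ASSUMED_PAGE_OFFSET 0xffff880000000000ULL

/* Hypothetical helpers, named for this sketch only. */
static int pm_present(uint64_t entry)
{
    return (entry & PM_PRESENT) != 0;
}

static uint64_t pm_pfn(uint64_t entry)
{
    return entry & PM_PFN_MASK;
}

/* Kernel virtual alias of a physical page under the assumed direct map. */
static uint64_t directmap_alias(uint64_t pfn, uint64_t page_size,
                                uint64_t offset_in_page)
{
    return ASSUMED_PAGE_OFFSET + pfn * page_size + offset_in_page;
}

/*
 * Read the pagemap entry covering vaddr in the calling process.
 * Returns 0 on success, -1 if the file cannot be opened or read
 * (which is what a restricted pagemap would look like from here).
 */
static int pm_read_entry(const void *vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    off_t off;
    ssize_t n;

    if (fd < 0 || page_size <= 0) {
        if (fd >= 0)
            close(fd);
        return -1;
    }

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size) *
          (off_t)sizeof(*entry);
    n = pread(fd, entry, sizeof(*entry), off);
    close(fd);

    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Calling pm_read_entry() on the address of a buffer you control, then
feeding pm_pfn() of the result into directmap_alias(), gives the kernel
address at which that same buffer is visible. That is the piece
ret2dir-style attacks need, and it is the piece that goes away when the
open fails with -EPERM.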

-Kees

>
> As I said.
> Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>
> Eric



--
Kees Cook
Chrome OS Security