Re: [PATCH v5] procfs: Always expose /proc/<pid>/map_files/ and make it readable

From: Calvin Owens
Date: Wed May 20 2015 - 21:52:37 EST


On Tuesday 05/19 at 11:04 -0700, Andy Lutomirski wrote:
> On Mon, May 18, 2015 at 8:10 PM, Calvin Owens <calvinowens@xxxxxx> wrote:
> > Currently, /proc/<pid>/map_files/ is restricted to CAP_SYS_ADMIN, and
> > is only exposed if CONFIG_CHECKPOINT_RESTORE is set. This interface is
> > very useful for enumerating the files mapped into a process when the
> > more verbose information in /proc/<pid>/maps is not needed. It also
> > allows access to file descriptors for files that have been deleted and
> > closed but are still mmapped into a process, which can be very useful
> > for introspection and debugging.
> >
> > This patch moves the folder out from behind CHECKPOINT_RESTORE, and
>
> I'm fine with this.
>
> > removes the CAP_SYS_ADMIN restrictions. With that change alone,
> > following the links would have required PTRACE_MODE_READ like the
> > links in /proc/<pid>/fd/*.
>
> I'm still not at all convinced that this is safe. Here are a few ways
> that it could have unintended consequences:
>
> 1. Mmap a dma-buf and then open /proc/self/map_files/addr. You get an
> fd pointing at a different inode than you mapped. (kdbus would have
> the same problem if it were merged.)
>
> 2. Open a file with O_RDONLY, mmap it with PROT_READ, close the file,
> then open /proc/self/map_files/addr with O_RDWR. I don't see anything
> preventing that from succeeding.

Hmm, that's a good point: it lets you bypass the permission checks on
all the path components you would normally walk through to get to the
file. But it still only works if you actually have permission to open
the file in question for writing.

Also, this is already how the /proc/N/fd/* symlinks work, isn't it?

> 3. Open a file, mmap it, close the fd, chroot, drop privileges, open
> /proc/self/map_files/addr, then call ftruncate.

This doesn't work unless the privileges you dropped to actually allow
you to open the mmapped file for writing. It's really the same
fundamental problem as (2), where you're allowing direct access to a
file without trying to walk the path down to it, right?

> So NAK as-is, I think.

Limiting ->follow_link() to CAP_SYS_ADMIN wouldn't affect anything I
imagine using this interface for (see below), so I have no problem with
putting that back in. I think that would alleviate all your concerns
above, right?

(That said, I don't think it makes sense to limit readdir() or
readlink() on map_files/* to CAP_SYS_ADMIN, since that alone is a subset
of what you can get from /proc/N/maps.)

> Fixing #1 would involve changing the way mmap works, I think. Fixing
> #2 would require similar infrastructure to what we'd need to fix the
> existing /proc/pid/fd mode holes. I have no clue how to even approach
> fixing #3.
>
> What's the use case of this patch?

The biggest use case: it enables you to stat() files that have been
deleted but are still mapped by some process.

This enables a much quicker and more accurate answer to the question
"How much disk space is being consumed by files that are deleted but
still mapped?" than is currently possible.

It also allows you to know how much space a specific mapped-but-deleted
file is using on a specific filesystem, which is currently impossible
from userspace AFAIK.

Thanks,
Calvin

> --Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/