perfmon trouble

From: Al Viro
Date: Sun Jun 10 2018 - 22:10:55 EST

On Sat, Jun 09, 2018 at 04:51:08PM +0100, Al Viro wrote:

> Stephane, could you comment on the situation in there? I realize that you
> hadn't touched that thing in more than a decade, but I've no idea who else
> might be familiar with that thing and it's very inconveniently special...

Having looked through that code... ouch. It tries to have munmap-on-close,
of all things. Which has interesting consequences; consider, for example,
	fd = perfmonctl(-1, PFM_CREATE_CONTEXT, &blah, 1); // create a context
	pid = fork();
	if (!pid) {
		execve("/usr/bin/something_suid", ...);
	}

with something_suid(8) doing an explicit "close each descriptor past stdout"
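
For concreteness, the descriptor-closing loop such a binary runs at startup
might look roughly like the sketch below. The helper name
close_descriptors_from() and the 4096 cap are assumptions made for
illustration only; nothing here is taken from a real suid program.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: close every descriptor >= lowfd.
 * Returns the number of descriptors that were actually open.
 */
static int close_descriptors_from(int lowfd)
{
	long max = sysconf(_SC_OPEN_MAX);
	int closed = 0;

	assert(lowfd >= 0);

	/* Assumption: cap the scan so the sketch stays fast on huge limits. */
	if (max < 0 || max > 4096)
		max = 4096;

	for (int fd = lowfd; fd < max; fd++) {
		/* close() fails with EBADF for slots that were never open. */
		if (close(fd) == 0)
			closed++;
	}
	return closed;
}
```

"Close each descriptor past stdout" would then be
close_descriptors_from(STDOUT_FILENO + 1). The point is only that every
inherited descriptor, including the perfmon one, gets an explicit close(2)
in the new address space.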

PFM_CREATE_CONTEXT has created a context, mmapped its buffer (and stored
the address of that mapping in ctx->ctx_smpl_vaddr) and, having opened
an associated file, sticks it into descriptor table and returns the descriptor.

On fork/exec we have
* descriptor table copied to child
* all mappings copied to child and then destroyed by execve
* execve ends up with the new binary (and libraries, etc.) mmapped
(in child)

Now, our careful suid-root binary does close(2) on its copy of descriptor.
pfm_flush() is called. ctx->task != current, so we proceed to
	/*
	 * remove virtual mapping, if any, for the calling task.
	 * cannot reset ctx field until last user is calling close().
	 *
	 * ctx_smpl_vaddr must never be cleared because it is needed
	 * by every task with access to the context
	 *
	 * When called from do_exit(), the mm context is gone already, therefore
	 * mm is NULL, i.e., the VMA is already gone and we do not have to
	 * do anything here
	 */
	if (ctx->ctx_smpl_vaddr && current->mm) {
		smpl_buf_vaddr = ctx->ctx_smpl_vaddr;
		smpl_buf_size = ctx->ctx_smpl_size;
	}

	UNPROTECT_CTX(ctx, flags);

	/*
	 * if there was a mapping, then we systematically remove it
	 * at this point. Cannot be done inside critical section
	 * because some VM function reenables interrupts.
	 */
	if (smpl_buf_vaddr) pfm_remove_smpl_mapping(smpl_buf_vaddr, smpl_buf_size);

... with the last call doing vm_munmap() on the area in question. In the
address space of that suid-root binary, taking out whatever *it* had mapped
at that address range...
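
The effect can be imitated entirely in user space, without perfmon or ia64.
The sketch below is only an analogy under stated assumptions: a child
process drops the original mapping and maps something else at the same
address (standing in for what execve does to the address space), then
munmap() is called with the stale recorded address (standing in for
pfm_remove_smpl_mapping()). The names is_mapped() and stale_munmap_demo()
are made up for the illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if [addr, addr + len) is mapped; msync() fails on unmapped ranges. */
static int is_mapped(void *addr, size_t len)
{
	return msync(addr, len, MS_ASYNC) == 0;
}

/* Returns 1 if the stale munmap destroyed the unrelated mapping, 0 if not, -1 on error. */
static int stale_munmap_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int status;

	/* Stands in for the sampling buffer whose address gets recorded. */
	void *stale = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (stale == MAP_FAILED)
		return -1;

	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Stand-in for execve(): the old mapping goes away ... */
		munmap(stale, len);

		/* ... and something unrelated ends up at the same address. */
		char *victim = mmap(stale, len, PROT_READ | PROT_WRITE,
				    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (victim == MAP_FAILED || (void *)victim != stale)
			_exit(2);
		victim[0] = 42;
		assert(is_mapped(victim, len));

		/* Stand-in for munmap-on-close using the recorded address. */
		munmap(stale, len);

		_exit(is_mapped(victim, len) ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	munmap(stale, len);

	if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status) == 0;
}
```

A caller only needs to check that stale_munmap_demo() returns 1: the
"victim" page is gone even though the code that unmapped it knew nothing
about it. In the perfmon case the victim would be whatever the suid-root
binary happened to have at that range.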

I wouldn't be surprised if that turned out to be realistically exploitable ;-/
Is there any documentation of that thing's semantics? perfmonctl(2) doesn't
mention the mapping at all, and the link to the HP site in
arch/ia64/kernel/perfmon.c is 404-compliant. Searching around brings up a
sourceforget reference, but I haven't been able to find any ia64-related
docs in there...