Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load

From: Filipe Brandenburger
Date: Fri Sep 19 2014 - 23:46:49 EST


On Fri, Sep 19, 2014 at 8:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> I thought of doing that from the prctl but AFAICT remap_pfn_range
>> requires that it's unmapped before the call (remap_pte_range has
>> BUG_ON(!pte_none(*pte));) and doing a zap_page_range followed by
>> remap_pfn_range might incur a race condition if another thread of the
>> same process is accessing the vvar page at that time... Am I wrong
>> about that race?
>
> No, you're right. Ugh.
>
> It might pay to add an explicit .fault callback to vm_special_mapping,
> but your approach will work, too. The main downside is more memory
> overhead per mm.

One way to limit that overhead was to keep only two static
vm_special_mapping structs (one for the real vvar page and one for the
zero page) and then change the vma->vm_private_data to point to one or
the other when the prctl is called. I just thought of that recently
and haven't tried it yet so not sure if it will cause other
problems...

>>> Changing out the text is a whole can of worms involving self-modifying
>>> code, although it may be completely safe if done through the page
>>> tables. But I don't think you can't use remap_pfn_range for that.
>>
>> No, I'm not planning to change the text, just replacing the vvar page
>> swapping the one where the vsyscall_gtod_data lives with a zero page
>> (and back depending on the parameter of the prctl.)
>
> Hmm. This will break the _COARSE timing functions as well as
> __vdso_time. That'll need fixing, either by swapping out the code
> (yuck!) or by adding a branch to all of those code paths.
>
> Maybe there's a non-branchy way, though. Let me think.

So this is what I was thinking of extending the vvar region to include
one extra page (right now it's two pages, the one with the
vsyscall_gtod_data and the one with the hrtimer I/O mapping, so this
would be a third page.) This third page would be initially mapped to
the zero page, so it wouldn't cost an extra page of memory per process
or even globally.

The prctl would swap that page with another page allocated as the
prctl is called. That one page that's allocated with the prctl would
be inherited over forks and execs, so if you have a bunch of processes
that need the same offset then they'd all share that one single page.

The vdso clock routines would get the base clock information from
vsyscall_gtod_data, hpet, tsc, etc. and would simply add the offsets
from the task_vvar page (not too branch-y, just a few extra adds.) In
the regular case, that would be a zero page so it would be simply
adding zero. If we need to apply offsets, then we replace that page
with one that has a struct with the right offsets for each clock type.

When running in kernel mode, the clock code (and timer code, etc.)
would find that page through the mm_struct and find the clock offsets
that way.

I was working on this approach before you suggested swapping the
actual vvar page (and not sure if you were actually looking at
hot-swapping the vdso itself.) I can go back to working on it and send
you a few commits since I still have some of that code around.

Cheers,
Filipe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/