Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load

From: Richard Larocque
Date: Fri Sep 19 2014 - 17:26:13 EST


On Fri, Sep 19, 2014 at 12:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Sep 17, 2014 at 7:28 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Sep 17, 2014 1:46 AM, "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
>>>
>>> On 09/16/2014 11:21 PM, Filipe Brandenburger wrote:
>>> > Hi Andy,
>>> >
>>> > On Tue, Sep 16, 2014 at 10:00 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> >> I think that the patch should instead tweak the vvar mapping to tell
>>> >> the vdso not to use rdtsc. It should be based on this:
>>> >
>>> > I've been working on this approach which extends the vvar from 2 to 3
>>> > pages. The third page would initially be mapped to a zero page but
>>> > then through a prctl a task could replace it with a real page that
>>> > could then be inherited through fork and exec.
>>> >
>>> > That would make it possible to have per-task vvar contents.
>>> >
>>> > We could use some of those values as flags to indicate whether vdso
>>> > routines may use RDTSC or not.
>>> >
>>> > In the future, we're planning to also use that to store clock offsets
>>> > so that we can ensure CLOCK_MONOTONIC works after CRIU migration
>>> > without having to turn off the VDSO or always fall back to full
>>> > syscalls in every case.
>>> >
>>> > Do you think that would be a reasonable way to accomplish that?
>>> >
>>>
>>> Why would we need/want per process vvar contents? It seems better to
>>> have the code swapped out.
>>
>> That seems messier from a build perspective. Also, if we ever want to
>> switch this dynamically, swapping out data is much easier than
>> swapping out code. I think we should be able to replace the vvar page
>> with the zero page, though.
>>
>> One tricky bit: currently we can only easily do this on exec, but we
>> should be able to do it immediately if we start tracking mremap of the
>> vdso. Should we make that a prerequisite? I don't really want this
>> to end up being permanently weird.
>
> I have this (special mapping tracking) 3/4 implemented. I'm planning
> on making it fully functional for 64-bit programs and almost correct
> for 32-bit. (You'll still crash if you have multiple threads, you use
> sysenter, and you remap the vdso, but I think that this is essentially
> unavoidable until someone lets mremap work on multiple vmas at once.)
>

Thanks! I look forward to seeing the result.

I have some per-process clock offset patches that currently work only
when the vDSO is disabled. It sounds like your patches will provide a
clean solution to deal with that issue. I'll try to rebase my work on
top of your changes when they're ready.
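
To make the idea concrete, here is a rough userspace sketch of how a
vDSO-style clock routine might consult a per-task vvar page. This is not
code from any of the patches discussed here: the struct layout, the flag
name TASK_VVAR_NO_RDTSC, and both helper names are assumptions made up
for illustration. The one property it takes from the thread is that an
all-zero page (the initial zero-page mapping) means "no restriction, no
offset".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Hypothetical flag: set when the vDSO must not use RDTSC for this task. */
#define TASK_VVAR_NO_RDTSC (1u << 0)

/*
 * Hypothetical layout of the per-task vvar page. An all-zero page
 * (the default zero-page mapping) means RDTSC is allowed and no
 * offset is applied.
 */
struct task_vvar {
	uint32_t flags;
	int64_t monotonic_offset_ns;	/* added to CLOCK_MONOTONIC readings */
};

/* Returns true if the fast RDTSC path may be used for this task. */
static bool task_vvar_may_use_rdtsc(const struct task_vvar *tv)
{
	return !(tv->flags & TASK_VVAR_NO_RDTSC);
}

/* Applies the per-task offset to a raw reading and renormalizes it. */
static struct timespec task_vvar_apply_offset(const struct task_vvar *tv,
					      struct timespec raw)
{
	struct timespec out;
	int64_t total;

	assert(raw.tv_nsec >= 0 && raw.tv_nsec < NSEC_PER_SEC);

	total = (int64_t)raw.tv_sec * NSEC_PER_SEC + raw.tv_nsec +
		tv->monotonic_offset_ns;

	out.tv_sec = total / NSEC_PER_SEC;
	out.tv_nsec = total % NSEC_PER_SEC;
	if (out.tv_nsec < 0) {
		/* C division truncates toward zero; keep tv_nsec in range. */
		out.tv_nsec += NSEC_PER_SEC;
		out.tv_sec -= 1;
	}
	return out;
}
```

In this sketch, a caller would check task_vvar_may_use_rdtsc() first and
fall back to the real syscall when it returns false; otherwise it would
pass the raw reading through task_vvar_apply_offset().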

We've also got some patches to apply an offset to the TSC that could
benefit from your changes, but I guess there's not much appetite for
merging them. That's fine with me. I don't see any need for that
feature until we have a few examples of applications that could be
broken by TSC changes during migration.