Re: [PATCH] x86/vdso: Add prctl to set per-process VDSO load

From: Andy Lutomirski
Date: Wed Sep 17 2014 - 01:01:07 EST


On Tue, Sep 16, 2014 at 6:18 PM, Richard Larocque <rlarocque@xxxxxxxxxx> wrote:
> On Tue, Sep 16, 2014 at 5:27 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Tue, Sep 16, 2014 at 5:05 PM, Richard Larocque <rlarocque@xxxxxxxxxx> wrote:
>>> Adds new prctl calls to enable or disable VDSO loading for a process
>>> and its children.
>>>
>>> The PR_SET_DISABLE_VDSO call takes one argument, which is interpreted as
>>> a boolean value. If true, it disables the loading of the VDSO on exec()
>>> for this process and any children created after this call. A false
>>> value unsets the flag.
>>>
>>> The PR_GET_DISABLE_VDSO option returns a non-negative true value if VDSO
>>> loading has been disabled for this process, zero if it has not been
>>> disabled, and a negative value in case of error.
>>>
>>> These prctl calls are hidden behind a new Kconfig,
>>> CONFIG_VDSO_DISABLE_PRCTL. This feature is available only on x86.
>>>
>>> The command line option vdso=0 overrides the behavior of
>>> PR_SET_DISABLE_VDSO; however, PR_GET_DISABLE_VDSO will continue to return
>>> whatever setting was last set with PR_SET_DISABLE_VDSO.
>>>
>>> Signed-off-by: Richard Larocque <rlarocque@xxxxxxxxxx>
>>> ---
>>> This patch is part of some work to better handle times and CRIU migration.
>>> I suspect that there are other use cases out there, so I'm offering this
>>> patch separately.
>>>
>>> When considering CRIU migration and times, we put some thought into how
>>> to handle the rdtsc instruction. If we migrate between machines or across
>>> reboots, the migrated process will see values that could break its assumptions
>>> about how rdtsc is supposed to work.
>>
>> I don't get it.
>>
>> If __vdso_clock_gettime returns the wrong value in any scenario, we
>> should fix that. Similarly, CRIU *already works*, unless there's
>> something I don't know of.
>
> Right. As far as I know, there's nothing wrong with the use of RDTSC
> in the vDSO following a migration. The problem is that some
> applications might use RDTSC outside of the vDSO. If they save the
> returned values, then compare pre- and post- migration values, bad
> things could happen (in theory).

These applications are broken, full stop. They will misbehave on VMs,
or older machines, and even on the rather new piece of sh*t MSI
motherboard under my desk. I think that CRIU is just icing on the
cake. Also, they'll probably just crash if you turn off RDTSC.
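
For reference, here is a rough sketch of what "turning off RDTSC" looks
like from userspace today, using the existing PR_SET_TSC / PR_GET_TSC
prctls. The helper name child_forbids_rdtsc() is made up for
illustration. It does the work in a forked child so the caller is not
affected, and it only reads the mode back rather than executing RDTSC.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, ask the kernel to make RDTSC fault
 * in it, and read the setting back.
 *
 * Returns  1 if the child ended up in PR_TSC_SIGSEGV mode,
 *          0 if the mode did not stick,
 *         -1 if PR_SET_TSC is unavailable (e.g. non-x86, or a sandbox
 *            that rejects the prctl) or fork/wait failed.
 */
int child_forbids_rdtsc(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int mode = 0;

        /* From here on, RDTSC in this child raises SIGSEGV. */
        if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0)
            _exit(2);
        /* PR_GET_TSC stores the current mode through the pointer. */
        if (prctl(PR_GET_TSC, &mode, 0, 0, 0) != 0)
            _exit(2);
        _exit(mode == PR_TSC_SIGSEGV ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

Anything in that child that executes RDTSC after the first prctl,
including code it did not write itself, takes the signal.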

>
> Anything we do to try to trap and handle the use of RDTSC in wider
> userspace will affect its use in the vDSO, too. In some situations,
> it might be nice to run applications with no vDSO and PR_TSC_SIGSEGV,
> just to make sure they don't have any heavy reliance on the TSC. It
> would be nice if those applications didn't crash when they called
> clock_gettime().

Agreed. But let's do it without turning off the vdso. Also, turning
off the 32-bit vdso could break a lot of things.

>
> Another alternative is to trap and adjust the RDTSC. That might be a
> viable option for applications that care about reliable RDTSC behavior
> and migration, but don't care about performance. I think it makes
> sense to disable the vDSO in that case, rather than trap on every call
> that it makes.

Here I disagree. Let's just tweak the vdso not to use rdtsc in this case.
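
To make the idea concrete, here is a userspace-only sketch of the
control flow I have in mind. It is not vdso code: struct fake_vvar, its
tsc_allowed field, and sketch_clock_gettime() are all invented for
illustration, and the libc clock_gettime() call merely stands in for
the TSC-based fast path. The point is only that a per-process flag
selects between the fast path and a plain syscall.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for data the kernel would expose to the vdso. */
struct fake_vvar {
    bool tsc_allowed;   /* false: the fast path must not touch RDTSC */
};

/*
 * Slow path: enter the kernel directly.
 * Assumes a 64-bit target where SYS_clock_gettime takes the same
 * struct timespec layout that libc uses.
 */
static int clock_gettime_syscall(clockid_t clk, struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, clk, ts);
}

/* Pick the fast path or the syscall based on the per-process flag. */
int sketch_clock_gettime(const struct fake_vvar *vv, clockid_t clk,
                         struct timespec *ts)
{
    if (vv->tsc_allowed)
        return clock_gettime(clk, ts);      /* stand-in for the fast path */
    return clock_gettime_syscall(clk, ts);  /* no RDTSC in userspace */
}
```

With something of this shape, clock_gettime() keeps working for a
process that has RDTSC gated off, and nobody has to unmap the vdso.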

>
>> That being said, I would like an option to gate off RDTSC for a
>> process and its children in order to make PR_TSC_SIGSEGV more useful.
>> All the prerequisites are there now.
>
> Agreed. That's what this patch is attempting to do, and that's the
> main reason why I figured it was worth submitting independent of any
> other time-related work.
>
>> What problem are you trying to solve exactly?
>
> Eventually, we'd like to make it so that neither RDTSC nor
> CLOCK_MONOTONIC can go backwards following a migration.
>
> The fix for RDTSC starts here. Building on this patch as a base, we
> can either ban it from being used entirely, or write some code to
> adjust its value as necessary.
>
> The CLOCK_MONOTONIC fix will be a different patch stack. We're
> currently hoping to do that without disabling the vDSO, but that's
> another discussion.

I think that the patch should instead tweak the vvar mapping to tell
the vdso not to use rdtsc. It should be based on this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/vsyscall

and I'll talk to hpa tomorrow about getting that, or something
like it, into the tip tree. In particular, you'll need this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vsyscall&id=0cc410a05cb95e073ebfe099c9e03cef48d2be0f

Also, this kind of inheritable restriction may end up requiring
no_new_privs or CAP_SYS_ADMIN to be secure.
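
Roughly, that would be the same rule seccomp filters use. A minimal
userspace sketch of the no_new_privs half, using the existing
PR_SET_NO_NEW_PRIVS / PR_GET_NO_NEW_PRIVS prctls, is below. The
function names are made up, and may_set_inheritable_restriction() is
only a guess at the policy, not anything the kernel implements for
this patch.

```c
#include <sys/prctl.h>

/* Irreversibly set no_new_privs for the calling thread. 0 on success. */
int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if not, -1 on error. */
int has_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/*
 * Hypothetical policy check: allow an inheritable restriction only if
 * the caller is privileged or can no longer gain privileges on exec.
 * has_cap_sys_admin is passed in by the caller to keep the sketch
 * free of libcap.
 */
int may_set_inheritable_restriction(int has_cap_sys_admin)
{
    return has_cap_sys_admin || has_no_new_privs() == 1;
}
```

The concern is a restricted process exec'ing a setuid binary that then
misbehaves under the restriction; with no_new_privs set, that exec can
no longer raise privileges.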

--Andy