Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

From: Florian Weimer
Date: Tue Sep 29 2020 - 04:13:56 EST


* Mathieu Desnoyers:

>> So we have a bootstrap issue here that needs to be solved, I think.
>
> The one thing I'm not sure about is whether the vDSO interface is indeed
> superior to KTLS, or if it is just the model we are used to.
>
> AFAIU, the current use-cases for vDSO is that an application calls into
> glibc, which then calls the vDSO function exposed by the kernel. I wonder
> whether the vDSO indirection is really needed if we typically have a glibc
> function used as indirection ? For an end user, what is the benefit of vDSO
> over accessing KTLS data directly from glibc ?

I think the kernel can only reasonably maintain a single userspace data
structure. It's not reasonable to update several versions of the data
structure in parallel.

This means that glibc would have to support multiple kernel data
structures, and users might lose userspace acceleration after a kernel
update, until they update glibc as well. The glibc update should be
ABI-compatible, but someone would still have to backport it, apply it to
container images, etc.

What's worse, the glibc code would be quite hard to test because we
would have to keep around multiple kernel versions to exercise all the
different data structure variants.

In contrast, the vDSO code always matches the userspace data structures,
is always updated at the same time, and tested together. That looks
like a clear win to me.

> If we decide that using KTLS from a vDSO function is indeed a requirement,
> then, as you point out, the thread_pointer is available as ABI, but we miss
> the KTLS offset.
>
> Some ideas on how we could solve this: we could either make the KTLS
> offset part of the ABI (fixed offset), or save the offset near the
> thread pointer at a location that would become ABI. It would have to
> be already populated with something which can help detect the case
> where a vDSO is called from a thread which does not populate KTLS
> though. Is that even remotely doable ?

I don't know.

We could decide that these accelerated system calls must only be called
with a valid TCB. That's unavoidable if the vDSO sets errno directly,
so it's perhaps not a big loss. It's also backwards-compatible because
existing TCB-less code won't know about those new vDSO entrypoints.
Calling into glibc from a TCB-less thread has always been undefined.
TCB-less code would have to make direct, non-vDSO system calls, as today.

For discovering the KTLS offset, a per-process page at a fixed offset
from the vDSO code (i.e., what real shared objects already do for global
data) could store this offset. This way, we could entirely avoid an ABI
dependency.

We'll see what will break once we have the correct TID after vfork. 8->
glibc currently supports malloc-after-vfork as an extension, and
a lot of software depends on it (OpenJDK, for example).

>> With the latter, we could
>> directly expose the vDSO implementation to applications, assuming that
>> we agree that the vDSO will not fail with ENOSYS to request fallback to
>> the system call, but will itself perform the system call.
>
> We should not forget the fields needed by rseq as well: the rseq_cs
> pointer and the cpu_id fields need to be accessed directly from the
> rseq critical section, without function call. Those use-cases require
> that applications and library can know the KTLS offset and size and
> use those fields directly.

Yes, but those offsets could be queried using a function from the vDSO
(or using a glibc interface, to simplify linking).

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill