Re: [PATCH v3 00/18] vDSO: Introduce generic data storage
From: Thomas Weißschuh
Date: Thu Feb 06 2025 - 06:03:08 EST
On Thu, Feb 06, 2025 at 09:31:42AM +0000, David Woodhouse wrote:
> On Tue, 2025-02-04 at 13:05 +0100, Thomas Weißschuh wrote:
> > Currently each architecture defines the setup of the vDSO data page on
> > its own, mostly through copy-and-paste from some other architecture.
> > Extend the existing generic vDSO implementation to also provide generic
> > data storage.
> > This removes duplicated code and paves the way for further changes to
> > the generic vDSO implementation without having to go through a lot of
> > per-architecture changes.
> >
> > Based on v6.14-rc1 and intended to be merged through the tip tree.
Note: The real answer will need to come from the timekeeping
maintainers; my personal two cents are below.
> Thanks for working on this. Is there a plan to expose the time data
> directly to userspace in a form which is usable *other* than by
> function calls which get the value of the clock at a given moment?
There are no current plans that I am aware of.
> For populating the vmclock device¹ we need to know the actual
> relationship between the hardware counter (TSC, arch timer, etc.) and
> real time in order to propagate that to the guest.
>
> I see two options for doing this:
>
> 1. Via userspace, exposing the vdso time data (and a notification when
> it changes?) and letting the userspace VMM populate the vmclock.
> This is complex for x86 because of TSC scaling; in fact userspace
> doesn't currently know the precise scaling from host to guest TSC
> so we'd have to be able to extract that from KVM.
Exposing the raw vDSO time data is problematic, as it would preclude any
evolution of its data structures, such as the one currently underway.
An additional, trimmed-down and stable data structure could be used.
But I don't think it makes sense. The vDSO is all about a stable
high-level function interface on top of an unstable data interface.
The vmclock, however, needs the low-level data to populate its own
data structure, so wrapping raw data access in function calls is
unnecessary.
If no functions are involved then the vDSO is not needed. The data can
be maintained separately in any other place in the kernel and accessed
or mapped by userspace from there.
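
To illustrate what a "trimmed down and stable" structure kept outside
the vDSO might look like, here is a minimal userspace sketch. The
layout, the names (struct clock_params, struct clock_snapshot,
snapshot_read(), params_to_ns()) and the sequence-count protocol are
all assumptions for illustration; this is neither the vDSO's internal
data nor the vmclock ABI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stable parameters: one reference point plus a rate. */
struct clock_params {
	uint64_t counter_base;	/* hardware counter at the reference point */
	uint64_t ns_base;	/* real time in ns at the reference point */
	uint32_t mult;		/* ns = (delta * mult) >> shift */
	uint32_t shift;
};

/* Hypothetical shared page: seq is odd while a writer is mid-update. */
struct clock_snapshot {
	_Atomic uint32_t seq;
	struct clock_params p;
};

/*
 * Copy out a consistent set of parameters, retrying if a writer was
 * active or the data changed underneath us. A production version would
 * also use tear-free accesses for the payload fields.
 */
static void snapshot_read(struct clock_snapshot *src, struct clock_params *dst)
{
	uint32_t start, end;

	do {
		start = atomic_load_explicit(&src->seq, memory_order_acquire);
		*dst = src->p;
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&src->seq, memory_order_relaxed);
	} while ((start & 1u) || start != end);
}

/*
 * Convert a raw counter value to nanoseconds. Assumes the delta is small
 * enough that delta * mult does not overflow 64 bits.
 */
static uint64_t params_to_ns(const struct clock_params *p, uint64_t counter)
{
	uint64_t delta = counter - p->counter_base;

	assert(p->shift < 64);
	return p->ns_base + ((delta * p->mult) >> p->shift);
}
```

A consumer would call snapshot_read() once and then apply
params_to_ns() to whatever counter values it cares about, without any
function-call interface in between.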
Also, the vDSO does not have an active notification mechanism. One
would probably be implemented through a file descriptor, but then the
data can also be mapped through exactly that fd.
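
As a rough sketch of that idea from the userspace side: one fd serves
both as the mapping source and as the notification channel. No such
interface exists today; which device or file provides the fd, the
one-page size, and the helper names map_clock_page() and
wait_for_update() are all hypothetical. Only mmap(), poll() and
sysconf() are real calls.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one read-only page of clock data from an already-open fd.
 * Returns NULL if the fd cannot be mapped.
 */
static const void *map_clock_page(int fd)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(fd >= 0);
	if (page <= 0)
		return NULL;
	p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Wait until the same fd becomes readable, which this sketch treats as
 * "the mapped data changed". Returns 1 on update, 0 on timeout and -1
 * on error (including error/hangup events without readable data).
 */
static int wait_for_update(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A VMM would map the page once, then loop on wait_for_update() and
re-read the mapped data after each wakeup.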
> 2. In kernel, asking KVM to populate the vmclock structure much like
> it does other pvclocks shared with the guest. KVM/x86 already uses
> pvclock_gtod_register_notifier() to hook changes; should we expand
> on that? The problem with that notifier is that it seems to be
> called far more frequently than I'd expect.
This sounds better, especially as any custom ABI from the host kernel to
the VMM would look a lot like the vmclock structure anyway.
Timekeeper updates are indeed very frequent, but what are the concrete
issues? That frequency is fine for regular vDSO data page updates, and
updating the vmclock data page should be very similar.
The timekeeper core can pass context to the notifier callbacks; maybe
this can be used to skip some expensive steps where possible.
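
A small userspace model of what such skipping could look like. This is
not kernel code: struct tk_update, struct vmclock_model and
on_tk_update() are made-up stand-ins for the context a callback might
receive and the state it might keep. The assumption is that when the
clock was not stepped and the rate (mult/shift) is unchanged, the
previously published reference point still describes the same linear
relationship (ignoring rounding and delta overflow), so the expensive
republish can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the context handed to a callback on each update. */
struct tk_update {
	bool was_set;		/* clock was stepped, e.g. settimeofday() */
	uint32_t mult;
	uint32_t shift;
	uint64_t cycle_last;
	uint64_t base_ns;
};

/* Stand-in for the data published to the guest, plus statistics. */
struct vmclock_model {
	uint32_t mult;
	uint32_t shift;
	uint64_t cycle_last;
	uint64_t base_ns;
	unsigned int full_updates;	/* expensive republishes performed */
	unsigned int skipped;		/* updates that needed no work */
};

/* Returns true if the guest-visible data was rewritten. */
static bool on_tk_update(struct vmclock_model *vm, const struct tk_update *u)
{
	bool rate_changed;

	assert(vm && u);
	rate_changed = u->mult != vm->mult || u->shift != vm->shift;

	if (!u->was_set && !rate_changed) {
		/* Old reference point still extrapolates correctly. */
		vm->skipped++;
		return false;
	}

	vm->mult = u->mult;
	vm->shift = u->shift;
	vm->cycle_last = u->cycle_last;
	vm->base_ns = u->base_ns;
	vm->full_updates++;
	return true;
}
```

In this model a plain periodic tick costs one comparison, and only
steps or rate adjustments trigger a rewrite of the shared data.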
> ¹ https://gitlab.com/qemu-project/qemu/-/commit/3634039b93cc5