Re: [PATCH] ptp: Add vDSO-style vmclock support
From: David Woodhouse
Date: Thu Jul 25 2024 - 11:19:05 EST
On Thu, 2024-07-25 at 10:11 -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2024 at 02:50:50PM +0100, David Woodhouse wrote:
> > Even if the virtio-rtc specification were official today, and I was
> > able to expose it via PCI, I probably wouldn't do it that way. There's
> > just far more in virtio-rtc than we need; the simple shared memory
> > region is perfectly sufficient for most needs, and especially ours.
>
> I can't stop amazon from shipping whatever in its hypervisor,
> I'd just like to understand this better, if there is a use-case
> not addressed here then we can change virtio to address it.
>
> The rtc driver patch posted is 900 lines, yours is 700 lines, does not
> look like a big difference. As for using a memory region, this is
> valid, but maybe rtc should be changed to do exactly that?
I'm certainly aiming for virtio-rtc to include that as an *option*,
because I think I don't think it makes sense for an RTC specification
aimed at virtual machines *not* to deal with the live migration
problem.
AFAICT the only ways to deal with the LM problem are either to make a
hypercall/virtio transaction for *every* clock read which needs to be
accurate, or expose a memory region for the guest to do it "vDSO-
style".
And similarly, unless we want guest userspace to have to make a
*system* call every time, that memory region needs to be mappable all
the way to userspace.
The use case isn't necessarily for all users of gettimeofday(), of
course; this is for those applications which *need* precision time.
Like distributed databases which rely on timestamps for coherency, and
users who get fined millions of dollars when LM messes up their clocks
and they put wrong timestamps on financial transactions.
> E.g. we can easily add a capability describing such a region.
> or put it in device config space.
I think it has to be memory, not config space. But yes.
The intent is that my driver would be usable with the shared memory
region from a virtio-rtc device too. It'd need a tiny amount of
refactoring of the discovery code in vmclock_probe(), which I haven't
done yet as it would be premature optimisation.
> I mean yes, we can build a new transport for each specific need but in
> the end we'll get a ton of interfaces with unclear compatibility
> requirements. If effort is instead spent improving common interfaces,
> we get consistency and everyone benefits. That's why I'm trying to
> understand the need here.
It's simplicity. Because this isn't even a "transport". It's just a
simple breadcrumb given to the guest to tell it where the information
is.
In the fullness of time assuming this becomes part of virtio-rtc too,
the fact that it can *also* be discovered by ACPI is just a tiny
detail. And it allows hypervisors to implement it a *whole* lot more
simply.
The addition of an ACPI method to enable the timekeeping does make it a
tiny bit more than a 'breadcrump', I concede — but that's still
basically trivial to implement. A whole lot simpler than a full virtio
device.
Attachment:
smime.p7s
Description: S/MIME cryptographic signature