Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support

From: Peter Hilber
Date: Thu Jul 11 2024 - 03:26:12 EST


On 10.07.24 18:01, David Woodhouse wrote:
> On Wed, 2024-07-10 at 15:07 +0200, Peter Hilber wrote:
>> On 08.07.24 11:27, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>>>
>>> The vmclock "device" provides a shared memory region with precision clock
>>> information. By using shared memory, it is safe across Live Migration.
>>>
>>> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
>>> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
>>> actually helpful.
>>>
>>> The memory region of the device is also exposed to userspace so it can be
>>> read or memory mapped by application which need reliable notification of
>>> clock disruptions.
>>>
>>> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
>>
>> [...]
>>
>>> +
>>> +struct vmclock_abi {
>>> +       /* CONSTANT FIELDS */
>>> +       uint32_t magic;
>>> +#define VMCLOCK_MAGIC  0x4b4c4356 /* "VCLK" */
>>> +       uint32_t size;          /* Size of region containing this structure */
>>> +       uint16_t version;       /* 1 */
>>> +       uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
>>> +#define VMCLOCK_COUNTER_ARM_VCNT       0
>>> +#define VMCLOCK_COUNTER_X86_TSC                1
>>> +#define VMCLOCK_COUNTER_INVALID                0xff
>>> +       uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
>>> +#define VMCLOCK_TIME_UTC                       0       /* Since 1970-01-01 00:00:00z */
>>> +#define VMCLOCK_TIME_TAI                       1       /* Since 1970-01-01 00:00:00z */
>>> +#define VMCLOCK_TIME_MONOTONIC                 2       /* Since undefined epoch */
>>> +#define VMCLOCK_TIME_INVALID_SMEARED           3       /* Not supported */
>>> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED     4       /* Not supported */
>>> +
>>> +       /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
>>> +       uint32_t seq_count;     /* Low bit means an update is in progress */
>>> +       /*
>>> +        * This field changes to another non-repeating value when the CPU
>>> +        * counter is disrupted, for example on live migration. This lets
>>> +        * the guest know that it should discard any calibration it has
>>> +        * performed of the counter against external sources (NTP/PTP/etc.).
>>> +        */
>>> +       uint64_t disruption_marker;
>>> +       uint64_t flags;
>>> +       /* Indicates that the tai_offset_sec field is valid */
>>> +#define VMCLOCK_FLAG_TAI_OFFSET_VALID          (1 << 0)
>>> +       /*
>>> +        * Optionally used to notify guests of pending maintenance events.
>>> +        * A guest which provides latency-sensitive services may wish to
>>> +        * remove itself from service if an event is coming up. Two flags
>>> +        * indicate the approximate imminence of the event.
>>> +        */
>>> +#define VMCLOCK_FLAG_DISRUPTION_SOON           (1 << 1) /* About a day */
>>> +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT       (1 << 2) /* About an hour */
>>> +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID     (1 << 3)
>>> +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID     (1 << 4)
>>> +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID       (1 << 5)
>>> +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID       (1 << 6)
>>> +       /*
>>> +        * Even regardless of leap seconds, the time presented through this
>>> +        * mechanism may not be strictly monotonic. If the counter slows down
>>> +        * and the host adapts to this discovery, the time calculated from
>>> +        * the value of the counter immediately after an update to this
>>> +        * structure, may appear to be *earlier* than a calculation just
>>> +        * before the update (while the counter was believed to be running
>>> +        * faster than it now is). A guest operating system will typically
>>> +        * *skew* its own system clock back towards the reference clock
>>> +        * exposed here, rather than following this clock directly. If,
>>> +        * however, this structure is being populated from such a system
>>> +        * clock which is already handled in such a fashion and the results
>>> +        * *are* guaranteed to be monotonic, such monotonicity can be
>>> +        * advertised by setting this bit.
>>> +        */
>>
>> I wonder if this might be difficult to define in a standard.
>
> I'm sure we could do better than my attempt above, but surely it isn't
> *so* hard to define monotonicity?
>
>> Is there a need to define device and driver behavior in more detail? What
>> would happen if e.g. the device first decides how to update the clock, but
>> is then slow to update the SHM?
>
> That's OK, isn't it?
>
> The key in the VMCLOCK_FLAG_TIME_MONOTONIC flag is that by setting it,
> the host guarantees that the time calculated according to this
> structure at any given moment, shall not appear to be later than the
> time calculated via the structure at any *later* moment.

IMHO this phrasing is better, since it directly refers to the state of the
structure.

AFAIU if there would be abnormal delays in store buffers, causing some
driver to still see the old clock for some time, the monotonicity could be
violated:

1. device writes new, much slower clock to store buffer
2. some time passes
3. driver reads old, much faster clock
4. device writes store buffer to cache
5. driver reads new, much slower clock

But I hope such delays do not occur.