Re: [PATCH v4 2/3] x86/xen/time: setup vcpu 0 time info page

From: Joao Martins
Date: Wed Sep 27 2017 - 16:58:28 EST


On 09/27/2017 09:22 PM, Boris Ostrovsky wrote:
> On 09/27/2017 11:26 AM, Joao Martins wrote:
>> On 09/27/2017 03:40 PM, Boris Ostrovsky wrote:
>>>> +static void xen_setup_vsyscall_time_info(void)
>>>> +{
>>>> + struct vcpu_register_time_memory_area t;
>>>> + struct pvclock_vsyscall_time_info *ti;
>>>> + struct pvclock_vcpu_time_info *pvti;
>>>> + int ret;
>>>> +
>>>> + pvti = &__this_cpu_read(xen_vcpu)->time;
>>>> +
>>>> + /*
>>>> + * We check ahead on the primary time info if this
>>>> + * bit is supported hence speeding up Xen clocksource.
>>>> + */
>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>>>> + return;
>>>> +
>>>> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>>> Is it OK to have this flag set if anything below fails?
>>>
>> Yes - if anything below fails it will only affect userspace mapped page.
>
> Then should it be set somewhere else, like in xen_time_init()?
>
Hm, I could move it if you think it's better - but given the importance of the
bit we are checking and its direct correlation to whether or not we can setup
VCLOCK_PVCLOCK then I find it cleaner to have it here in the same routine. One
thing I failed to mention before is that checking ahead like above, let us also
avoid allocating a page plus an hypercall to register the pvti just to check the
one bit of info we need for using VCLOCK_PVCLOCK.

It is very unlikely with current Xen code that 1) the secondary copy register
below fails, or 2) master and secondary don't have the same bits set. So in case
you're reconsidering the "shortcut" check above I can move it like we had in v1
and have pvclock_set_flags right before pvclock_set_pvti_cpu0_va().

>> What I
>> do above is just allowing xen clocksource to use/check that bit (consequently
>> speeding up sched_clock) given the necessary support is there in the master
>> copy. The secondary copy (i.e. what's being set up below, mapped/used in vdso)
>> has the same data from the master copy, just separate memory regions. The checks
>> below are just for the unlikely cases of failing to register the secondary copy
>> or if its content were to differ from master copy in future releases - and
>> therefore we handle those more gracefully.
>>
>>> (I can see in the changelog that apparently at some point I've asked
>>> about this at v1 but I can't remember/find what exactly it was)
>>>
>>>> +
>>>> + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL);
>>>> + if (!ti)
>>>> + return;
>>>> +
>>>> + t.addr.v = &ti->pvti;
>>>> +
>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>>>> + if (ret) {
>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>>> + free_page((unsigned long)ti);
>>>> + return;
>>>> + }
>>>> +
>>>> + /*
>>>> + * If the check above succedded this one should too since it's the
>>>> + * same data on both primary and secondary time infos just different
>>>> + * memory regions. But we still check it in case hypervisor is buggy.
>>>> + */
>>>> + pvti = &ti->pvti;
>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>>>> + t.addr.v = NULL;
>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>>>> + 0, &t);
>>>> + if (!ret)
>>>> + free_page((unsigned long)ti);
>>>> +
>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n");
>>>> + return;
>>>> + }
>>>> +
>>>> + xen_clock = ti;
>>>> + pvclock_set_pvti_cpu0_va(xen_clock);
>>>> +
>>>> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>>>> +}
>>>> +
>