Re: [PATCH v4 2/3] x86/xen/time: setup vcpu 0 time info page

From: Boris Ostrovsky
Date: Wed Sep 27 2017 - 18:44:59 EST


On 09/27/2017 04:57 PM, Joao Martins wrote:
> On 09/27/2017 09:22 PM, Boris Ostrovsky wrote:
>> On 09/27/2017 11:26 AM, Joao Martins wrote:
>>> On 09/27/2017 03:40 PM, Boris Ostrovsky wrote:
>>>>> +static void xen_setup_vsyscall_time_info(void)
>>>>> +{
>>>>> + struct vcpu_register_time_memory_area t;
>>>>> + struct pvclock_vsyscall_time_info *ti;
>>>>> + struct pvclock_vcpu_time_info *pvti;
>>>>> + int ret;
>>>>> +
>>>>> + pvti = &__this_cpu_read(xen_vcpu)->time;
>>>>> +
>>>>> + /*
>>>>> + * We check ahead on the primary time info if this
>>>>> + * bit is supported hence speeding up Xen clocksource.
>>>>> + */
>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>>>>> + return;
>>>>> +
>>>>> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>>>> Is it OK to have this flag set if anything below fails?
>>>>
>>> Yes - if anything below fails it will only affect userspace mapped page.
>> Then should it be set somewhere else, like in xen_time_init()?
>>
> Hm, I could move it if you think it's better - but given the importance of the
> bit we are checking and its direct correlation to whether or not we can setup
> VCLOCK_PVCLOCK then I find it cleaner to have it here in the same routine. One
> thing I failed to mention before is that checking ahead like above, let us also
> avoid allocating a page plus an hypercall to register the pvti just to check the
> one bit of info we need for using VCLOCK_PVCLOCK.
>
> It is very unlikely with current Xen code that 1) the secondary copy register
> below fails, or 2) master and secondary don't have the same bits set. So in case
> you're reconsidering the "shortcut" check above I can move it like we had in v1
> and have pvclock_set_flags right before pvclock_set_pvti_cpu0_va().

I think it would be more logical to move it to the end like in v1.

But can you explain again why this flag should not be set in
xen_time_init()? It seems to me that it would be useful not just for
vDSO but for xen_clocksource_read()->pvclock_clocksource_read() as well.

-boris


>
>>> What I
>>> do above is just allowing xen clocksource to use/check that bit (consequently
>>> speeding up sched_clock) given the necessary support is there in the master
>>> copy. The secondary copy (i.e. what's being set up below, mapped/used in vdso)
>>> has the same data from the master copy, just separate memory regions. The checks
>>> below are just for the unlikely cases of failing to register the secondary copy
>>> or if its content were to differ from master copy in future releases - and
>>> therefore we handle those more gracefully.
>>>
>>>> (I can see in the changelog that apparently at some point I've asked
>>>> about this at v1 but I can't remember/find what exactly it was)
>>>>
>>>>> +
>>>>> + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL);
>>>>> + if (!ti)
>>>>> + return;
>>>>> +
>>>>> + t.addr.v = &ti->pvti;
>>>>> +
>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>>>>> + if (ret) {
>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>>>> + free_page((unsigned long)ti);
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * If the check above succedded this one should too since it's the
>>>>> + * same data on both primary and secondary time infos just different
>>>>> + * memory regions. But we still check it in case hypervisor is buggy.
>>>>> + */
>>>>> + pvti = &ti->pvti;
>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>>>>> + t.addr.v = NULL;
>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>>>>> + 0, &t);
>>>>> + if (!ret)
>>>>> + free_page((unsigned long)ti);
>>>>> +
>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n");
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + xen_clock = ti;
>>>>> + pvclock_set_pvti_cpu0_va(xen_clock);
>>>>> +
>>>>> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>>>>> +}
>>>>> +