Re: [PATCH v4 2/3] x86/xen/time: setup vcpu 0 time info page

From: Joao Martins
Date: Thu Sep 28 2017 - 06:17:27 EST


On 09/28/2017 12:46 AM, Joao Martins wrote:
> On 09/27/2017 11:44 PM, Boris Ostrovsky wrote:
>> On 09/27/2017 04:57 PM, Joao Martins wrote:
>>> On 09/27/2017 09:22 PM, Boris Ostrovsky wrote:
>>>> On 09/27/2017 11:26 AM, Joao Martins wrote:
>>>>> On 09/27/2017 03:40 PM, Boris Ostrovsky wrote:
>>>>>>> +static void xen_setup_vsyscall_time_info(void)
>>>>>>> +{
>>>>>>> + struct vcpu_register_time_memory_area t;
>>>>>>> + struct pvclock_vsyscall_time_info *ti;
>>>>>>> + struct pvclock_vcpu_time_info *pvti;
>>>>>>> + int ret;
>>>>>>> +
>>>>>>> + pvti = &__this_cpu_read(xen_vcpu)->time;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * We check ahead on the primary time info if this
>>>>>>> + * bit is supported hence speeding up Xen clocksource.
>>>>>>> + */
>>>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>>>>>>> + return;
>>>>>>> +
>>>>>>> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>>>>>> Is it OK to have this flag set if anything below fails?
>>>>>>
>>>>> Yes - if anything below fails it will only affect userspace mapped page.
>>>> Then should it be set somewhere else, like in xen_time_init()?
>>>>
>>> Hm, I could move it if you think it's better - but given the importance of the
>>> bit we are checking and its direct correlation to whether or not we can setup
>>> VCLOCK_PVCLOCK then I find it cleaner to have it here in the same routine. One
>>> thing I failed to mention before is that checking ahead like above, let us also
>>> avoid allocating a page plus an hypercall to register the pvti just to check the
>>> one bit of info we need for using VCLOCK_PVCLOCK.
>>>
>>> It is very unlikely with current Xen code that 1) the secondary copy register
>>> below fails, or 2) master and secondary don't have the same bits set. So in case
>>> you're reconsidering the "shortcut" check above I can move it like we had in v1
>>> and have pvclock_set_flags right before pvclock_set_pvti_cpu0_va().
>>
>> I think it would be more logical to move it to the end like in v1.
>>
>> But can you explain again why this flag should not be set in
>> xen_time_init()?
>
> I didn't say we shouldn't have this flag there - I was just pointing out a
> matter of taste on whether to put on xen_time_init() or in
> xen_setup_vsyscall_time_info() (which is called from xen_time_init btw) so
> there's no functional change.
>
To be clear, in this paragraph when I say on xen_setup_vsyscall_time_info() I
mean like it is described in this patch i.e. in the beginning of the routine.

>> It seems to me that it would be useful not just for
>> vDSO but for xen_clocksource_read()->pvclock_clocksource_read() as well.
>
> Right - That's what I mentioned by "allowing xen clocksource to use/check that
> bit (consequently speeding up sched_clock)". The above chunk is really focused
> on enabling the flag on pvclock_clocksource_read().
>
>>>
>>>>> What I
>>>>> do above is just allowing xen clocksource to use/check that bit (consequently
>>>>> speeding up sched_clock) given the necessary support is there in the master
>>>>> copy. The secondary copy (i.e. what's being set up below, mapped/used in vdso)
>>>>> has the same data from the master copy, just separate memory regions. The checks
>>>>> below are just for the unlikely cases of failing to register the secondary copy
>>>>> or if its content were to differ from master copy in future releases - and
>>>>> therefore we handle those more gracefully.
>>>>>
>>>>>> (I can see in the changelog that apparently at some point I've asked
>>>>>> about this at v1 but I can't remember/find what exactly it was)
>>>>>>
>>>>>>> +
>>>>>>> + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL);
>>>>>>> + if (!ti)
>>>>>>> + return;
>>>>>>> +
>>>>>>> + t.addr.v = &ti->pvti;
>>>>>>> +
>>>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>>>>>>> + if (ret) {
>>>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>>>>>> + free_page((unsigned long)ti);
>>>>>>> + return;
>>>>>>> + }
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * If the check above succedded this one should too since it's the
>>>>>>> + * same data on both primary and secondary time infos just different
>>>>>>> + * memory regions. But we still check it in case hypervisor is buggy.
>>>>>>> + */
>>>>>>> + pvti = &ti->pvti;
>>>>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>>>>>>> + t.addr.v = NULL;
>>>>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>>>>>>> + 0, &t);
>>>>>>> + if (!ret)
>>>>>>> + free_page((unsigned long)ti);
>>>>>>> +
>>>>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n");
>>>>>>> + return;
>>>>>>> + }
>>>>>>> +
>>>>>>> + xen_clock = ti;
>>>>>>> + pvclock_set_pvti_cpu0_va(xen_clock);
>>>>>>> +
>>>>>>> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>>>>>>> +}
>>>>>>> +
>>