Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
From: Thomas Gleixner
Date: Wed Nov 01 2017 - 10:51:23 EST
On Tue, 31 Oct 2017, mikelley@xxxxxxxxxxxxxxxxxxxxxx wrote:
> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> index f65d125..408cf3e 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -112,6 +112,22 @@
> #define HV_X64_GUEST_IDLE_STATE_AVAILABLE (1 << 5)
> /* Guest crash data handler available */
> #define HV_X64_GUEST_CRASH_MSR_AVAILABLE (1 << 10)
> +/* Debug MSRs available */
> +#define HV_X64_DEBUG_MSR_AVAILABLE (1 << 11)
> +/* Support for Non-Privileged Instruction Execution Prevention is available */
> +#define HV_X64_NPIEP_AVAILABLE (1 << 12)
> +/* Support for DisableHypervisor is available */
> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE (1 << 13)
> +/* Extended GVA Ranges for Flush Virtual Address list is available */
> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE (1 << 14)
> +/* Return Hypercall output via XMM registers is available */
> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (1 << 15)
> +/* SINT polling mode available */
> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE (1 << 17)
> +/* Hypercall MSR lock is available */
> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE (1 << 18)
> +/* stimer direct mode is available */
> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE (1 << 19)
How are all these defines (except the last one related to that patch?
> +/* Hardware IRQ number to use for stimer0 in Direct Mode. This IRQ is a fake
> + * because stimer's in Direct Mode simply interrupt on the specified vector,
> + * without using a particular IOAPIC pin. But we use the IRQ allocation
> + * machinery, so we need a hardware IRQ #. This value is somewhat arbitrary,
> + * but it should not be a legacy IRQ (0 to 15), and should fit within the
> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So any value
> + * between 16 and 23 should be good.
> + */
> +#define HV_STIMER0_IRQNR 18
Why would you want abuse an IOAPIC interrupt if all you need is a vector?
> +/* Routines to do per-architecture handling of stimer0 when in Direct Mode */
> +
> +void hv_ack_stimer0_interrupt(struct irq_desc *desc)
> +{
> + ack_APIC_irq();
> +}
> +
> +static void allonline_vector_allocation_domain(int cpu, struct cpumask *retmask,
> + const struct cpumask *mask)
> +{
> + cpumask_copy(retmask, cpu_online_mask);
> +}
> +
> +int hv_allocate_stimer0_irq(int *irq, int *vector)
> +{
> + int localirq;
> + int result;
> + struct irq_data *irq_data;
> +
> + /* The normal APIC vector allocation domain allows allocation of vectors
Please fix you comment style. Multi line comments are:
/*
* Bla....
* foo...
*/
> + * only for the calling CPU. So we change the allocation domain to one
> + * that allows vectors to be allocated in all online CPUs. This
> + * change is fine in a Hyper-V VM because VMs don't have the usual
> + * complement of interrupting devices.
> + */
> + apic->vector_allocation_domain = allonline_vector_allocation_domain;
This does not work anymore. vector_allocation_domain is gone as of
4.15. Please check the vector rework in
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip x86/apic
Aside of that what guarantees that all cpus are online at the point where
you allocate that interrupt? Nothing, so the vector will be not reserved or
allocated on offline CPUs. Now guess what happens if you bring the offline
CPUs online later, it will drown in spurious interrupts. Worse, the vector
can also be reused for a device interrupt. Great plan.
> + localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> + ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> + if (localirq < 0) {
> + pr_err("Cannot register stimer0 gsi. Error %d", localirq);
> + return -1;
> + }
> +
> + /* We pass in a dummy IRQ handler because architecture independent code
> + * will later override the IRQ domain interrupt handler and set it to a
> + * Hyper-V specific handler.
> + */
> + result = request_irq(localirq, (irq_handler_t)(-1), 0,
"Hyper-V stimer0", NULL);
That's a crude hack. Really.
> + if (result) {
> + pr_err("Cannot request stimer0 irq. Error %d", result);
> + acpi_unregister_gsi(localirq);
> + return -1;
> + }
> + irq_data = irq_domain_get_irq_data(x86_vector_domain, localirq);
> + *vector = irqd_cfg(irq_data)->vector;
> + *irq = localirq;
Uurgh, no. This is even more of a layering violation. Grab random data from
wherever it comes and then expect that it works. This will simply fall
apart the moment someone changes the affinity of this interrupt. It will
move to some random other vector and the system drowns in spurious
interrupts on the old vector.
> +/* ISR for when stimer0 is operating in Direct Mode. Direct Mode does
> + * not use VMBus or any VMBus messages, so process here and not in the
> + * VMBus driver code.
> + */
> +
> +static void hv_stimer0_isr(struct irq_desc *desc)
> +{
> + struct hv_per_cpu_context *hv_cpu;
> +
> + __this_cpu_inc(*desc->kstat_irqs);
> + __this_cpu_inc(kstat.irqs_sum);
> + hv_ack_stimer0_interrupt(desc);
> + hv_cpu = this_cpu_ptr(hv_context.cpu_context);
> + hv_cpu->clk_evt->event_handler(hv_cpu->clk_evt);
> + add_interrupt_randomness(desc->irq_data.irq, 0);
> +}
> +
> static int hv_ce_set_next_event(unsigned long delta,
> struct clock_event_device *evt)
> {
> @@ -108,6 +149,8 @@ static int hv_ce_shutdown(struct clock_event_device *evt)
> {
> hv_init_timer(HV_X64_MSR_STIMER0_COUNT, 0);
> hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, 0);
> + if (stimer_direct_mode)
> + hv_disable_stimer0_percpu_irq(stimer0_irq);
Whats the point of that? It's an empty inline:
> +#if IS_ENABLED(CONFIG_HYPERV)
> +static inline void hv_enable_stimer0_percpu_irq(int irq) { }
> +static inline void hv_disable_stimer0_percpu_irq(int irq) { }
> + if (stimer_direct_mode) {
> +
> + /* When it expires, the timer will directly interrupt
> + * on the specific hardware vector.
> + */
> + timer_cfg.direct_mode = 1;
> + timer_cfg.apic_vector = stimer0_vector;
> + hv_enable_stimer0_percpu_irq(stimer0_irq);
Ditto.
> + if (stimer_direct_mode) {
> + if (hv_allocate_stimer0_irq(&stimer0_irq, &stimer0_vector))
> + goto err;
> + irq_set_handler(stimer0_irq, hv_stimer0_isr);
What you really want to do here is to allocate a fixed vector like we do
for the other percpu interrupts (local apic timer, IPIs etc.) and use
that. This must be done at boot time, when you detect that the kernel runs
on HyperV. This makes sure, that all CPUs will have the vector populated,
even those which come online late.
Though you can't do that from a module. You have to do that setup during
early boot from ms_hyperv_init_platform(). That's the only sane way to deal
with that.
Thanks,
tglx