Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

From: Dan Hecht
Date: Wed Mar 07 2007 - 16:38:27 EST


On 03/07/2007 01:21 PM, Thomas Gleixner wrote:
On Wed, 2007-03-07 at 11:49 -0800, Dan Hecht wrote:
Jeremy, I saw you sent out the Xen version earlier, thanks. Here's ours for reference (please excuse any formating issues); it's also lean. We'll send out a proper patch later after some more testing:

Ah. Bitching loud enough speeds things up. :)


We've always planned to do this. We just didn't want to create the dependency between paravirt_ops and clockevents too early such that they would depend on each other to merge to main line. Now that they are both there, we are all for it.

/** vmi clockevent */

static struct clock_event_device vmi_global_clockevent;

static inline u32 vmi_alarm_wiring(struct clock_event_device *evt)
{
return (evt == &vmi_global_clockevent) ?
VMI_ALARM_WIRED_IRQ0 : VMI_ALARM_WIRED_LVTT;
}

static void vmi_timer_set_mode(enum clock_event_mode mode,
struct clock_event_device *evt)
{
u32 wiring;
cycle_t now, cycles_per_hz;
BUG_ON(!irqs_disabled());

wiring = vmi_alarm_wiring(evt);
if (wiring == VMI_ALARM_WIRED_LVTT)
/* Route the interrupt to the correct vector */
apic_write_around(APIC_LVTT, LOCAL_TIMER_VECTOR);

Wire that in the hypervisor.

switch (mode) {
case CLOCK_EVT_MODE_ONESHOT:
break;
case CLOCK_EVT_MODE_PERIODIC:
cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
(void)do_div(cycles_per_hz, HZ);
now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
vmi_timer_ops.set_alarm(wiring | VMI_PERIODIC,
now, cycles_per_hz);

paravirt_ops->paravirt_clockevent->set_periodic(vcpu, period);


Huh? paravirt_ops isn't a hypervisor interface, it's just a linux code abstraction. The code on both sides of paravirt_ops is *linux* code, any way you cut it. clockevents is already a linux code abstraction. why introduce the redundancy?


break;
case CLOCK_EVT_MODE_UNUSED:
case CLOCK_EVT_MODE_SHUTDOWN:

paravirt_ops->paravirt_clockevent->stop_event(vcpu, mode);


You would be introducing the same redundancy.


switch (evt->mode) {
case CLOCK_EVT_MODE_ONESHOT:
vmi_timer_ops.cancel_alarm(VMI_ONESHOT);
break;
case CLOCK_EVT_MODE_PERIODIC:
vmi_timer_ops.cancel_alarm(VMI_PERIODIC);
break;
default:
break;
}
break;
default:
break;
}
}

This whole vmi_timer_ops thing is horrible. All hypervisors can share paravirt_ops->paravirt_clockevent and retrieve the methods on boot.


vmi_timer_ops.whatever is where the kernel <-> hypervisor boundary is crossed for VMI.

static int vmi_timer_next_event(unsigned long delta,
struct clock_event_device *evt)
{
/* Unfortunately, set_next_event interface only passes relative
* expiry, but we want absolute expiry. It'd be better if were
* were passed an aboslute expiry, since a bunch of time may
* have been stolen between the time the delta is computed and
* when we set the alarm below. */
cycle_t now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_ONESHOT));

BUG_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
vmi_timer_ops.set_alarm(vmi_alarm_wiring(evt) | VMI_ONESHOT,
now + delta, 0);
return 0;
}

Great. Now we have:

s64 event = startup_offset + ktime_to_ns(evt->next_event);

if (HYPERVISOR_set_timer_op(event) < 0)
BUG();
and

vmi_timer_ops.set_alarm(vmi_alarm_wiring(evt) | VMI_ONESHOT, now + delta, 0);

How will the next implementations look like ?

lguest_program_timer(delta + lguest_current_time(), LGUEST_TIMER_SHOOT_ONCE);

virt_nextgen_ops.set_timer_event(delta, NO_WE_NEED_NO_FLAGS);

.......

This is tinkering of the best. My understanding of the paravirt
discussion at Kernel Summit was, that paravirt ops are exactly there to
prevent the above random hackery in the kernel and to allow _ALL_
hypervisors to interact via a sane interface inside of the kernel.


No, that was not the point of paravirt_ops. It is actually the complete opposite of the intention of paravirt_ops. paravirt_ops' intent is exactly to allow for *multiple* hypervisor ABIs to exist in the kernel.

At kernel summit, paravirt_ops was proposed to allow for multiple hypervisor ABI's to be targeted by the kernel. The code on both sides of paravirt_ops is *linux* code.

You are just perverting the whole idea of a standartized
paravirtualization interface.

This things can be done for clocksources, clockevents, interrupts (the
generic irq code allows this) and probaly for a whole bunch of other
stuff.

The current paravirt interface is completely insane and will explode
into an unmaintainable nightmare within no time, if we keep accepting
that crap further.

No thanks.


Again, you are misunderstanding the intent of paravirt_ops and history behind it's development.


#ifdef CONFIG_X86_LOCAL_APIC

/* Replacement for lapic timer local clock event.
* paravirt_ops.setup_boot_clock = vmi_nop
* (continue using global_clock_event on cpu0)
* paravirt_ops.setup_secondary_clock = vmi_timer_setup_local_alarm
*/
void __devinit vmi_timer_setup_local_alarm(void)
{
struct clock_event_device *evt = &__get_cpu_var(local_clock_events);

/* Then, start it back up as a local clockevent device. */
memcpy(evt, &vmi_clockevent, sizeof(*evt));
evt->cpumask = cpumask_of_cpu(smp_processor_id());

printk(KERN_WARNING "vmi: registering clock event %s. mult=%lu shift=%u\n",
evt->name, evt->mult, evt->shift);
clockevents_register_device(evt);
}

#endif

Why the hell do you need an lapic emulator here? This is exactly the
kind of crap, we do not want to have. clockevents do not care which
piece of hardware is calling them and we do not care how a particular
hypervisor is wiring that hardware.


Again, I said in a previous mail that we am fine with introducing our own interrupt handler rather than using the lapic one.


Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/