Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

From: Avi Kivity
Date: Mon Sep 24 2012 - 11:58:39 EST


On 09/24/2012 05:52 PM, Peter Zijlstra wrote:
> On Mon, 2012-09-24 at 17:43 +0200, Avi Kivity wrote:
>> Wouldn't this correspond to the scheduler interrupt firing and causing a
>> reschedule? I thought the timer was programmed for exactly the point in
>> time that CFS considers the right time for a switch. But I'm basing
>> this on my mental model of CFS, not CFS itself.
>
> No, we tried this for hrtimer kernels for a while, but programming
> hrtimers the whole time (every actual task-switch) turns out to be far
> too expensive. So we're back to HZ ticks and 'polling' the preemption
> state.

Ok, so I wasn't completely off base.

With HZ=1000, we can beat the tick-driven poll, and hence the
interrupt-driven schedule(), by at most a millisecond, and we need to
be a lot faster.
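
To put a number on that granularity, here is a rough sketch of the
arithmetic, assuming a purely periodic tick at HZ and ignoring any other
reschedule source. The function names are illustrative only and do not
correspond to anything in the kernel.

```python
def tick_period_us(hz: int) -> float:
    """Length of one scheduler tick in microseconds for a given HZ."""
    return 1_000_000 / hz


def worst_case_wait_us(hz: int) -> float:
    """Upper bound on how long a due preemption can go unnoticed.

    Assumption: preemption state is only checked at tick boundaries,
    so a switch that becomes due just after a tick waits one full period.
    """
    return tick_period_us(hz)
```

Under these assumptions HZ=1000 gives a 1000 usec period, HZ=250 gives
4000 usec, and HZ=100 gives 10000 usec.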

> Even if we remove all the hrtimer infrastructure overhead (can do with a
> few hacks) setting the hardware requires going out to the LAPIC, which
> is stupid slow.
>
> Some hardware actually has fast/reliable/usable timers, sadly none of it
> is popular.

There is the TSC deadline timer mode of newer Intels. Programming the
timer is a simple wrmsr, and it will fire immediately if it already
expired. Unfortunately on AMDs it is not available, and on virtual
hardware it will be slow (~1-2 usec).
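
As a rough model of what programming the deadline involves (a sketch of
the arithmetic only, not kernel code): the value written is an absolute
TSC count, so a relative delay has to be converted using the TSC
frequency. The helper names and the tsc_khz parameter below are
assumptions made for illustration.

```python
def tsc_deadline(now_tsc: int, delay_ns: int, tsc_khz: int) -> int:
    """Absolute TSC value at which the timer should fire.

    tsc_khz is the TSC frequency in kHz (cycles per millisecond), so
    cycles = delay_ns * tsc_khz / 1_000_000. Integer division rounds down.
    """
    return now_tsc + (delay_ns * tsc_khz) // 1_000_000


def fires_immediately(deadline_tsc: int, now_tsc: int) -> bool:
    """Models the behaviour described above: a deadline that has
    already passed when it is programmed fires right away."""
    return deadline_tsc <= now_tsc
```

For example, with a 2.5 GHz TSC (tsc_khz = 2_500_000), a 1000 ns delay
corresponds to 2500 cycles past the current TSC value.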

--
error compiling committee.c: too many arguments to function