Re: [Xen-devel] NUMA_BALANCING and Xen PV guest regression in 3.20-rc0

From: Andrew Cooper
Date: Fri Feb 20 2015 - 06:55:01 EST


On 20/02/15 11:29, Kirill A. Shutemov wrote:
> On Fri, Feb 20, 2015 at 10:47:52AM +0000, Andrew Cooper wrote:
>> On 20/02/15 01:49, Linus Torvalds wrote:
>>> On Thu, Feb 19, 2015 at 5:05 PM, Kirill A. Shutemov
>>> <kirill@xxxxxxxxxxxxx> wrote:
>>>> I'm feeling I miss very basic background on how Xen works, but why does it
>>>> set _PAGE_GLOBAL on userspace entries? It sounds strange to me.
>>> It is definitely strange. I'm guessing that it's some ancient Xen hack
>>> for the early Intel virtualization that used to have absolutely
>>> horrendous vmenter/exit costs, including very much the TLB overhead.
>>>
>>> These days, Intel has address space identifiers, and doesn't flush the
>>> whole TLB on VM entry/exit, so it's probably pointless to play games
>>> with the global bit.
>> It was introduced in 2006, but has nothing to do with VT-x
>>
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=6f562e72cdc4b7e1519e23be75f812aebbf41db3
>>
>> As long mode drops segment limit checking, the only way to protect a
>> 64bit PV kernel from its userspace (both of which run in ring3 on user
>> pages) is to maintain two sets of pagetables and switch between them on
>> guest kernel/user context switches. The user set lacks kernel mappings.
>>
>> I can't comment about the performance impact of the patch (way before my
>> time), but the justification was to try and reduce the overhead of guest
>> context switches.
> IIUC, it tries to reduce userspace->kernel switch in guest. It's still
> hopeless: kernel mappings are always TLB-cold, right?

There is no way to avoid the kernel mappings being TLB-cold on a guest
user -> kernel context switch.
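
As a rough illustration of the point, here is a toy model in C. It is not
Xen or Linux code: the TLB layout and the names tlb_fill(), tlb_hit() and
switch_pagetables() are all made up for this sketch. It assumes a
pagetable switch behaves like a CR3 write without address space
identifiers (every non-global entry is dropped), that user entries are
filled with the global bit set, and that kernel entries are not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SIZE 8

/* Hypothetical toy TLB entry; "global" stands in for _PAGE_GLOBAL. */
struct tlb_entry {
    bool valid;
    bool global;
    unsigned long vpn;   /* virtual page number */
};

static struct tlb_entry tlb[TLB_SIZE];

/* Cache a translation in the first free slot. */
static void tlb_fill(unsigned long vpn, bool global)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        if (!tlb[i].valid) {
            tlb[i].valid = true;
            tlb[i].global = global;
            tlb[i].vpn = vpn;
            return;
        }
    }
    assert(!"toy TLB is full");
}

/* Return true if a translation for vpn is currently cached. */
static bool tlb_hit(unsigned long vpn)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return true;
    }
    return false;
}

/*
 * Assumed model of switching between the guest kernel and guest user
 * pagetables: every non-global entry is flushed, global entries survive.
 */
static void switch_pagetables(void)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && !tlb[i].global)
            tlb[i].valid = false;
    }
}
```

In this model, a user entry filled with global set is still a hit after
switch_pagetables(), while a kernel entry filled without it is gone and
has to be walked again on the next entry into the guest kernel.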

Much of the "legacy 32bit stuff" which was dropped in AMD64 was exactly
what Xen had been using to make PV guests safe efficiently.

>
>>> I get the feeling that a lot of Xen stuff is that kind of "legacy
>>> hacks" that should just be cleaned up, but nobody has the energy or
>>> the interest.
>> Time, mainly.
>>
>> There certainly are areas which should be up for re-evaluation, given 9
>> years of change in hardware.
> Is Xen PV still widely used?

Dom0 realistically still needs to be PV.

PVH (hardware extensions but no qemu emulating a motherboard) is on the
horizon but still very much experimental with open issues needing to be
solved.

> I'm surprised that users can tolerate this kind of overhead.

For modern hardware, most workloads are now "better" in HVM guests, but
even only 5 years ago, the vmentry/vmexit overhead tended to outweigh
the PV overheads.

On the other hand, unikernel virtual machines can always be more
efficient as PV guests, so PV is not going to die any time soon.

~Andrew
