Re: [RFC PATCH 0/3] Weight-balanced binary tree + KVM growable memoryslots using wbtree

From: Avi Kivity
Date: Wed Mar 02 2011 - 08:31:44 EST

In reply to: Alex Williamson: "Re: [RFC PATCH 0/3] Weight-balanced binary tree + KVM growable memoryslots using wbtree"

On 03/01/2011 08:20 PM, Alex Williamson wrote:

> > > It seems like we need a good mixed workload benchmark. So far we've
> > > only tested worst case, with a pure emulated I/O test, and best case,
> > > with a pure memory test. Ordering an array only helps the latter, and
> > > only barely beats the tree, so I suspect overall performance would be
> > > better with a tree.
> >
> > But if we cache the missed-all-memslots result in the spte, we eliminate
> > the worst case, and are left with just the best case.
>
> There's potentially a lot of entries between best case and worst case.

The mid case is where we have a lot of small slots which are continuously flushed. That would be (ept=0 && new mappings continuously established) || (lots of small mappings && lots of host paging activity). I don't know of any guests that continuously reestablish BAR mappings; and host paging activity doesn't apply to device assignment. What are we left with?

> >
> > The problem here is that all workloads will cache all memslots very
> > quickly into sptes and all lookups will be misses. There are two cases
> > where we have lookups that hit the memslots structure: ept=0, and host
> > swap. Neither are things we want to optimize too heavily.
>
> Which seems to suggest that:
>
> A. making those misses fast = win
> B. making those misses fast + caching misses = win++
> C. we don't care if the sorted array is subtly faster for ept=0
>
> Sound right? So is the question whether cached misses alone gets us 99%
> of the improvement since hits are already getting cached in sptes for
> cases we care about?

Yes, that's my feeling. Caching those misses is a lot more important than speeding them up, since the cache will stay valid for long periods, and since the hit rate will be very high.

Cache+anything=O(1)
no-cache+tree=O(log(n))
no-cache+array=O(n)
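
To make the three lines above concrete, here is a minimal sketch in C. It is not KVM code: `struct slot`, `struct slot_table`, `struct lookup_cache` and the three lookup functions are hypothetical names made up for illustration. The binary search over a base-sorted array stands in for the tree's O(log(n)) behaviour, and the per-gfn cache entry stands in for whatever would be recorded in the spte, assuming some generation counter invalidates it when the slots change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified slot: covers guest frames [base, base + npages). */
struct slot {
	uint64_t base;
	uint64_t npages;
};

/* Hypothetical slot table; generation is bumped whenever the slots change. */
struct slot_table {
	const struct slot *slots;
	size_t nslots;
	uint64_t generation;
};

/* Hypothetical cache entry for ONE gfn, standing in for state kept in an spte. */
struct lookup_cache {
	bool valid;
	uint64_t generation;       /* table generation the result belongs to */
	const struct slot *result; /* NULL records a cached miss */
};

/* no-cache + array: O(n) linear scan; returns NULL on a miss. */
static const struct slot *lookup_linear(const struct slot_table *t, uint64_t gfn)
{
	for (size_t i = 0; i < t->nslots; i++) {
		const struct slot *s = &t->slots[i];

		if (gfn >= s->base && gfn - s->base < s->npages)
			return s;
	}
	return NULL;
}

/*
 * no-cache + ordered structure: O(log(n)).  Binary search used here as a
 * stand-in for the tree; assumes slots are sorted by base and do not overlap.
 */
static const struct slot *lookup_sorted(const struct slot_table *t, uint64_t gfn)
{
	size_t lo = 0, hi = t->nslots;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct slot *s = &t->slots[mid];

		if (gfn < s->base)
			hi = mid;
		else if (gfn - s->base >= s->npages)
			lo = mid + 1;
		else
			return s;
	}
	return NULL;
}

/*
 * cache + anything: O(1) while the entry is valid.  Hits and misses are both
 * cached; *slow_lookups counts how often the real lookup had to run.
 */
static const struct slot *lookup_cached(const struct slot_table *t,
					struct lookup_cache *c, uint64_t gfn,
					unsigned *slow_lookups)
{
	if (c->valid && c->generation == t->generation)
		return c->result;

	(*slow_lookups)++;
	c->result = lookup_sorted(t, gfn);
	c->generation = t->generation;
	c->valid = true;
	return c->result;
}
```

In this sketch, repeated calls to `lookup_cached()` for a gfn that misses every slot run the underlying search once, then return the cached NULL until the table's generation changes. That is the reason the choice of underlying structure matters much less once misses are cached.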

--
error compiling committee.c: too many arguments to function

