> > It seems like we need a good mixed workload benchmark. So far we've
> > only tested worst case, with a pure emulated I/O test, and best case,
> > with a pure memory test. Ordering an array only helps the latter, and
> > only barely beats the tree, so I suspect overall performance would be
> > better with a tree.
>
> But if we cache the missed-all-memslots result in the spte, we eliminate
> the worst case, and are left with just the best case.
There are potentially a lot of entries between best case and worst case.
>
> The problem here is that all workloads will cache all memslots very
> quickly into sptes and all lookups will be misses. There are two cases
> where we have lookups that hit the memslots structure: ept=0, and host
> swap. Neither are things we want to optimize too heavily.
Which seems to suggest that:
A. making those misses fast = win
B. making those misses fast + caching misses = win++
C. we don't care if the sorted array is subtly faster for ept=0
Sound right? So is the question whether caching misses alone gets us 99%
of the improvement, since hits are already getting cached in sptes for
the cases we care about?
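
To make sure we're talking about the same thing, here is a rough userspace
sketch of option B: a sorted-array lookup, with the result (hit or miss)
remembered in a per-gfn entry so the array is only walked once. This is a toy
model, not the actual KVM code. Every name in it (toy_memslot, toy_spte,
toy_find_slot, toy_fault, slow_lookups) is made up for illustration, and it
assumes the slots are sorted by base_gfn and do not overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Toy memslot: a contiguous range of guest frame numbers (hypothetical). */
struct toy_memslot {
	gfn_t base_gfn;
	uint64_t npages;
};

/* Toy "spte": remembers the outcome of the slot lookup for one gfn. */
enum toy_spte_state { SPTE_EMPTY, SPTE_MAPPED, SPTE_MMIO };

struct toy_spte {
	enum toy_spte_state state;
};

/* Counts how many times the slot array is actually searched. */
static unsigned long slow_lookups;

/*
 * Binary search over slots sorted by base_gfn.
 * Returns the covering slot, or NULL if no slot contains gfn (a miss).
 */
static const struct toy_memslot *
toy_find_slot(const struct toy_memslot *slots, size_t n, gfn_t gfn)
{
	size_t lo = 0, hi = n;

	assert(slots != NULL || n == 0);
	slow_lookups++;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct toy_memslot *s = &slots[mid];

		if (gfn < s->base_gfn)
			hi = mid;		/* gfn lies below this slot */
		else if (gfn - s->base_gfn >= s->npages)
			lo = mid + 1;		/* gfn lies above this slot */
		else
			return s;		/* gfn falls inside this slot */
	}
	return NULL;
}

/*
 * Handle a fault on gfn. The array is searched only while the entry is
 * empty; afterwards both hits and misses are answered from the cache.
 * Returns true if gfn is backed by a slot, false if it missed all slots.
 */
static bool
toy_fault(struct toy_spte *spte, const struct toy_memslot *slots,
	  size_t n, gfn_t gfn)
{
	if (spte->state == SPTE_EMPTY)
		spte->state = toy_find_slot(slots, n, gfn) ?
			      SPTE_MAPPED : SPTE_MMIO;
	return spte->state == SPTE_MAPPED;
}
```

In this model, repeated faults on the same emulated-I/O gfn bump slow_lookups
exactly once, which is the "cached misses" behaviour I'm asking about. Whether
the first lookup uses an array or a tree then only matters for the uncached
path.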