virtio ring layout changes for optimal single-stream performance
From: Michael S. Tsirkin
Date: Thu Jan 21 2016 - 08:39:37 EST
I have been experimenting with alternative virtio ring layouts,
in order to speed up single stream performance.
I have just posted a benchmark I wrote for the purpose, and a (partial)
alternative layout implementation. This achieves a 20-40% reduction in
virtio overhead in the (default) polling mode.
The layout aims to be as simple as possible, to reduce
the number of cache lines bouncing between CPUs.
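
To illustrate the cache line argument, here is a minimal sketch of a
single-producer/single-consumer ring in which each slot carries its own
ownership flag, so the two sides exchange only the slot's cache line and
never read each other's index. This is not the layout from the posted
patches; the names (struct slot, ring_produce, ring_consume), RING_SIZE
and the 64-byte line size are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE  8u   /* hypothetical size; must be a power of two */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One slot per cache line: the ownership flag and the payload travel together. */
struct slot {
    _Alignas(CACHE_LINE) atomic_bool valid; /* true: slot belongs to the consumer */
    uint64_t data;                          /* stand-in for a descriptor */
};

struct ring {
    struct slot slots[RING_SIZE];
    /* Private positions, each on its own line; neither side reads the other's. */
    _Alignas(CACHE_LINE) unsigned prod_idx;
    _Alignas(CACHE_LINE) unsigned cons_idx;
};

void ring_init(struct ring *r)
{
    static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");
    for (unsigned i = 0; i < RING_SIZE; i++) {
        atomic_init(&r->slots[i].valid, false);
        r->slots[i].data = 0;
    }
    r->prod_idx = 0;
    r->cons_idx = 0;
}

/* Returns false when the ring is full (the next slot is still owned by the consumer). */
bool ring_produce(struct ring *r, uint64_t data)
{
    struct slot *s = &r->slots[r->prod_idx & (RING_SIZE - 1)];

    if (atomic_load_explicit(&s->valid, memory_order_acquire))
        return false;
    s->data = data;
    /* Release: the payload must be visible before the flag flips. */
    atomic_store_explicit(&s->valid, true, memory_order_release);
    r->prod_idx++;
    return true;
}

/* Returns false when the ring is empty (the next slot has not been published yet). */
bool ring_consume(struct ring *r, uint64_t *out)
{
    struct slot *s = &r->slots[r->cons_idx & (RING_SIZE - 1)];

    if (!atomic_load_explicit(&s->valid, memory_order_acquire))
        return false;
    *out = s->data;
    /* Release: the payload read must complete before the slot is handed back. */
    atomic_store_explicit(&s->valid, false, memory_order_release);
    r->cons_idx++;
    return true;
}
```

In polling mode each side simply retries ring_produce() or ring_consume()
until it returns true. A split layout with separate shared index words
would add at least one more contended cache line per operation.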
For benchmarking, the idea is to emulate virtio in user-space,
artificially adding overhead (e.g. for signalling) to match what happens
with a VM.
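
As a rough sketch of what "artificially adding overhead" can look like in
a user-space emulation: a notification pays a fixed busy-loop cost standing
in for a vmexit/vmentry, while polling pays nothing. The names
(kick_if_needed, burn, kicks_sent) and the EXIT_SPIN_ITERS value are
hypothetical and not taken from the posted benchmark.

```c
#include <stdint.h>

#define EXIT_SPIN_ITERS 500u  /* hypothetical cost knob, not a measured value */

static volatile uint64_t spin_sink; /* volatile so the loop is not optimized away */
uint64_t kicks_sent;                /* number of notifications actually sent */

/* Burn `iters` loop iterations to stand in for the cost of an exit. */
void burn(unsigned iters)
{
    for (unsigned i = 0; i < iters; i++)
        spin_sink += i;
}

/* Signal the other side only if it asked for notifications.
 * Returns 1 if a kick was sent (and the emulated cost paid), 0 otherwise. */
int kick_if_needed(int notifications_enabled)
{
    if (!notifications_enabled)
        return 0;           /* polling mode: no signalling cost */
    burn(EXIT_SPIN_ITERS);
    kicks_sent++;
    return 1;
}
```

Tuning the loop count against a measured exit cost on a real host is what
would make the emulation representative.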
I'd be very curious to get feedback on this. In particular, some people
have discussed using vectored operations to format the virtio ring: would
that conflict with this work?
You are all welcome to post enhancements or more layout alternatives as
patches.

TODO:
- documentation+discussion of interaction with CPU caching
- thorough benchmarking of different configurations/hosts
- experiment with event index replacements
- better emulate vmexit/vmentry cost overhead
- virtio spec proposal