Anthony Liguori wrote:
writel(dst_x_reg, x);
writel(dst_y_reg, y);
writel(width_reg, w);
writel(height_reg, h);
writel(blt_cmd_reg, fill);
then kvm would cache the first four in a mmap()able memory area and only exit to userspace on the fifth. Userspace would then read the cached registers from memory and emulate the command.
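To make the batching idea concrete, here is a rough userspace simulation of it. This is only a sketch under assumptions: the register indices, the BLT_CMD_FILL encoding, and the reg_cache/emu_state structures are all made up for illustration, and a plain struct stands in for the mmap()able area. It is not how kvm implements anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices, named after the registers in the example. */
enum { DST_X_REG, DST_Y_REG, WIDTH_REG, HEIGHT_REG, BLT_CMD_REG, NUM_REGS };

#define BLT_CMD_FILL 1u /* assumed encoding of the "fill" command */

/* Stand-in for the mmap()able area shared by the kernel and userspace. */
struct reg_cache {
    uint32_t val[NUM_REGS];
};

/* Bookkeeping for the simulation only. */
struct emu_state {
    unsigned exits;      /* number of simulated exits to userspace */
    uint32_t x, y, w, h; /* parameters of the last emulated fill */
};

/* Userspace side: read the cached registers and emulate the command. */
static void userspace_handle_exit(const struct reg_cache *c, struct emu_state *s)
{
    s->exits++;
    if (c->val[BLT_CMD_REG] == BLT_CMD_FILL) {
        s->x = c->val[DST_X_REG];
        s->y = c->val[DST_Y_REG];
        s->w = c->val[WIDTH_REG];
        s->h = c->val[HEIGHT_REG];
    }
}

/* Kernel side: cache every write, but exit only on the command register. */
static void guest_writel(struct reg_cache *c, struct emu_state *s,
                         unsigned reg, uint32_t v)
{
    assert(reg < NUM_REGS);
    c->val[reg] = v;
    if (reg == BLT_CMD_REG)
        userspace_handle_exit(c, s);
}

/* The five-write sequence from the example, as the guest would issue it. */
static void guest_fill(struct reg_cache *c, struct emu_state *s,
                       uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
    guest_writel(c, s, DST_X_REG, x);
    guest_writel(c, s, DST_Y_REG, y);
    guest_writel(c, s, WIDTH_REG, w);
    guest_writel(c, s, HEIGHT_REG, h);
    guest_writel(c, s, BLT_CMD_REG, BLT_CMD_FILL);
}
```

In this model a call to guest_fill() costs one exit instead of five, and the handler sees all four parameters because they are already sitting in the shared area when the command write arrives.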
Letting QEMU do a certain amount of emulation after every transition would solve the problem in a more elegant and generic way.
But what amount? A basic block, or several?
Emulation has its costs. You need to marshal the registers to and fro. You need to reset qemu's cached translations. You need to throw away shadow page tables and qemu's softmmu. You increase the time spent in single-threaded code.