Re: [PATCH] MMIO should have more priority then IO

From: Nadav Amit
Date: Fri Jul 08 2022 - 12:45:08 EST

On Jul 8, 2022, at 5:56 AM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

> And looking at the results above, it's not so much the PIO vs MMIO
> that makes a difference, it's the virtualisation. A mmio access goes
> from 269ns to 85us. Rather than messing around with preferring MMIO
> over PIO for config space, having an "enlightenment" to do config
> space accesses would be a more profitable path.

I am unfamiliar with the motivation for this patch, but I just wanted to
briefly address the advice about enlightenments.

“Enlightenment”, AFAIK, is Microsoft’s term for "para-virtualization", so
let’s use the generic term. I think you are treating the bare-metal results
as what a paravirtual machine could achieve, which is mostly wrong.
Para-virtualization usually still requires a VM-exit, and for the most part
the hypervisor/host runs similar code for MMIO and for a hypercall
(conceptually; the code of paravirtual and fully-virtual devices is often
different, but IIUC, this is not what Ajay measured).

Para-virtualization could *perhaps* have helped to reduce the number of
PIO/MMIO accesses and improve performance this way. If, for instance, all
the PIO/MMIO accesses are done during initialization, a paravirtual
interface can be used to batch them together, and that would help. But it is
harder to get a performance benefit from paravirtualization if the PIO/MMIO
accesses are “spread”, for instance, done after each interrupt.
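
To make the batching argument concrete, here is a rough back-of-the-envelope
sketch. Everything in it is an assumption for illustration only: the 85us
figure is the virtualized MMIO number quoted above, charged entirely to the
exit, and the function names and the per-entry host cost are hypothetical.
It is not taken from any hypervisor.

```c
#include <stdint.h>

/* Assumption: ~85us per trapped access, from the numbers quoted above. */
#define EXIT_COST_NS 85000ULL

/* Hypothetical model: every access traps on its own. */
uint64_t cost_unbatched_ns(uint64_t accesses)
{
	return accesses * EXIT_COST_NS;
}

/*
 * Hypothetical model: one exit per batch of up to batch_size accesses,
 * plus an assumed per_entry_ns of host work for each access in the batch.
 */
uint64_t cost_batched_ns(uint64_t accesses, uint64_t batch_size,
			 uint64_t per_entry_ns)
{
	uint64_t exits;

	if (batch_size == 0)
		batch_size = 1;	/* treat 0 as "no batching" */

	/* Round up: a partially filled batch still costs one exit. */
	exits = (accesses + batch_size - 1) / batch_size;

	return exits * EXIT_COST_NS + accesses * per_entry_ns;
}
```

With 100 accesses that can all go into one batch, the model pays for a single
exit instead of 100. With batch_size == 1, which is the “spread” case, the
number of exits is unchanged and the paravirtual path is no cheaper.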

Para-virtualization and full-virtualization both have pros and cons.
Para-virtualization is often more efficient, but requires the VM to have
dedicated device drivers. Try to run an OS less common than Linux and it
will not work, since the OS will not have drivers for the para-virtual
devices. And even if you add support today for a para-virtual device, there
are many deployed OSes that do not have such support, and you would not be
able to run them in a VM.

Regardless of virtualization, Ajay’s results show that PIO is slower on
bare-metal, by 165ns according to his numbers, which is significant.
Emulating PIO in hypervisors on x86 is inherently more complex than
emulating MMIO, so the results he got would most likely be reproduced on all
hypervisors.

tl;dr: Let’s keep this discussion focused and put paravirtualization aside.
It is not a solution for all the problems in the world.