On Wed, Feb 09, 2022, Paolo Bonzini wrote:
> The TDP MMU has a performance regression compared to the legacy MMU
> when CR0 changes often. This was reported for the grsecurity kernel,
> which uses CR0.WP to implement kernel W^X. In that case, each change to
> CR0.WP unloads the MMU and causes a lot of unnecessary work. When running
> nested, this can even cause the L1 to hardly make progress, as the L0
> hypervisor is overwhelmed by the amount of MMU work that is needed.

FWIW, my flushing/zapping series fixes this by doing the teardown in an async
worker. There's even a selftest for this exact case :-)
https://lore.kernel.org/all/20211223222318.1039223-1-seanjc@xxxxxxxxxx
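
To illustrate the idea only: below is a toy userspace model of the
difference between tearing a root down on the vCPU's path and handing it
to a worker. It is not code from the series and not KVM code; every name
in it (toy_root, toy_unload_sync, toy_unload_async, toy_worker_run) is
made up, and the "worker" is just a function that drains a list.

```c
/* Toy userspace model, NOT KVM code: all names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_root {
	unsigned long nr_sptes;   /* entries that must be zapped on teardown */
	struct toy_root *next;    /* link in the deferred-teardown list */
};

static struct toy_root *deferred_list; /* roots waiting for the worker */
static unsigned long zapped_inline;    /* work done on the vCPU's path */
static unsigned long zapped_deferred;  /* work done by the worker */

/* Synchronous teardown: the vCPU pays for every entry before it resumes. */
static void toy_unload_sync(struct toy_root *root)
{
	zapped_inline += root->nr_sptes;
	root->nr_sptes = 0;
}

/* Deferred teardown: the vCPU only queues the root and moves on. */
static void toy_unload_async(struct toy_root *root)
{
	root->next = deferred_list;
	deferred_list = root;
}

/* Worker: drains the queue outside the vCPU's critical path. */
static void toy_worker_run(void)
{
	while (deferred_list) {
		struct toy_root *root = deferred_list;

		deferred_list = root->next;
		root->next = NULL;
		zapped_deferred += root->nr_sptes;
		root->nr_sptes = 0;
	}
	assert(deferred_list == NULL);
}
```

In this model the cost of a CR0.WP toggle on the deferred path is a
constant-time list insertion, and the per-entry work shows up only in
zapped_deferred once toy_worker_run() has been called.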