[PATCH 1/1] kvm/book3s_64: Fix crash caused by not cleaning vhost IOTLB
From: Leonardo Bras
Date: Tue Dec 17 2019 - 16:07:37 EST
Fix a bug that happens when a virtual machine is created without DDW and
uses vhost to back a virtio-net device.
In this scenario, an IOMMU with a 32-bit DMA window may map the same
IOVA to different memory addresses over time.
As the code works today, the H_STUFF_TCE hypercall is handled entirely by
KVM code, which does not invalidate the IOTLB entry in vhost. As a result,
an old entry can at some point cause an access to the memory address that
the IOVA previously pointed to.
Example:
- virtio-net passes IOVA N to vhost, which points to M1
- vhost tries the IOTLB, but misses
- vhost translates IOVA N and stores the result in the IOTLB
- vhost writes to M1
- (some IOMMU usage)
- virtio-net passes IOVA N to vhost, which now points to M2
- vhost tries the IOTLB, hits the stale entry, and translates IOVA N to M1
- vhost writes to M1 <error, should write to M2>
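The sequence above can be replayed with a toy model. This is only a sketch
under stated assumptions: the table, the cache, and every function name are
hypothetical stand-ins for the TCE table and the vhost IOTLB, not the real
kernel or vhost code.

```c
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none of this is vhost or KVM code. */
#define TABLE_SLOTS 16 /* size of the toy translation table */
#define CACHE_SLOTS 4  /* size of the toy translation cache */

struct cache_entry {
	int valid;
	uint64_t iova;
	uint64_t addr;
};

static uint64_t table[TABLE_SLOTS];           /* stands in for the TCE table */
static struct cache_entry cache[CACHE_SLOTS]; /* stands in for the vhost IOTLB */

/* Update the table only; the cache is deliberately left untouched. */
static void table_set(uint64_t iova, uint64_t addr)
{
	table[iova % TABLE_SLOTS] = addr;
}

/* Drop the cached translation for one IOVA, if present. */
static void cache_invalidate(uint64_t iova)
{
	struct cache_entry *e = &cache[iova % CACHE_SLOTS];

	if (e->valid && e->iova == iova)
		e->valid = 0;
}

/* Look up the cache first; on a miss, read the table and cache the result. */
static uint64_t translate(uint64_t iova)
{
	struct cache_entry *e = &cache[iova % CACHE_SLOTS];

	if (e->valid && e->iova == iova)
		return e->addr;

	e->valid = 1;
	e->iova = iova;
	e->addr = table[iova % TABLE_SLOTS];
	return e->addr;
}

/* Replay the example; returns the address used for the second write. */
uint64_t replay(int invalidate_on_clear)
{
	const uint64_t n = 3, m1 = 0x1000, m2 = 0x2000;

	memset(table, 0, sizeof(table));
	memset(cache, 0, sizeof(cache));

	table_set(n, m1);
	(void)translate(n);          /* miss: caches N -> M1 */

	table_set(n, 0);             /* mapping cleared (tce_value == 0) */
	if (invalidate_on_clear)
		cache_invalidate(n); /* what the fix ends up causing */

	table_set(n, m2);            /* IOVA N reused for M2 */
	return translate(n);
}
```

In this model, replay(0) returns 0x1000 (the stale M1) and replay(1)
returns 0x2000 (the expected M2).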
This error was probably not evident before because the IOTLB was small
enough to almost always miss by the time an IOVA was reused. Raising the
IOTLB size to 32k (it is a module parameter that defaults to 2k) is enough
to reproduce the bug in more than 90% of the runs.
It usually takes less than 10 seconds of netperf to trigger the bug.
A few minutes after the bug is reproduced, the guest usually crashes.
Fixing this bug involves removing the IOVA entry from the vhost IOTLB.
The guest kernel triggers this by doing an H_STUFF_TCE hypercall with
tce_value == 0.
This change fixes the bug by returning H_TOO_HARD from kvmppc_h_stuff_tce
when tce_value == 0, which causes KVM to let QEMU handle the hypercall.
In that case, QEMU does free the vhost IOTLB entry, which fixes the bug.
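The fallback described above can be pictured with a simplified dispatch
sketch. Everything here is an assumption made for illustration: the function
names, the counter, and the numeric return codes are placeholders, and the
real in-kernel handler performs more checks than shown.

```c
#include <stdint.h>

/* Placeholder return codes; the values are illustrative only. */
enum { SKETCH_H_SUCCESS = 0, SKETCH_H_TOO_HARD = 1 };

enum handled_by { HANDLED_IN_KERNEL, HANDLED_IN_USERSPACE };

/* Counts how often the hypothetical userspace path dropped a cached entry. */
int userspace_invalidations;

/* Hypothetical in-kernel fast path: refuses to handle a clearing request. */
static long kernel_stuff_tce(uint64_t tce_value)
{
	if (tce_value == 0)
		return SKETCH_H_TOO_HARD;
	return SKETCH_H_SUCCESS;
}

/* Hypothetical userspace path: clearing also drops the cached translation. */
static long userspace_stuff_tce(uint64_t tce_value)
{
	if (tce_value == 0)
		userspace_invalidations++;
	return SKETCH_H_SUCCESS;
}

/* Try the kernel path first and fall back to userspace when it declines. */
enum handled_by dispatch_stuff_tce(uint64_t tce_value)
{
	long ret = kernel_stuff_tce(tce_value);

	if (ret == SKETCH_H_TOO_HARD) {
		(void)userspace_stuff_tce(tce_value);
		return HANDLED_IN_USERSPACE;
	}
	return HANDLED_IN_KERNEL;
}
```

Only the clearing case (tce_value == 0) takes the slower userspace path in
this sketch; other values stay in the kernel path.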
Signed-off-by: Leonardo Bras <leonardo@xxxxxxxxxxxxx>
---
arch/powerpc/kvm/book3s_64_vio.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 883a66e76638..841eff3f6392 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -710,6 +710,9 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 	if (ret != H_SUCCESS)
 		return ret;
 
+	if (tce_value == 0)
+		return H_TOO_HARD;
+
 	/* Check permission bits only to allow userspace poison TCE for debug */
 	if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
 		return H_PARAMETER;
--
2.23.0