Re: [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer

From: David Hildenbrand
Date: Thu Feb 11 2021 - 08:38:43 EST


On 11.02.21 13:57, Claudio Imbrenda wrote:
On Thu, 11 Feb 2021 10:18:56 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 09.02.21 16:43, Claudio Imbrenda wrote:
Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
DAT table entry, or to the invalid entry.

Also return some flags in the lower bits of the address:
DAT_PROT: indicates that DAT protection applies because of the
protection bit in the segment (or, if EDAT, region)
tables NOT_PTE: indicates that the address of the DAT table entry
returned does not refer to a PTE, but to a segment or region table.

I've been thinking about one possible issue, but I think it's not
actually an issue. Just sharing so others can verify:

In case our guest uses huge pages / gigantic pages / ASCE R, we
create fake page tables (GMAP_SHADOW_FAKE_TABLE).

So, it could be that kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()
succeeds, however, we have a fake PTE in our hands. We lost the
actual guest STE/RTE address. (I think it could be recovered somehow
via page->index, thought)

But I guess, if there is a fake PTE, then there is not acutally
something that could go wrong in gmap_shadow_page() anymore that could
lead us in responding something wrong to the guest. We can only really
fail with -EINVAL, -ENOMEM or -EFAULT.

this was also my reasoning

So if the guest changed anything in the meantime (e.g., zap a
segment), we would have unshadowed the whole fake page table
hierarchy and would simply retry.

Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
arch/s390/kvm/gaccess.h | 5 ++++-
arch/s390/kvm/vsie.c | 8 ++++----
3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 6d6b57059493..e0ab83f051d2 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct
kvm_vcpu *vcpu, unsigned long gra)
* kvm_s390_shadow_tables - walk the guest page table and create
shadow tables
* @sg: pointer to the shadow guest address space structure
* @saddr: faulting address in the shadow gmap
- * @pgt: pointer to the page table address result
+ * @pgt: pointer to the beginning of the page table for the given
address if
+ * successful (return value 0), or to the first invalid DAT
entry in
+ * case of exceptions (return value > 0)
* @fake: pgt references contiguous guest memory block, not a
pgtable */
static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long
saddr, @@ -1034,6 +1036,7 @@ static int
kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rfte.val = ptr; goto shadow_r2t;
}
+ *pgt = ptr + vaddr.rfx * 8;
rc = gmap_read_table(parent, ptr + vaddr.rfx * 8,
&rfte.val);

Using

gmap_read_table(parent, *pgt, &rfte.val);

or similar with a local variable might then be even clearer. But no
strong opinion.

that's also something I had thought about, in the end this minimizes
the number of lines / variables while still being readable

if (rc)
return rc;
@@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap
*sg, unsigned long saddr, rste.val = ptr;
goto shadow_r3t;
}
+ *pgt = ptr + vaddr.rsx * 8;
rc = gmap_read_table(parent, ptr + vaddr.rsx * 8,
&rste.val); if (rc)
return rc;
@@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap
*sg, unsigned long saddr, rtte.val = ptr;
goto shadow_sgt;
}
+ *pgt = ptr + vaddr.rtx * 8;
rc = gmap_read_table(parent, ptr + vaddr.rtx * 8,
&rtte.val); if (rc)
return rc;
@@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap
*sg, unsigned long saddr, ste.val = ptr;
goto shadow_pgt;
}
+ *pgt = ptr + vaddr.sx * 8;
rc = gmap_read_table(parent, ptr + vaddr.sx * 8,
&ste.val); if (rc)
return rc;
@@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap
*sg, unsigned long saddr,
* @vcpu: virtual cpu
* @sg: pointer to the shadow guest address space structure
* @saddr: faulting address in the shadow gmap
+ * @datptr: will contain the address of the faulting DAT table
entry, or of
+ * the valid leaf, plus some flags
*
* Returns: - 0 if the shadow fault was successfully resolved
* - > 0 (pgm exception code) on exceptions while
faulting @@ -1165,11 +1173,11 @@ static int
kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
* - -ENOMEM if out of memory
*/
int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
- unsigned long saddr)
+ unsigned long saddr, unsigned long
*datptr) {
union vaddress vaddr;
union page_table_entry pte;
- unsigned long pgt;
+ unsigned long pgt = 0;
int dat_protection, fake;
int rc;
@@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu
*vcpu, struct gmap *sg, pte.val = pgt + vaddr.px * PAGE_SIZE;
goto shadow_page;
}
- if (!rc)
- rc = gmap_read_table(sg->parent, pgt + vaddr.px *
8, &pte.val); +
+ switch (rc) {
+ case PGM_SEGMENT_TRANSLATION:
+ case PGM_REGION_THIRD_TRANS:
+ case PGM_REGION_SECOND_TRANS:
+ case PGM_REGION_FIRST_TRANS:
+ pgt |= NOT_PTE;
+ break;
+ case 0:
+ pgt += vaddr.px * 8;
+ rc = gmap_read_table(sg->parent, pgt, &pte.val);
+ }
+ if (*datptr)
+ *datptr = pgt | dat_protection * DAT_PROT;
if (!rc && pte.i)
rc = PGM_PAGE_TRANSLATION;
if (!rc && pte.z)
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index f4c51756c462..fec26bbb17ba 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
int ipte_lock_held(struct kvm_vcpu *vcpu);
int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu,
unsigned long gra);
+#define DAT_PROT 2
+#define NOT_PTE 4

What if our guest is using ASCE.R ?

then we don't care.

if the guest is using ASCE.R, then shadowing will always succeed, and
the VSIE MVPG handler will retry right away.

if you are worried about the the lowest order bit, it can only be set
if a specific feature is enabled in the host, and KVM doesn't use /
support it, so the guest can't use it for its guest.

Got it, thanks! :)

Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>


--
Thanks,

David / dhildenb