Re: [PATCH v1 1/2] s390x: KVM: accept STSI for CPU topology information

From: Pierre Morel
Date: Thu Jul 15 2021 - 07:31:12 EST




On 7/15/21 10:51 AM, David Hildenbrand wrote:
On 14.07.21 17:25, Pierre Morel wrote:
STSI(15.1.x) gives information on the CPU configuration topology.
Let's accept the interception of STSI with the function code 15 and
let the userland part of the hypervisor handle it.

Signed-off-by: Pierre Morel <pmorel@xxxxxxxxxxxxx>
---
  arch/s390/kvm/priv.c | 11 ++++++++++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9928f785c677..4ab5f8b7780e 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -856,7 +856,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
      if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
          return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
-    if (fc > 3) {
+    if (fc > 3 && fc != 15) {
          kvm_s390_set_psw_cc(vcpu, 3);
          return 0;
      }
@@ -893,6 +893,15 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
              goto out_no_data;
          handle_stsi_3_2_2(vcpu, (void *) mem);
          break;
+    case 15:
+        if (sel1 != 1 || sel2 < 2 || sel2 > 6)
+            goto out_no_data;
+        if (vcpu->kvm->arch.user_stsi) {
+            insert_stsi_usr_data(vcpu, operand2, ar, fc, sel1, sel2);
+            return -EREMOTE;
+        }
+        kvm_s390_set_psw_cc(vcpu, 3);
+        return 0;
      }
      if (kvm_s390_pv_cpu_is_protected(vcpu)) {
          memcpy((void *)sida_origin(vcpu->arch.sie_block), (void *)mem,


1. Setting GPRS to 0

I was wondering why we have the "vcpu->run->s.regs.gprs[0] = 0;"
for existing fc 1,2,3 in case we set cc=0.

Looking at the doc, all I find is:

"CC 0: Requested configuration-level number placed in
general register 0 or requested SYSIB informa-
tion stored"

But I don't find where it states that we are supposed to set
general register 0 to 0. Wouldn't we also have to do it for
fc=15 or for none?


It is only needed for fc 0. I do not see any reference to register 0 being set on output for the other function codes.



If fc 1,2,3 and 15 are to be handled equally, I suggest the following:

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9928f785c677..6eb86fa58b0b 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -893,17 +893,23 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
                        goto out_no_data;
                handle_stsi_3_2_2(vcpu, (void *) mem);
                break;
+       case 15:
+               if (sel1 != 1 || sel2 < 2 || sel2 > 6)
+                       goto out_no_data;
+               break;
        }
-       if (kvm_s390_pv_cpu_is_protected(vcpu)) {
-               memcpy((void *)sida_origin(vcpu->arch.sie_block), (void *)mem,
-                      PAGE_SIZE);
-               rc = 0;
-       } else {
-               rc = write_guest(vcpu, operand2, ar, (void *)mem, PAGE_SIZE);
-       }
-       if (rc) {
-               rc = kvm_s390_inject_prog_cond(vcpu, rc);
-               goto out;
+       if (mem) {
+               if (kvm_s390_pv_cpu_is_protected(vcpu)) {
+                       memcpy((void *)sida_origin(vcpu->arch.sie_block),
+                              (void *)mem, PAGE_SIZE);
+               } else {
+                       rc = write_guest(vcpu, operand2, ar, (void *)mem,
+                                        PAGE_SIZE);
+                       if (rc) {
+                               rc = kvm_s390_inject_prog_cond(vcpu, rc);
+                               goto out;
+                       }
+               }
        }
        if (vcpu->kvm->arch.user_stsi) {
                insert_stsi_usr_data(vcpu, operand2, ar, fc, sel1, sel2);


2. maximum-MNest facility

"
1. If the maximum-MNest facility is installed and
selector 2 exceeds the nonzero model-depen-
dent maximum-selector-2 value."

2. If the maximum-MNest facility is not installed and
selector 2 is not specified as two.
"

Will we be handling the presence/absence of the maximum-MNest facility
(for our guest?) in QEMU, correct?

Yes
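
To make the two quoted conditions concrete, here is a minimal sketch of how user space might evaluate them. It assumes the quoted conditions are the cases in which selector 2 is rejected; the function name, the parameters, and the handling of a zero maximum are illustrative assumptions and are taken from neither QEMU nor the kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when selector 2 falls under one of the
 * two quoted conditions.
 *
 * mnest_installed: whether the maximum-MNest facility is available
 * max_sel2:        model-dependent maximum-selector-2 value
 * Assumption: a zero max_sel2 means "no usable limit", so nothing is
 * rejected on that basis.
 */
static bool stsi_15_sel2_rejected(bool mnest_installed,
                                  unsigned int max_sel2,
                                  unsigned int sel2)
{
    if (mnest_installed)
        return max_sel2 != 0 && sel2 > max_sel2;

    /* Without the facility, only selector 2 == 2 is accepted. */
    return sel2 != 2;
}
```

With the facility absent, only sel2 == 2 passes; with it present, the outcome depends on the model-dependent maximum, which is why this check cannot be done without knowing about MNest.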


I do wonder if we should just let any fc=15 go to user space and let the whole
sel1 / sel2 checking be handled there. I don't think it's a fast path after all.
But no strong opinion.

We can. The check here only rejects values that are always invalid and that do not depend on MNEST:
sel1 is always 1
sel2 can only be valid between 2 and 6
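
As a standalone illustration, the MNEST-independent check from the patch can be written as a small predicate. This is only a sketch mirroring the condition `sel1 != 1 || sel2 < 2 || sel2 > 6` in the diff; the function name is hypothetical and does not exist in arch/s390/kvm/priv.c.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true when the selectors pass the static range check
 * applied to fc 15 in the patch (sel1 == 1 and 2 <= sel2 <= 6).
 * Any MNEST-dependent narrowing of sel2 is deliberately not done here.
 */
static bool stsi_15_selectors_plausible(unsigned int sel1, unsigned int sel2)
{
    return sel1 == 1 && sel2 >= 2 && sel2 <= 6;
}
```

Anything failing this predicate takes the out_no_data path in the patch; anything passing it would still be subject to further checks in user space.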


How do we identify availability of maximum-MNest facility?


With SCLP.
There is no test on MNEST here.


3. User space awareness

How can user space identify that we actually forward these intercepts?
How can it enable them? The old KVM_CAP_S390_USER_STSI capability
is not sufficient.

I do wonder if we want KVM_CAP_S390_USER_STSI_15 or sth like that to change
the behavior once enabled by user space.


Userland can check the configuration topology facility to see whether STSI-15 is available.
Sorry about the old commit message in the next patch; it does not make things clear.

Configuration topology facility, 11, enables function code 15 in the STSI instruction.
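
For illustration, testing facility bit 11 in a facility list could look like the sketch below. It assumes the usual s390 convention that facility bits are numbered from the most significant bit of byte 0; the helper name and the way the list is obtained are assumptions, not the actual QEMU or kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Facility number taken from the statement above. */
#define FACILITY_CONFIG_TOPOLOGY 11u

/*
 * Hypothetical helper: test facility bit 'nr' in a facility list of
 * 'len_bytes' bytes. Bit 0 is the leftmost (most significant) bit of
 * byte 0. Bits beyond the end of the list are reported as not set.
 */
static bool facility_is_set(const uint8_t *fac_list, size_t len_bytes,
                            unsigned int nr)
{
    size_t byte = nr >> 3;

    if (byte >= len_bytes)
        return false;
    return (fac_list[byte] & (0x80u >> (nr & 7))) != 0;
}
```

Facility 11 lands in byte 1 with mask 0x10, so a list whose second byte is 0x10 reports the configuration topology facility as present.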


4. Without vcpu->kvm->arch.user_stsi, we indicate cc=0 to our guest,
also for fc 1,2,3. Is that actually what we want? (or do we simply not care
because the guest is not supposed to use stsi?)


Without user_stsi, the information is gathered in the kernel,
as far as it can be gathered there.
Patch e44fc8c9dab215ac0e from Ekaterina Tumanova introduced post handlers for STSI in user space.

The only thing that needs to be done in userspace is checking user_stsi.

--
Pierre Morel
IBM Lab Boeblingen