Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

From: gengdongjiu
Date: Fri May 05 2017 - 08:31:31 EST


HI James,

2017-05-05 0:52 GMT+08:00 gengdongjiu <gengdj.1984@xxxxxxxxx>:
> Dear James,
> Thanks a lot for your review and comments. I am very sorry for the
> late response.
>
>
> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1984@xxxxxxxxx>:
>> Hi Dongjiu Geng,
>>
>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>> when happen SEA, deliver signal bus and handle the ioctl that
>>> inject SEA abort to guest, so that guest can handle the SEA error.
>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 105b6ab..a96594f 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -20,8 +20,10 @@
>>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>> __coherent_cache_guest_page(vcpu, pfn, size);
>>> }
>>>
>>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool hwpoison)
>>> +{
>>> + siginfo_t info;
>>> +
>>> + info.si_signo = SIGBUS;
>>> + info.si_errno = 0;
>>> + if (hwpoison)
>>> + info.si_code = BUS_MCEERR_AR;
>>> + else
>>> + info.si_code = 0;
>>> +
>>> + info.si_addr = (void __user *)address;
>>> + if (hugetlb)
>>> + info.si_addr_lsb = PMD_SHIFT;
>>> + else
>>> + info.si_addr_lsb = PAGE_SHIFT;
>>> +
>>> + send_sig_info(SIGBUS, &info, current);
>>> +}
>>> +
>> Â [hide part of quote]
>>
>> Punit reviewed the other version of this patch, this PMD_SHIFT is not the right
>> thing to do, it needs a more accurate set of calls and shifts as there may be
>> hugetlbfs pages other than PMD_SIZE.
>>
>> https://www.spinics.net/lists/arm-kernel/msg568919.html
>>
>> I haven't posted a new version of that patch because I was still hunting a bug
>> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT
>> returned to userspace instead of this hwpoison code being invoked.
>
> Ok, got it, thanks for your information.
>>
>> Please avoid duplicating functionality between patches, it wastes reviewers
>> time, especially when we know there are problems with this approach.
>>
>>
>>> +static void kvm_handle_bad_page(unsigned long address,
>>> + bool hugetlb, bool hwpoison)
>>> +{
>>> + /* handle both hwpoison and other synchronous external Abort */
>>> + if (hwpoison)
>>> + kvm_send_signal(address, hugetlb, true);
>>> + else
>>> + kvm_send_signal(address, hugetlb, false);
>>> +}
>>
>> Why the extra level of indirection? We only want to signal userspace like this
>> from KVM for hwpoison. Signals for RAS related reasons should come from the bits
>> of the kernel that decoded the error.
>
> For the SEA, the are maily two types:
> 0b010000 Synchronous External Abort on memory access.
> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
> encode the level.
>
> hwpoison should belong to the "Synchronous External Abort on memory access"
> if the SEA type is not hwpoison, such as page table walk, do you mean
> KVM do not deliver the SIGBUS?
> If so, how the KVM handle the SEA type other than hwpoison?
>
>>
>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>>
>>
>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>>> index b37446a..780e3c4 100644
>>> --- a/arch/arm64/kvm/guest.c
>>> +++ b/arch/arm64/kvm/guest.c
>>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>> return -EINVAL;
>>> }
>>>
>>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu)
>>> +{
>>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>>> +
>>> + return 0;
>>> +}
>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index bb02909..1d2e2e7 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>> /* Available with KVM_CAP_X86_SMM */
>>> #define KVM_SMI _IO(KVMIO, 0xb7)
>>> +#define KVM_ARM_SEA _IO(KVMIO, 0xb8)
>>>
>>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>>
>>
>> Why do we need a userspace API for SEA? It can also be done by using
>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
>> way is you can choose which ESR value to use.
>>
>> Adding a new API call to do something you could do with an old one doesn't look
>> right.
>
> James, I considered your suggestion before that use the
> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
> not have difference to use the alread existed KVM API. so may be
> changing the vcpu registers in qemu will duplicate with the KVM APIs.
>
> injection a SEA is no more than setting some registers: elr_el1, PC,
> PSTATE, SPSR_el1, far_el1, esr_el1
> I seen this KVM API do the same thing as Qemu. do you found call this
> API will have issue and necessary to choose another ESR value?

I consider again, you are right.

when guest OS happen an SEA, My current solution is shown below:

(1) host EL3 firmware firstly handle the SEA error and generate the CPER record.
(2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3,
far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2.
(3) then jump the EL2 hypervisor

so the EL2 hypervisor uses the ESR that come from esr_el3, here the
ESR(esr_el3) value may be different with the exist KVM API's ESR.


>
> I pasted the alread existed KVM API code:
>
> static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned
> long addr)
> {
> unsigned long cpsr = *vcpu_cpsr(vcpu);
> bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
> u32 esr = 0;
> *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
> *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
> *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
> *vcpu_spsr(vcpu) = cpsr;
> vcpu_sys_reg(vcpu, FAR_EL1) = addr;
> /*
> * Build an {i,d}abort, depending on the level and the
> * instruction set. Report an external synchronous abort.
> */
> if (kvm_vcpu_trap_il_is32bit(vcpu))
> esr |= ESR_ELx_IL;
> /*
> * Here, the guest runs in AArch64 mode when in EL1. If we get
> * an AArch32 fault, it means we managed to trap an EL0 fault.
> */
> if (is_aarch32 || (cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
> esr |= (ESR_ELx_EC_IABT_LOW << ESR_ELx_EC_SHIFT);
> else
> esr |= (ESR_ELx_EC_IABT_CUR << ESR_ELx_EC_SHIFT);
> if (!is_iabt)
> esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
> vcpu_sys_reg(vcpu, ESR_EL1) = esr | ESR_ELx_FSC_EXTABT;
> }
>
> static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
> unsigned long addr)
> {
> u32 vect_offset;
> u32 *far, *fsr;
> bool is_lpae;
> if (is_pabt) {
> vect_offset = 12;
> far = &vcpu_cp15(vcpu, c6_IFAR);
> fsr = &vcpu_cp15(vcpu, c5_IFSR);
> } else { /* !iabt */
> vect_offset = 16;
> far = &vcpu_cp15(vcpu, c6_DFAR);
> fsr = &vcpu_cp15(vcpu, c5_DFSR);
> }
> prepare_fault32(vcpu, COMPAT_PSR_MODE_ABT | COMPAT_PSR_A_BIT, vect_offset);
> *far = addr;
> /* Give the guest an IMPLEMENTATION DEFINED exception */
> is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
> if (is_lpae)
> *fsr = 1 << 9 | 0x34;
> else
> *fsr = 0x14;
> }
>
>
> /**
> * kvm_inject_dabt - inject a data abort into the guest
> * @vcpu: The VCPU to receive the undefined exception
> * @addr: The address to report in the DFAR
> *
> * It is assumed that this code is called from the VCPU thread and that the
> * VCPU therefore is not currently executing guest code.
> */
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
> {
> if (!(vcpu->arch.hcr_el2 & HCR_RW))
> inject_abt32(vcpu, false, addr);
> else
> inject_abt64(vcpu, false, addr);
> }
>
>
>>
>>
>> Thanks,
>>
>> James