Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

From: James Morse
Date: Mon Apr 24 2017 - 13:15:57 EST


Hi Wang Xiongfeng,

On 21/04/17 12:33, Xiongfeng Wang wrote:
> On 2017/4/20 16:52, James Morse wrote:
>> On 19/04/17 03:37, Xiongfeng Wang wrote:
>>> On 2017/4/18 18:51, James Morse wrote:
>>>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>>>> describe a way to inject these as they are generated by the CPU.
>>>>
>>>> Am I right in thinking you want this to use SError Interrupts as an APEI
>>>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)
>>>
>>> Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
>>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>>> describes the SEI recovery process when ESB is implemented.
>>>
>>> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
>>> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
>>> used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects
>>
>>> the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
>>> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.
>>
>> This step has confused me. How would this work with VHE where the host runs at
>> EL2 and there is nothing at Host:EL1?
>
> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
> I don't know whether the step described in the section is designed for guest os or host os or both.
> Yes, it doesn't work with VHE where the host runs at EL2.

If it uses Virtual SError, it must be for an OS running at EL1, as the code
running at EL2 (VHE:Linux, firmware or Xen) must set HCR_EL2.AMO to make this work.

I can't find the section you describe in this RAS spec:
https://developer.arm.com/docs/ddi0587/latest/armreliabilityavailabilityandserviceabilityrasspecificationarmv8forthearmv8aarchitectureprofile


>> >From your description I assume you have some firmware resident at EL2.

> Our actual SEI step is planned as follows:

Ah, okay. You can't use Virtual SError at all then, as firmware doesn't own EL2:
KVM does. KVM will save/restore the HCR_EL2 register as part of its world
switch. Firmware must not modify the hyp registers behind the hyper-visors back.


> Host OS: EL0/EL1 -> EL3 -> EL0/EL1

i.e. While the host was running, so EL3 sees HCR_EL2.AMO clear, and directs the
SEI to EL1.

> Guest OS: EL0/EL1 -> EL3 -> EL2 -> EL0/EL1

i.e. While the guest was running, so EL3 sees HCR_EL2.AMO set, directs the SEI
to EL2 where KVM performs the world-switch and returns to host EL1.

Looks sensible.


> In guest os situation, we can inject an vSEI and return to where the SEI is taken from.

An SEI from firmware should always end up in the host OS. If a guest was running
KVM is responsible for doing the world switch to restore host EL1 and return
there. This should all work today.


> But in host os situation, we can't inject an vSEI (if we don't set HCR_EL2.AMO), so we have to jump to EL1 SEI vector.

Because your firmware is at EL3 you have to check PSTATE.A and jump into the
SError vector in both cases, you just choose if its VBAR_EL2 or VBAR_EL1 based
on HCR_EL2.AMO.


> Then the process following ESB won't be executed becuase SEI is taken to EL3 from the ESB instruction in EL1, and when control
> is returned to OS, we are in EL1 SEI vector rather than the ESB instruction.

Firmware should set ELR_EL1 so that an eret from the EL1 SError vector takes you
back to the ESB instruction that took us to EL3 in the first place.


There is a problem here mixing SError as the CPU->Software notification of RAS
errors and as APEI's SEI Software->Software notification that a firmware-first
error has been handled by EL3.

To elaborate:
The routine in this patch was something like:
1 ESB
2 Read DISR_EL1
3 If set, branch to the SError handler.

If we have firmware-first ESB will generate an SError as PSTATE.A at EL{1,2}
doesn't mask SError for EL3. Firmware can then handle the RAS error. With this
the CPU->Software story is done. Firmware notifying the OS via APEI is a
different problem.

If the error doesn't need to be notified to the OS via SEI, firmware can return
to step 1 above with DISR_EL1 clear. The OS may be told via another mechanism
such as polling or an irq.

If PSTATE.A was clear, firmware can return to the SError vector. If PSTATE.A was
set and firmware knows this was due to an ESB, it can set DISR_EL1. The problem
is firmware can't know if the SError was due to an ESB.

The only time we are likely to have PSTATE.A masked is when changing exception
level. For this we can rely on IESB, so step 1 isn't necessary in the routine
above. Firmware knows if an SError taken to EL3 is due to an IESB, as the
ESR_EL3.IESB bit will be set, in this case it can write to DISR_EL1 and return.

In Linux we can use IESB for changes in exception level, and ESB with PSTATE.A
clear for any other need. (e.g. __switch_to()). Firmware can notify us using SEI
if we use either of these two patterns. This may not work for other OS (Xen?),
but this is a problem with using SEI as an APEI notification method.

(Some cleanup of our SError unmasking is needed, I have some patches I will post
on a branch shortly)


> It is ok to just ignore the process following the ESB instruction in el0_sync, because the process will be sent SIGBUS signal.

I don't understand. How will Linux know the process caused an error if we
neither take an SError nor read DISR_EL1 after an ESB?


> But for el0_irq, the irq process following the ESB may should be executed because the irq is not related to the user process.

If we exit EL0 for an IRQ, and take an SError due to IESB we know the fault was
due to EL0, not the exception level we entered to process the IRQ.


> If we set HCR_EL2.AMO when SCR_EL3.EA is set. We can still inject an vSEI into host OS in EL3.

(from EL3?) Only if the host OS is running at EL1 and isn't using EL2. Firmware
can't know if we're using EL2 so this can't work.


> Physical SError won't be taken to EL2 because SCR_EL3.EA is set. But this may be too racy is
> not consistent with ARM rules.

EL3 flipping bits in EL2 system registers behind EL2 software's back is going to
have strange side effects. e.g. some Hypervisor may not change HCR_EL2 when
switching VM-A to VM-B as the configuration is the same; now the virtual SError
will be delivered to the wrong VM. (KVM doesn't do this, but Xen might.)


>>>> You cant use SError to cover all the possible RAS exceptions. We already have
>>>> this problem using SEI if PSTATE.A was set and the exception was an imprecise
>>>> abort from EL2. We can't return to the interrupted context and we can't deliver
>>>> an SError to EL2 either.
>>>
>>> SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
>>> in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.
>>
>> firmware-panic? We can do a little better than that:
>> If the IESB bit is set in the ESR we can behave as if this were an ESB and have
>> firmware write an appropriate ESR to DISR_EL1 if PSTATE.A is set and the
>> exception came from EL2.
>
> This method may solve the problem I said above.
> Is IESB a new bit added int ESR in the newest RAS spec?

The latest ARM-ARM (DDI0487B.a) describes it as part of the "RAS Extension in
ARMv8.2" in section 'A1.7.5 The Reliability, Availability, and Serviceability
(RAS) Extension'.

Jumping to 'D7.2.28 ESR_ELx, Exception Syndrome Register (ELx)', then page
D7-2284 for the 'ISS encoding for an SError interrupt', it indicates "the SError
interrupt was synchronized by the implicit [ESB] and taken immediately"


As above, with IESB for changes in exception level, and unmasking PSTATE.A for
any other use of ESB, Linux can make sure that firmware can always notify us of
an APEI event using SEI.

Another OS (or even UEFI) may not do the same, so firmware should be prepared
for ESB to be executed with PSTATE.A masked (i.e, not using IESB). If there is
no wider firmware-policy for this (reboot, notify a remote CPU), then returning
to the faulting location and trying to inject SError later is all we can do.


Thanks,

James