Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX

From: Sean Christopherson
Date: Wed Sep 26 2018 - 16:44:03 EST

In reply to: Dave Hansen: "Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX"
Next in thread: Dave Hansen: "Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX"

On Wed, Sep 26, 2018 at 01:16:59PM -0700, Dave Hansen wrote:
> On 09/26/2018 11:12 AM, Andy Lutomirski wrote:
> >> e omniscient.
> >>
> >> How about this? With formatting changes since it's long-winded...
> >>
> >> /*
> >> * Access is blocked by the Enclave Page Cache Map (EPCM), i.e. the
> >> * access is allowed by the PTE but not the EPCM. This usually happens
> >> * when the EPCM is yanked out from under us, e.g. by hardware after a
> >> * suspend/resume cycle. In any case, software, i.e. the kernel, can't
> >> * fix the source of the fault as the EPCM can't be directly modified
> >> * by software. Handle the fault as an access error in order to signal
> >> * userspace, e.g. so that userspace can rebuild their enclave(s), even
> >> * though userspace may not have actually violated access permissions.
> >> */
> >>
> > Looks good to me.
>
> Including the actual architectural definition of the bit might add some
> clarity. The SDM explicitly says (Vol 3a section 4.7):
>
> The fault resulted from violation of SGX-specific access-control
> requirements.
>
> Which totally squares with returning true from access_error().
>
> There's also a tidbit that says:
>
> This flag is 1 if the exception is unrelated to paging and
> resulted from violation of SGX-specific access-control
> requirements. ... such a violation can occur only if there
> is no ordinary page fault...
>
> This is pretty important. It means that *none* of the other
> paging-related stuff that we're doing applies.
>
> We also need to clarify how this can happen. Is it through something
> that an app does, or is it solely when the hardware does something under
> the covers, like suspend/resume?

Are you looking for something in the changelog, the comment, or just
a response? If it's the latter...

On bare metal with a bug-free kernel, the only scenario I'm aware of
where we'll encounter these faults is when hardware pulls the rug out
from under us. In a virtualized environment all bets are off because
the architecture allows VMMs to silently "destroy" the EPC at will.
E.g. KVM, and I believe Hyper-V, will take advantage of this behavior
to support live migration. Post migration, the destination system
will generate PF_SGX because the EPC{M} can't be migrated between
systems, i.e. the destination EPCM sees all EPC pages as invalid.
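
To make the access_error() point concrete, here is a rough standalone
sketch of the decision being discussed. It is not the actual patch:
sketch_access_error() and its vma_writable parameter are hypothetical
stand-ins for the kernel's access_error() and its VMA checks, and the
non-SGX path is reduced to a single write check. The only architectural
fact assumed is that the SGX flag is bit 15 of the #PF error code.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page-fault error code bits; the SGX flag is bit 15 per the SDM. */
#define PF_PROT  (1UL << 0)
#define PF_WRITE (1UL << 1)
#define PF_USER  (1UL << 2)
#define PF_SGX   (1UL << 15)

/*
 * Hypothetical, heavily simplified model of access_error().
 * Returns true if the fault must be reported as an access error,
 * i.e. signaled to userspace instead of being fixed up.
 */
static bool sketch_access_error(unsigned long error_code, bool vma_writable)
{
	/*
	 * Access is blocked by the EPCM, not by the PTE.  The kernel
	 * can't modify the EPCM directly, so none of the usual paging
	 * fixups apply: always treat this as an access error.
	 */
	if (error_code & PF_SGX)
		return true;

	/* Ordinary fault: a write is an error only if the VMA forbids it. */
	if (error_code & PF_WRITE)
		return !vma_writable;

	/* Simplification: every other case is treated as fixable. */
	return false;
}
```

With this model, sketch_access_error(PF_SGX | PF_USER, true) reports an
access error even though the mapping is writable, which is the "userspace
may not have actually violated access permissions" case from the comment.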
