Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

From: Paul Menzel
Date: Thu Aug 25 2022 - 02:46:30 EST


Dear Jarkko,


On 25.08.22 at 07:25, Jarkko Sakkinen wrote:
On Thu, Aug 25, 2022 at 07:57:30AM +0300, Jarkko Sakkinen wrote:
On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
On 8/19/22 09:02, Paul Menzel wrote:
On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:

```
[    0.000000] Linux version 5.18.0-4-amd64 (debian-kernel@xxxxxxxxxxxxxxxx) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
[…]
[    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
[…]
[    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
```

Would you be able to send the entire dmesg, along with:

cat /proc/iomem # (as root)
and
cpuid -1 --raw

I'm suspecting a BIOS problem. Reinette (cc'd) also thought this
might be a case of the SGX initialization getting a bit too far along
when it should have been disabled.

We had some bugs where we didn't stop fast enough after spitting out the
"SGX Launch Control is locked..." errors.

For some reason the pages do not get properly sanitized:

/* sanity check: */
WARN_ON(!list_empty(&sgx_dirty_page_list));

EPC should be good, given that EREMOVE does not fail.
If SGX were disabled, EREMOVE should also fail.

Sorry, I forgot that we never print the error code
inside __sgx_sanitize_pages(). I wrote a quick
patch to address this (attached) [*].
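
For readers following along, the idea discussed above (sanitize the dirty pages, report the error code when a page cannot be removed, then check that nothing is left over) can be sketched as a small userspace model. This is not the kernel code and not the attached patch: `struct model_page`, `model_sanitize_pages()`, the `pending_failures` counter and the numeric error values are all hypothetical names and values invented for illustration, and the real EREMOVE instruction is only simulated.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an EPC page sitting on the dirty list. */
struct model_page {
	int pending_failures; /* simulated: attempts that will still fail */
	int err;              /* simulated error code reported on failure */
	int sanitized;        /* set to 1 once the simulated removal worked */
};

/*
 * Model of one sanitization pass: try to remove every page that is
 * still dirty, print the error code for each failure, and return the
 * number of pages that remain dirty after this pass.
 */
static size_t model_sanitize_pages(struct model_page *pages, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].sanitized)
			continue;

		if (pages[i].pending_failures > 0) {
			/* This is the information the warning alone does not give. */
			pages[i].pending_failures--;
			fprintf(stderr, "model: page %zu not removed, error %d\n",
				i, pages[i].err);
			left++;
		} else {
			pages[i].sanitized = 1;
		}
	}

	return left;
}
```

Calling `model_sanitize_pages()` repeatedly on the same array mimics running several passes; a non-zero return value after the final pass corresponds to the situation in which `WARN_ON(!list_empty(&sgx_dirty_page_list))` fires, and the printed error codes show which pages were responsible.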

Paul,

Any chance to try the patch out?

Yes, I am going to try it in the next few days.

It's pretty hard to attach e.g. kprobe to grab this info. Does it
reproduce every single time?

Yes, on each boot up.

Alternatively: what kind of workload is triggering this?
I do own a 2020 model XPS 13, which might be able to
reproduce the same issue.

The Dell XPS 13 9370 is from 2018 (Intel i5-8350U), so no idea if it happens with later processors.


Kind regards,

Paul


[*] Also: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@xxxxxxxxxx/T/#u