Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming
From: Jarkko Sakkinen
Date: Tue Feb 11 2025 - 18:24:21 EST
On Wed, Feb 12, 2025 at 10:18:11AM +1300, Huang, Kai wrote:
>
>
> On 12/02/2025 10:03 am, Jarkko Sakkinen wrote:
> > On Tue, Feb 11, 2025 at 08:25:58AM -0800, Dave Hansen wrote:
> > > > arch_memory_failure() but stay on sgx_active_page_list.
> > > > page->poison is not checked in the reclaimer logic, meaning that a page could be
> > > > reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the
> > > > firmware receiving an MCE in one of those operations and going into
> > > > "unbreakable shutdown" and triggering a kernel panic on remaining cores.
> > >
> > > This requires low-level SGX implementation knowledge to fully
> > > understand. Both what "ETRACK, EBLOCK and EWB" are in the first place,
> > > how they are involved in reclaim and also why EREMOVE doesn't lead to
> > > the same fate.
> >
> > Does it? [I'll dig up Intel SDM to check this]
> >
>
> I just did. :-)
>
> It seems EREMOVE only reads and updates the EPCM entry for the target EPC
> page but won't actually access that EPC page.
That was fast, thank you!
This is pretty much also something that should be explicitly stated in the
commit message.
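
To make the reclaim-side concern concrete, here is a minimal user-space
sketch, not the patch under review. It assumes the goal is simply that a
poisoned page is never handed to the ETRACK/EBLOCK/EWB path. struct epc_page
and select_reclaim_candidates() are made-up names for illustration; only the
poison flag mirrors page->poison from the quoted description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an EPC page descriptor (illustration only). */
struct epc_page {
	bool poison;            /* set when a memory failure hit this page */
	struct epc_page *next;  /* link in the active (reclaimable) list */
};

/*
 * Walk the active list and collect up to @max pages for reclaim.
 * Poisoned pages are skipped, so they would never reach the
 * ETRACK/EBLOCK/EWB steps that access the page contents.
 * Returns the number of candidates stored in @out.
 */
static size_t select_reclaim_candidates(struct epc_page *active,
					struct epc_page **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	for (struct epc_page *p = active; p && n < max; p = p->next) {
		if (p->poison)
			continue;  /* never reclaim a poisoned page */
		out[n++] = p;
	}

	return n;
}
```

Whether the check lives in the scan, as sketched here, or the page is
dropped from the list when the poison flag is set is a design choice; the
sketch only shows the invariant the commit message should spell out.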
BR, Jarkko