RE: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc

From: Reshetova, Elena
Date: Fri Mar 28 2025 - 04:28:07 EST

Next message: Kirill A. Shutemov: "Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem config combination"
Previous message: Nuno Sá: "Re: [PATCH 4/4] iio: dac: ad3552r-hs: add support for internal ramp"
In reply to: Jarkko Sakkinen: "Re: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc"
Next in thread: Jarkko Sakkinen: "Re: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Thu, Mar 27, 2025 at 03:42:30PM +0000, Reshetova, Elena wrote:
> > > > > > + case SGX_NO_UPDATE:
> > > > > > + pr_debug("EUPDATESVN was successful, but CPUSVN
> was not
> > > > > updated, "
> > > > > > + "because current SVN was not newer than
> > > > > CPUSVN.\n");
> > > > > > + break;
> > > > > > + case SGX_EPC_NOT_READY:
> > > > > > + pr_debug("EPC is not ready for SVN update.");
> > > > > > + break;
> > > > > > + case SGX_INSUFFICIENT_ENTROPY:
> > > > > > + pr_debug("CPUSVN update is failed due to Insufficient
> > > > > entropy in RNG, "
> > > > > > + "please try it later.\n");
> > > > > > + break;
> > > > > > + case SGX_EPC_PAGE_CONFLICT:
> > > > > > + pr_debug("CPUSVN update is failed due to
> concurrency
> > > > > violation, please "
> > > > > > + "stop running any other ENCLS leaf and try it
> > > > > later.\n");
> > > > > > + break;
> > > > > > + default:
> > > > > > + break;
> > > > >
> > > > > Remove pr_debug() statements.
> > > >
> > > > This I am not sure it is good idea. I think it would be useful for system
> > > > admins to have a way to see that update either happened or not.
> > > > It is true that you can find this out by requesting a new SGX attestation
> > > > quote (and see if newer SVN is used), but it is not the faster way.
> > >
> > > Maybe pr_debug() is them wrong level if they are meant for sysadmins?
> > >
> > > I mean these should not happen in normal behavior like ever? As
> > > pr_debug() I don't really grab this.
> >
> > SGX_NO_UPDATE will absolutely happen normally all the time.
> > Since EUPDATESVN is executed every time EPC is empty, this is the
> > most common code you will get back (because microcode updates are rare).
> > Others yes, that would indicate some error condition.
> > So, what is the pr_level that you would suggest?
>
> Right, got it. That changes my conclusions:
>
> So I'd reformulate it like:
>
> switch (ret) {
> case 0:
> pr_info("EUPDATESVN: success\n);
> break;
> case SGX_EPC_NOT_READY:
> case SGX_INSUFFICIENT_ENTROPY:
> case SGX_EPC_PAGE_CONFLICT:
> pr_err("EUPDATESVN: error %d\n", ret);
> /* TODO: block/teardown driver? */
> break;
> case SGX_NO_UPDATE:
> break;
> default:
> pr_err("EUPDATESVN: unknown error %d\n", ret);
> /* TODO: block/teardown driver? */
> break;
> }
>
> Since when this is executed EPC usage is zero error cases should block
> or teardown SGX driver, presuming that they are because of either
> incorrect driver state or spurious error code.

I agree with the above, but not sure at all about the blocking/teardown the
driver. They are all potentially temporal things and SGX_INSUFFICIENT_ENTROPY
is even outside of SGX driver control and *does not* indicate any error
condition on the driver side itself. SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT
would mean we have a bug somewhere because we thought we could go
do EUDPATESVN on empty EPC and prevented anyone from creating
pages in meanwhile but looks like we missed smth. That said, I dont know if we
want to fail the whole system in case we have such a code bug, this is very
aggressive (in case it is some rare edge condition that no one knew about or
guessed). So, I would propose to print the pr_err() as you have above but
avoid destroying the driver.
Would this work?

Best Regards,
Elena.

>
> If this happens, we definitely do not want service, right?
>
> I'm not sure of all error codes how serious they are, or are all of them
> consequence of incorrectly working driver.
>
> BR, Jarkko

Next message: Kirill A. Shutemov: "Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem config combination"
Previous message: Nuno Sá: "Re: [PATCH 4/4] iio: dac: ad3552r-hs: add support for internal ramp"
In reply to: Jarkko Sakkinen: "Re: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc"
Next in thread: Jarkko Sakkinen: "Re: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]