Re: [PATCH] x86/sgx: Synchronize encl->srcu in sgx_encl_release().
From: Jarkko Sakkinen
Date: Tue Dec 15 2020 - 16:40:25 EST

On Tue, Dec 15, 2020 at 11:34:37AM -0600, Haitao Huang wrote:
> On Mon, 14 Dec 2020 23:59:55 -0600, Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> wrote:
>
> > On Tue, Dec 15, 2020 at 07:56:01AM +0200, Jarkko Sakkinen wrote:
> > > On Mon, Dec 14, 2020 at 11:01:32AM -0800, Sean Christopherson wrote:
> > > > On Fri, Dec 11, 2020, Jarkko Sakkinen wrote:
> > > > > Each sgx_mmun_notifier_release() starts a grace period, which means that
> > > >
> > > > Should be sgx_mmu_notifier_release(), here and in the comment.
> > >
> > > Thanks.
> > >
> > > > > one extra synchronize_rcu() in sgx_encl_release(). Add it there.
> > > > >
> > > > > sgx_release() has the loop that drains the list but with bad luck the
> > > > > entry is already gone from the list before that loop processes it.
> > > >
> > > > Why not include the actual analysis that "proves" the bug? The splat that
> > > > Haitao reported would also be useful info.
> > >
> > > True. I can include a snippet of dmesg to the commit message.
> > >
> > > > > Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> > > > > Cc: Borislav Petkov <bp@xxxxxxxxx>
> > > > > Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > > > > Reported-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > >
> > > > Haitao reported the bug, and for all intents and purposes provided the fix. I
> > > > just did the analysis to verify that there was a legitimate bug and that the
> > > > synchronization in sgx_encl_release() was indeed necessary.
> > >
> > > Good and valid point. The way I see it, the tags should be:
> > >
> > > Reported-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > >
> > > Haitao pointed out the bug but from your analysis I could resolve that
> > > this is the fix to implement, and was able to write the long
> > > description for the commit.
> > >
> > > Does this make sense to you?
> >
> > I'm sending v2 next week (this week on vacation).
> >
> > /Jarkko
>
> I don't mind either how tags are assigned. But our testing reveals
> significant latency introduced in scenarios of heavy loading/unloading
> enclaves. synchronize_srcu_expedited fixed the issue. Please analyze and
> confirm if that's more appropriate than synchronize_srcu here.

I don't see any obvious reason why *_expedited could not be used here,
as most of the time the syncs are taken care of by the sgx_release() loop,
and the final sync is with sgx_mmu_notifier_release(). More aggressive
spinning should not do any harm here.

About the tags: I just try to get them right, and it is sometimes not
straightforward. So I guess, all things considered, I'll add a
Suggested-by from you. Once I get a refined patch out, try it with your
workloads and give me a Tested-by if it works for you.

/Jarkko