Re: [PATCH] cxl/hdm: Fix hdm decoder init by adding COMMIT field check

From: Ira Weiny
Date: Fri Mar 03 2023 - 10:57:42 EST


Jonathan Cameron wrote:
> On Thu, 2 Mar 2023 08:36:59 -0700
> Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
>
> > On 3/1/23 11:23 PM, Fan Ni wrote:
> > > On Wed, Mar 01, 2023 at 11:54:08AM -0700, Dave Jiang wrote:
> > >>
> > > Hi Dave,
> > > Thanks for looking into this.
> > >>
> > >> On 2/28/23 3:40 PM, Fan Ni wrote:
> > >>> Add COMMIT field check aside with existing COMMITTED field check during
> > >>> hdm decoder initialization to avoid a system crash during module removal
> > >>> after destroying a region which leaves the COMMIT field being reset while
> > >>> the COMMITTED field still being set.
> > >>
> > >> Hi Fan. Are you seeing this issue on qemu emulation or hardware? The
> > > I run into the issue with qemu emulation.
> > >> situation does not make sense to me. If we clear the COMMIT bit, then the
> > >> COMMITTED bit should be cleared by the hardware shortly after right?
> > >
> > > From the spec, I cannot find any statement saying clearing the COMMIT bit
> > > will automatically clear the COMMITTED. If I have not missed the statement in
> > > the spec, I assume we should not make the assumption that it will be
> > > cleared automatically for real hardware. But you may be right, leaving the
> > > COMMITTED bit set can potentially cause some issue? Need to check more.
> >
> > I have not been able to find direct verbiage that indicates this either.
> > However, logically it would make sense. Otherwise, the COMMITTED field
> > never clears and prevents reprogramming of the HDM decoders. The current
> > QEMU implementation is creating a situation where the HDM decoder is
> > always active after COMMIT bit is set the first time, regardless whether
> > COMMIT field has been cleared later on during a teardown. It does sound
> > like a bug with QEMU emulation currently.
>
> I agree that one sane interpretation is that unsetting commit should result in
> the decoder being deactivated and hence the commit bit dropping. However
> I'm not sure that's the only sane interpretation.
>
> There is no verbage that I'm aware of that says the committed bit being
> set means that the current register values are in use. It simply says that
> when the commit bit was set, the HDM decoder was successfully committed
> (using registers as set at that time). There is a specific statement about
> not changing the registers whilst checks are in progress, but those checks
> are only required if lock on commit is set, so it doesn't cover this case.
>
> Wonderfully there isn't actually anything says what a commit transition to 0
> means. Does that result in the decoder become uncommitted, or does that only
> happen when the next 0 to 1 transition happens?
>
> The only stuff we have is what happens when lock on commit = 1, which isn't
> the case here.
>
> So is there another valid implementation? I think yes.
> In some implementations, there will be a complex state machine that is
> triggered when commit is set. That will then write some entirely invisible
> internal state for decode logic based on the contents of the registers.
> As such, once it's set committed, it typically won't look at the registers
> again until another commit 0->1 transition happens.
> At that point the
> committed bit drops and raised again once the commit state machine finishes
> (given QEMU doesn't emulate that delay the upshot is if you set commit then
> check committed it will be set ;)

I'm only barely following along so I wanted to make sure I understand...

Are you saying that at the instant commit 0->1 happens hardware will clear
commited to 0 so that software can later check for commited vs error not
commited?

Ira

>
> In that implementation the commit 1->0 transition is an irrelevance and
> it won't change the committed bit state.
>
> So whilst the QEMU code is doing the less obvious implementation, I think
> the spec still allows it. I don't mind QEMU changing to the more obvious
> one though if someone wants to send a patch.
>
> Jonathan
>

[...]