Re: [PATCH] cxl/hdm: Fix hdm decoder init by adding COMMIT field check

From: Jonathan Cameron
Date: Mon Mar 06 2023 - 10:49:30 EST


On Fri, 3 Mar 2023 07:57:22 -0800
Ira Weiny <ira.weiny@xxxxxxxxx> wrote:

> Jonathan Cameron wrote:
> > On Thu, 2 Mar 2023 08:36:59 -0700
> > Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
> >
> > > On 3/1/23 11:23 PM, Fan Ni wrote:
> > > > On Wed, Mar 01, 2023 at 11:54:08AM -0700, Dave Jiang wrote:
> > > >>
> > > > Hi Dave,
> > > > Thanks for looking into this.
> > > >>
> > > >> On 2/28/23 3:40 PM, Fan Ni wrote:
> > > >>> Add a COMMIT field check alongside the existing COMMITTED field check
> > > >>> during hdm decoder initialization to avoid a system crash during module
> > > >>> removal after destroying a region, which leaves the COMMIT field reset
> > > >>> while the COMMITTED field is still set.
> > > >>
> > > >> Hi Fan. Are you seeing this issue on qemu emulation or hardware? The
> > > > I run into the issue with qemu emulation.
> > > >> situation does not make sense to me. If we clear the COMMIT bit, then the
> > > >> COMMITTED bit should be cleared by the hardware shortly after right?
> > > >
> > > > From the spec, I cannot find any statement saying clearing the COMMIT bit
> > > > will automatically clear the COMMITTED. If I have not missed the statement in
> > > > the spec, I assume we should not make the assumption that it will be
> > > > cleared automatically for real hardware. But you may be right, leaving the
> > > > COMMITTED bit set can potentially cause some issue? Need to check more.
> > >
> > > I have not been able to find direct verbiage that indicates this either.
> > > However, logically it would make sense. Otherwise, the COMMITTED field
> > > never clears and prevents reprogramming of the HDM decoders. The current
> > > QEMU implementation is creating a situation where the HDM decoder is
> > > always active after the COMMIT bit is set the first time, regardless of whether
> > > COMMIT field has been cleared later on during a teardown. It does sound
> > > like a bug with QEMU emulation currently.
> >
> > I agree that one sane interpretation is that unsetting commit should result in
> > the decoder being deactivated and hence the committed bit dropping. However
> > I'm not sure that's the only sane interpretation.
> >
> > There is no verbiage that I'm aware of that says the committed bit being
> > set means that the current register values are in use. It simply says that
> > when the commit bit was set, the HDM decoder was successfully committed
> > (using registers as set at that time). There is a specific statement about
> > not changing the registers whilst checks are in progress, but those checks
> > are only required if lock on commit is set, so it doesn't cover this case.
> >
> > Wonderfully there isn't actually anything that says what a commit transition
> > to 0 means. Does that result in the decoder becoming uncommitted, or does that
> > only happen when the next 0 to 1 transition happens?
> >
> > The only stuff we have is what happens when lock on commit = 1, which isn't
> > the case here.
> >
> > So is there another valid implementation? I think yes.
> > In some implementations, there will be a complex state machine that is
> > triggered when commit is set. That will then write some entirely invisible
> > internal state for decode logic based on the contents of the registers.
> > As such, once it's set committed, it typically won't look at the registers
> > again until another commit 0->1 transition happens.
> > At that point the committed bit drops and is raised again once the commit
> > state machine finishes (given QEMU doesn't emulate that delay, the upshot
> > is that if you set commit then check committed, it will be set ;)
>
> I'm only barely following along so I wanted to make sure I understand...
>
> Are you saying that at the instant commit 0->1 happens hardware will clear
> committed to 0 so that software can later check for committed vs error not
> committed?

yup. That's what you'd see in such an implementation.
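
To make the two readings concrete, here is a toy model of the control
register. It is only a sketch: it is not QEMU or kernel code, every name in
it (toy_decoder, toy_ctrl_write, MODEL_LATCHED, MODEL_TRACKING, ...) is made
up for illustration, the bit positions are just placeholders for the model,
and the commit state machine is assumed to finish instantly and never fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this toy model only. */
#define CTRL_COMMIT     (1u << 9)
#define CTRL_COMMITTED  (1u << 10)

enum commit_model {
	MODEL_LATCHED,   /* COMMITTED only changes on a COMMIT 0->1 edge */
	MODEL_TRACKING,  /* COMMITTED also drops on a COMMIT 1->0 edge */
};

struct toy_decoder {
	enum commit_model model;
	uint32_t ctrl;
};

/* Emulate a software write to the decoder control register. */
static void toy_ctrl_write(struct toy_decoder *d, uint32_t val)
{
	bool was = d->ctrl & CTRL_COMMIT;
	bool now = val & CTRL_COMMIT;
	uint32_t committed = d->ctrl & CTRL_COMMITTED;

	if (!was && now) {
		/*
		 * 0->1 edge: COMMITTED drops, the commit state machine runs
		 * and, with no delay or failure modelled, COMMITTED is
		 * raised again straight away.
		 */
		committed = CTRL_COMMITTED;
	} else if (was && !now && d->model == MODEL_TRACKING) {
		/* 1->0 edge: only the "obvious" model uncommits here. */
		committed = 0;
	}

	/* COMMITTED is owned by the device; ignore the value software wrote. */
	d->ctrl = (val & ~CTRL_COMMITTED) | committed;
}

static bool toy_committed(const struct toy_decoder *d)
{
	return d->ctrl & CTRL_COMMITTED;
}

/* Driver-side check in the spirit of the patch: require both bits. */
static bool toy_enabled(const struct toy_decoder *d)
{
	return (d->ctrl & CTRL_COMMIT) && (d->ctrl & CTRL_COMMITTED);
}
```

With MODEL_LATCHED, writing COMMIT=1 and then COMMIT=0 leaves COMMITTED set,
so a driver that only looks at COMMITTED still sees the decoder as live,
whereas checking both bits (toy_enabled) does not. With MODEL_TRACKING the
same sequence clears COMMITTED.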

>
> Ira
>
> >
> > In that implementation the commit 1->0 transition is an irrelevance and
> > it won't change the committed bit state.
> >
> > So whilst the QEMU code is doing the less obvious implementation, I think
> > the spec still allows it. I don't mind QEMU changing to the more obvious
> > one though if someone wants to send a patch.
> >
> > Jonathan
> >
>
> [...]
>