Re: [PATCH] cxl/hdm: Fix hdm decoder init by adding COMMIT field check

From: Ira Weiny
Date: Tue Mar 07 2023 - 12:32:41 EST


Jonathan Cameron wrote:
> On Mon, 6 Mar 2023 16:04:22 +0000
> Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> > On Fri, 3 Mar 2023 17:21:13 +0000
> > Fan Ni <fan.ni@xxxxxxxxxxx> wrote:
> >
> > > On Fri, Mar 03, 2023 at 02:36:05PM +0000, Jonathan Cameron wrote:
> > >
> > > > On Thu, 2 Mar 2023 08:36:59 -0700
> > > > Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
> > > >
> > > > > On 3/1/23 11:23 PM, Fan Ni wrote:
> > > > > > On Wed, Mar 01, 2023 at 11:54:08AM -0700, Dave Jiang wrote:
> > > > > >>
> > > > > > Hi Dave,
> > > > > > Thanks for looking into this.
> > > > > >>
> > > > > >> On 2/28/23 3:40 PM, Fan Ni wrote:
> > > > > >>> Add a COMMIT field check alongside the existing COMMITTED field check during
> > > > > >>> HDM decoder initialization, to avoid a system crash during module removal
> > > > > >>> after destroying a region, which leaves the COMMIT field cleared while
> > > > > >>> the COMMITTED field is still set.
> > > > > >>
> > > > > >> Hi Fan. Are you seeing this issue on qemu emulation or hardware? The
> > > > > > I ran into the issue with QEMU emulation.
> > > > > >> situation does not make sense to me. If we clear the COMMIT bit, then the
> > > > > >> COMMITTED bit should be cleared by the hardware shortly after, right?
> > > > > >
> > > > > > From the spec, I cannot find any statement saying that clearing the COMMIT
> > > > > > bit will automatically clear the COMMITTED bit. Unless I have missed such a
> > > > > > statement, I assume we should not expect real hardware to clear it
> > > > > > automatically. But you may be right: could leaving the COMMITTED bit set
> > > > > > potentially cause issues? I need to check further.
> > > > >
> > > > > I have not been able to find direct verbiage that indicates this either.
> > > > > However, logically it would make sense. Otherwise, the COMMITTED field
> > > > > never clears and prevents reprogramming of the HDM decoders. The current
> > > > > QEMU implementation is creating a situation where the HDM decoder is
> > > > > always active after the COMMIT bit is set the first time, regardless of
> > > > > whether the COMMIT field has been cleared later on during a teardown. It
> > > > > does sound like a bug in the current QEMU emulation.
> > > >
> > > > I agree that one sane interpretation is that unsetting commit should result in
> > > > the decoder being deactivated and hence the committed bit dropping. However
> > > > I'm not sure that's the only sane interpretation.
> > > >
> > > > There is no verbiage that I'm aware of that says the committed bit being
> > > > set means that the current register values are in use. It simply says that
> > > > when the commit bit was set, the HDM decoder was successfully committed
> > > > (using registers as set at that time). There is a specific statement about
> > > > not changing the registers whilst checks are in progress, but those checks
> > > > are only required if lock on commit is set, so it doesn't cover this case.
> > > >
> > > > Wonderfully, there isn't actually anything that says what a commit transition
> > > > to 0 means. Does that result in the decoder becoming uncommitted, or does that
> > > > only happen when the next 0 to 1 transition happens?
> > > >
> > > > The only stuff we have is what happens when lock on commit = 1, which isn't
> > > > the case here.
> > > >
> > > > So is there another valid implementation? I think yes.
> > > > In some implementations, there will be a complex state machine that is
> > > > triggered when commit is set. That will then write some entirely invisible
> > > > internal state for decode logic based on the contents of the registers.
> > > > As such, once it has set committed, it typically won't look at the registers
> > > > again until another commit 0->1 transition happens. At that point the
> > > > committed bit drops and is raised again once the commit state machine finishes.
> > > > (Given QEMU doesn't emulate that delay, the upshot is that if you set commit and
> > > > then check committed, it will already be set ;)
> > > >
> > > > In that implementation the commit 1->0 transition is an irrelevance and
> > > > it won't change the committed bit state.
> > > >
> > > > So whilst the QEMU code is doing the less obvious implementation, I think
> > > > the spec still allows it. I don't mind QEMU changing to the more obvious
> > > > one though if someone wants to send a patch.
> > > >
> > > > Jonathan
> > > >
> > >
> > > In the current QEMU emulation, when the decoder is committed, the COMMITTED
> > > bit is set and at the same time the COMMIT field is cleared. Does
> > > the following fix make sense?
> > > 1. On the QEMU side, when the commit completes, just set the COMMITTED bit,
> > > but leave the COMMIT bit set. Also check the LOCK ON COMMIT bit;
> > > if it is set, clear it, which will allow the COMMIT bit to be reset later.
> >
> > QEMU definitely can't do anything to the Commit bit, other than prevent it being
> > cleared if lock on commit is set.
> > Right now the QEMU emulation doesn't handle LOCK ON COMMIT at all.
> > It would be sensible to add this support, but we don't have an
> > open software stack that ever sets that yet so any testing is likely to be
> > one time only via some hacks.
> >
> > > 2. On the kernel side, if it needs to reprogram the decoder, it needs to
> > > check the COMMITTED bit; if it is set, the OS needs to clear the COMMIT bit
> > > first, which will also clear the COMMITTED bit automatically on the QEMU side.
> >
> > Could do it that way, or simplify it by always clearing commit before setting
> > it to make sure the transition happens.
> >
> > Looks like commit is cleared in cxl_decoder_reset() already so this may
> > already happen - I haven't checked the flow.
> >
> > > 3. When the OS needs to reset the decoder, it does the same as in 2 to
> > > clear the COMMIT bit, and QEMU will clear the COMMITTED bit.
> >
> > No, the point of the above argument is that the spec doesn't say anything
> > about when committed is cleared. There are 2 options:
> > 1) Hardware clears it when commit 1->0.
> > 2) Hardware clears it when commit 0->1
> >
> > Given that the spec only talks about the state after a commit 0->1 transition
> > whilst commit remains 1, the state after a commit 1->0 transition is
> > implementation defined.
> >
> > I think that closing that corner case requires a clarification to the spec.
> >
> > Which leaves us with a sticky question of what to do...
>
> Thinking a little more about this, and after another close look at the spec:
> the committed bit definition calls out "Indicates a decoder is active",
> so if it is not cleared on commit 1->0, then we may have a problem with
> multiple decoders under the clear-only-on-commit-0->1 option.
>
> Let us first set up the decoders as follows:
> decoder 0 -> HPA X to X + N1 (then commit)
> decoder 1 -> HPA X + N1 to X + N1 + M1 (then commit)
>
> Now we want to change them so that we have N2 > N1, without passing through
> a situation where the decoders overlap. There is a route to doing this, but
> it's not very intuitive.

I'm a bit unclear on the variables here.

We have 2 ranges A and B and we want to add C?

Size of A is N1
Size of B is M1?

Then Size of C is N2?

Or is N2 a new size of N1? So the size of A is changing?

>
> 1. Unset commit on both decoders
> 2. Update decoder 1 first and commit. We have to do it in this order because
> decoder 0 is still committed (in use), so we can't overlap with it.
> 3. Update decoder 0 second and commit.
>
> If N2 < N1, we need to reverse the order:
>
> 1. Unset commit on both decoders
> 2. Update decoder 0 first and commit. Avoids overlap with still committed decoder 1.
> 3. Update decoder 1 and commit.

If the size of A is changing then yes I think this is required. But I
don't think it has anything to do with the commit bit. I think we have to
program decoders in order anyway, so this was required all along. Wasn't
it?

>
> So I think there is a path to make it work but it's nasty.

Not nice no... :-(

>
> I'll raise a query with the CXL SSWG chair (off list, but referring to this thread).

Not a bad idea. I'm no expert on this; I'm just going off of what I have
heard/remember/read on the fly...

Ira