Re: [PATCH v8 06/13] acpi/ghes: add support for generic error injection via QAPI

From: Igor Mammedov
Date: Thu Sep 12 2024 - 08:42:49 EST


On Wed, 11 Sep 2024 16:34:36 +0100
Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

> On Wed, 11 Sep 2024 15:21:32 +0200
> Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
>
> > On Sun, 25 Aug 2024 05:29:23 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
> >
> > > On Mon, 19 Aug 2024 14:51:36 +0200
> > > Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
> > >
> > > > > + read_ack = 1;
> > > > > + cpu_physical_memory_write(read_ack_start_addr,
> > > > > > + &read_ack, sizeof(uint64_t));
> > > > > we don't do this for SEV, so why are you setting it to 1 here?
> > >
> > > According to:
> > > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#generic-hardware-error-source-version-2-ghesv2-type-10
> > >
> > > "These are the steps the OS must take once detecting an error from a particular GHESv2 error source:
> > >
> > > OSPM detects error (via interrupt/exception or polling the block status)
> > >
> > > OSPM copies the error status block
> > >
> > > OSPM clears the block status field of the error status block
> > >
> > > OSPM acknowledges the error via Read Ack register. For example:
> > >
> > > OSPM reads the Read Ack register –> X
> > >
> > > OSPM writes –> (( X & ReadAckPreserve) | ReadAckWrite)"
> > >
> > >
> > > So, basically, the guest OS takes some time to detect that an error
> > > has been raised. Once it detects the error, it needs to mark it as
> > > handled.
> >
> > what you are doing here by setting read_ack = 1
> > is acking on behalf of OSPM when OSPM hasn't handled the existing error yet.
> >
> > Essentially making HW/FW do the job of OSPM. That looks wrong to me.
> > From the HW/FW side, the read_ack register should be thought of as read-only.
>
> It's not read-only because HW/FW has to clear it so that HW/FW can detect
> when the OSPM next writes it.

By read-only, I meant that hw shall not do the above-mentioned write
(bad phrasing on my side).

>
> Agreed this write to 1 looks wrong, but the one a few lines further down (to zero
> it) is correct.

yep, hw should clear the register.
It would be better to do so on OSPM ACK, but alas we can't intercept that,
so the next option would be to do it at the time when we add a new error block.

>
> My bug a long time back I think.
>
> Jonathan
>
> >
> > >
> > > IMO, this is needed, independently of the notification mechanism.
> > >
> > > Regards,
> > > Mauro
> > >
> >
> >
>