Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function

From: Mauro Carvalho Chehab
Date: Wed Dec 04 2024 - 04:04:17 EST


Em Wed, 4 Dec 2024 08:54:40 +0100
Igor Mammedov <imammedo@xxxxxxxxxx> escreveu:

> On Tue, 3 Dec 2024 14:47:30 +0100
> Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
>
> > Em Tue, 3 Dec 2024 12:51:43 +0100
> > Igor Mammedov <imammedo@xxxxxxxxxx> escreveu:
> >
> > > On Fri, 22 Nov 2024 10:11:30 +0100
> > > Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
> > >
> > > > Currently, CPER address location is calculated as an offset of
> > > > the hardware_errors table. It is also badly named, as the
> > > > offset actually used is the address where the CPER data starts,
> > > > and not the beginning of the error source.
> > > >
> > > > Move the logic which calculates such offset to a separate
> > > > function, in preparation for a patch that will be changing the
> > > > logic to calculate it from the HEST table.
> > > >
> > > > While here, properly name the variable which stores the cper
> > > > address.
> > > >
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > > > ---
> > > > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> > > > 1 file changed, 32 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > index 87fd3feedd2a..d99697b20164 100644
> > > > --- a/hw/acpi/ghes.c
> > > > +++ b/hw/acpi/ghes.c
> > > > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > > ags->present = true;
> > > > }
> > > >
> > > > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > > > + uint64_t *cper_addr,
> > > > + uint64_t *read_ack_register_addr)
> > > > +{
> > >
> > >
> > > > + if (!ghes_addr) {
> > > > + return;
> > > > + }
> > >
> > > why do we need this check?
> >
> > It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
> > callback doesn't fill it properly, this will be zero.
>
> shouldn't happen, but yeah it firmware job to write back addr
> which might happen for whatever reason (a bug for example).
>

The main reason I added it is that, after the second series, it could
also happen if there's something wrong with the backward compat logic.

So, both here and after switching to HEST-based offsets, I opted
to explicitly test.

> Perhaps push this up to the stack, so we don't have to deal
> with scattered checks in ghes code.
>
> kvm_arch_on_sigbus_vcpu() looks like a goo candidate for check
> and warn_once if that ever happens.
> It already calls acpi_ghes_present() which resolves GED device
> and then later we duplicate this job in ghes_record_cper_errors()
>
> so maybe rename acpi_ghes_present to something like AcpiGhesState* acpi_ghes_get_state()
> and call it instead. And then move ghes_addr check/warn_once there.
> This way the rest of ghes code won't have to deal handling practically
> impossible error conditions that cause reader to wonder why it might happen.

I'll look on it. Yet, if ok for you, I would prefer dealing with this
once we have a bigger picture, e.g. once we merge those tree series:

- cleanup series (this one);
- HEST offset (I'll be sending a new version today);
- error_inject.

Thanks,
Mauro