Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
From: Igor Mammedov
Date: Wed Dec 04 2024 - 03:29:32 EST
On Tue, 3 Dec 2024 14:47:30 +0100
Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
> On Tue, 3 Dec 2024 12:51:43 +0100
> Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
>
> > On Fri, 22 Nov 2024 10:11:30 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
> >
> > > Currently, CPER address location is calculated as an offset of
> > > the hardware_errors table. It is also badly named, as the
> > > offset actually used is the address where the CPER data starts,
> > > and not the beginning of the error source.
> > >
> > > Move the logic which calculates such offset to a separate
> > > function, in preparation for a patch that will be changing the
> > > logic to calculate it from the HEST table.
> > >
> > > While here, properly name the variable which stores the cper
> > > address.
> > >
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > > ---
> > > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> > > 1 file changed, 32 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index 87fd3feedd2a..d99697b20164 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > ags->present = true;
> > > }
> > >
> > > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > > + uint64_t *cper_addr,
> > > + uint64_t *read_ack_register_addr)
> > > +{
> >
> >
> > > + if (!ghes_addr) {
> > > + return;
> > > + }
> >
> > why do we need this check?
>
> It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
> callback doesn't fill it properly, this will be zero.
It shouldn't happen, but yes, it is the firmware's job to write the
address back, so a zero value might happen for whatever reason (a bug,
for example).
Perhaps push this check up the stack, so we don't have to deal
with scattered checks in the ghes code.
kvm_arch_on_sigbus_vcpu() looks like a good candidate for the check,
with a warn_once if that ever happens.
It already calls acpi_ghes_present(), which resolves the GED device,
and then later we duplicate this job in ghes_record_cper_errors().
So maybe rename acpi_ghes_present() to something like
AcpiGhesState *acpi_ghes_get_state() and call it instead, and then
move the ghes_addr check/warn_once there.
This way the rest of the ghes code won't have to handle practically
impossible error conditions that make the reader wonder why they might happen.
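
To illustrate the idea, here is a rough, standalone sketch of what such a
getter could look like. It is not QEMU code: the struct is a simplified
stand-in for AcpiGhesState, the field names are assumptions, and
ghes_warn_once() and acpi_ghes_get_state_sketch() are hypothetical names.
In QEMU the getter would resolve the GED device itself rather than take
the state as an argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Simplified stand-in for QEMU's AcpiGhesState. The field names are
 * assumptions made for this sketch, not the real structure layout.
 */
typedef struct AcpiGhesState {
    bool present;        /* set once the fw_cfg entries were added */
    uint64_t ghes_addr;  /* written back by firmware, 0 if never filled */
} AcpiGhesState;

/* Counts emitted warnings so the warn-once behaviour is observable. */
static unsigned int ghes_warnings_emitted;

/* Hypothetical stand-in for a warn-once helper. */
static void ghes_warn_once(const char *msg)
{
    static bool warned;

    if (!warned) {
        warned = true;
        ghes_warnings_emitted++;
        fprintf(stderr, "warning: %s\n", msg);
    }
}

/*
 * Hypothetical replacement for acpi_ghes_present(): returns the state
 * only when it is usable and NULL otherwise, so callers further down
 * never see a zero address.
 */
static AcpiGhesState *acpi_ghes_get_state_sketch(AcpiGhesState *ags)
{
    if (!ags || !ags->present) {
        return NULL;
    }
    if (!ags->ghes_addr) {
        ghes_warn_once("GHES address was not written back by firmware");
        return NULL;
    }
    return ags;
}
```

With something along these lines, the caller bails out once on NULL and
get_hw_error_offsets() can assume a non-zero address.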
> > > +
> > > + /*
> > > + * non-HEST version supports only one source, so no need to change
> > > + * the start offset based on the source ID. Also, we can't validate
> > > + * the source ID, as it is stored inside the HEST table.
> > > + */
> > > +
> > > + cpu_physical_memory_read(ghes_addr, cper_addr,
> > > + sizeof(*cper_addr));
> > > +
> > > + *cper_addr = le64_to_cpu(*cper_addr);
> > 1st bits flip, and then see later
> >
> > > +
> > > + /*
> > > + * As the current version supports only one source, the ack offset is
> > > + * just sizeof(uint64_t).
> > > + */
> > > + *read_ack_register_addr = ghes_addr +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > > +}
> > > +
> > > void ghes_record_cper_errors(const void *cper, size_t len,
> > > uint16_t source_id, Error **errp)
> > > {
> > > - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > > + uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
> >
> > if get_hw_error_offsets() isn't supposed to fail, then we do not need to initialize
> > above. So this hunk doesn't belong to this patch.
>
> It may fail due to:
>
> if (!ghes_addr) {
> return;
> }
>
> >
> > > uint64_t start_addr;
> > > AcpiGedState *acpi_ged_state;
> > > AcpiGhesState *ags;
> > > @@ -389,18 +416,14 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > >
> > > start_addr += source_id * sizeof(uint64_t);
> > >
> > > - cpu_physical_memory_read(start_addr, &error_block_addr,
> > > - sizeof(error_block_addr));
> > > + get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
> > >
> > > - error_block_addr = le64_to_cpu(error_block_addr);
> > > - if (!error_block_addr) {
> > > + cper_addr = le64_to_cpu(cper_addr);
> > ^^^^ 2nd bits flip turning it back to guest byte order again
> >
> > suggest to keep only one of them in get_hw_error_offsets()
>
> Ok, I'll drop this one.
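
For the record, a small standalone sketch of why the second conversion is
harmful. It does not use QEMU's headers: bswap64_sketch() and
le64_to_cpu_on_be_host() are hypothetical helpers that only model what
le64_to_cpu() would do on a big-endian host (on a little-endian host the
conversion is a no-op, which is why the bug goes unnoticed there).

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value, one byte at a time. */
static uint64_t bswap64_sketch(uint64_t v)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/*
 * Models le64_to_cpu() on a big-endian host, where converting from
 * little-endian means swapping the bytes.
 */
static uint64_t le64_to_cpu_on_be_host(uint64_t v)
{
    return bswap64_sketch(v);
}

/*
 * Applying the conversion twice swaps the bytes back, so the result is
 * the raw little-endian value again instead of a host-order address.
 */
static uint64_t convert_twice_on_be_host(uint64_t raw_le)
{
    return le64_to_cpu_on_be_host(le64_to_cpu_on_be_host(raw_le));
}
```

So with both calls in place, a big-endian host would end up using the
unconverted value as the CPER address.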
>
> > > + if (!cper_addr) {
> > > error_setg(errp, "can not find Generic Error Status Block");
> > > return;
> > > }
> > >
> > > - read_ack_register_addr = start_addr +
> > > - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > > -
> > > cpu_physical_memory_read(read_ack_register_addr,
> > > &read_ack_register, sizeof(read_ack_register));
> > >
> > > @@ -421,7 +444,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > > &read_ack_register, sizeof(uint64_t));
> > >
> > > /* Write the generic error data entry into guest memory */
> > > - cpu_physical_memory_write(error_block_addr, cper, len);
> > > + cpu_physical_memory_write(cper_addr, cper, len);
> > >
> > > return;
> > > }
> >
>
> Thanks,
> Mauro
>