Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address

From: Igor Mammedov
Date: Fri Sep 13 2024 - 09:25:32 EST


On Fri, 13 Sep 2024 07:44:35 +0200
Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:

> Em Wed, 11 Sep 2024 15:51:08 +0200
> Igor Mammedov <imammedo@xxxxxxxxxx> escreveu:
>
> > On Sun, 25 Aug 2024 05:45:56 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:
> >
> > > Store HEST table address at GPA, placing its content at
> > > hest_addr_le variable.
> > >
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
> >
> > This looks good to me.
> >
> > in addition to this, it needs a patch on top to make sure
> > that we migrate hest_addr_le.
> > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> > and fixes on top of that for an example.
>
> Hmm... If I understood such change well, vmstate_ghes_state() will
> use this structure as basis to do migration:
>
> /* ghes.h */
> typedef struct AcpiGhesState {
> uint64_t hest_addr_le;
> uint64_t ghes_addr_le;
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
>
> /* generic_event_device.c */
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> .version_id = 1,
> .minimum_version_id = 1,
> .needed = ghes_needed,
> .fields = (VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> vmstate_ghes_state, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> }
> };

current code looks like that:

static const VMStateDescription vmstate_ghes = {
.name = "acpi-ghes",
.version_id = 1,
.minimum_version_id = 1,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<===
VMSTATE_END_OF_LIST()
},
};

static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
return s->ghes_state.ghes_addr_le;
}

static const VMStateDescription vmstate_ghes_state = {
.name = "acpi-ged/ghes",
.version_id = 1,
.minimum_version_id = 1,
.needed = ghes_needed,
.fields = (const VMStateField[]) {
VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
vmstate_ghes, AcpiGhesState),
VMSTATE_END_OF_LIST()
}
};

where
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
explicitly defines field(s) within structure to be sent over wire.

we need to add a conditional field for ghes_addr_le
which will be sent only with new machine types, but not with old ones
to avoid migration breakage.

I don't know much about migration, but maybe we can get away with
similar condition as in ghes_needed(), or enabling QMP error injection
based on machine type version.

Or maybe adding a CLI option to enable QMP error injection in which
case the explicit option would serve as a trigger enable QMP command and
to migrate hest_addr_le.
It might be even better this way, since machine wouldn't need to
carry extra error source that will be used only for testing
and practically never in production VMs (aka reduced attack surface).

You can easily test it locally:
new-qemu: with your patches
old-qemu: qemu-9.1

and then try to do forth & back migration for following cases:
1. (ping-pong case with OLD firmware/ACPI tables)
start old-qemu with 9.1 machine type ->
migrate to file ->
start new-qemu with 9.1 machine type -> restore from file ->
migrate to file ->
start old-qemu with 9.1 machine type ->restore from file ->

2. (ping-pong case with NEW firmware/ACPI tables)
do the same as #1 but starting with new-qemu binary

(from upstream pov #2 is optional, but not implementing it
is pain for downstream so it's better to have it if it's not
too much work)

> /* hw/arm/virt-acpi-build.c */
> void virt_acpi_setup(VirtMachineState *vms)
> {
> ...
> if (vms->ras) {
> assert(vms->acpi_dev);
> acpi_ged_state = ACPI_GED(vms->acpi_dev);
> acpi_ghes_add_fw_cfg(&acpi_ged_state->ghes_state,
> vms->fw_cfg, tables.hardware_errors);
> }
>
> Which relies on acpi_ghes_add_fw_cfg() function to setup callbacks for
> the migration:
>
> /* ghes.c */
> void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> GArray *hardware_error)
> {
> /* Create a read-only fw_cfg file for GHES */
> fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
> hardware_error->len);
>
> /* Create a read-write fw_cfg file for Address */
> fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
>
> fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
>
> ags->present = true;
> }
>
> It sounds to me that both ghes_addr_le and hest_addr_le will be migrated
> altogether.

fwcfg callbacks are irrelevant to migration, they tell firmware what to do
with specified addresses (in our case, write into ags->hest_addr_le address
of HEST), so that HW (qemu) would know where firmware placed the table.
But that's about all it does.

>
> Did I miss something?
>
> Thanks,
> Mauro
>