Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
From: Mauro Carvalho Chehab
Date: Tue Aug 13 2024 - 15:00:01 EST
On Mon, 12 Aug 2024 11:39:00 +0200
Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
> > We may also store cper_offset there via bios_linker_loader_add_pointer()
> > and/or use bios_linker_loader_write_pointer(), but I can't see how the
> > data stored there can be retrieved, nor any advantage of using it instead
> > of the current code, as, in the end, we'll have 3 addresses that will be
> > used:
> >
> > - an address where a pointer to CPER record will be stored;
> > - an address where the ack will be stored;
> > - an address where the actual CPER record will be stored.
> >
> > And those are calculated in a single function and are all stored in the
> > ACPI table files.
> >
> > What am I missing?
>
> That's basically (2) approach and it works to some degree,
> unfortunately it's fragile when we start talking about migration
> and changing layout in the future.
>
> Let's take as an example increasing the size of [1] 'Generic Error Status
> Block', which we are considering. Old QEMU will tell firmware to allocate a
> 1K buffer for it, and the calculated offsets to [1] (that you've
> stored/calculated) will include this assumption.
> Then in newer QEMU we increase the size of [1] and all hardcoded offsets will
> account for the new size, but if we migrate a guest from old QEMU to this
> newer one, the HEST table layout within the guest will still match old QEMU
> assumptions, and as a result newer QEMU with a larger block size will write
> CPERs at the wrong address, since we are still running the guest from old QEMU.
> That's just one example.
>
> There are a number of ways to make it work, but the ultimate goal is to pick
> the one that's the least fragile and won't snowball into a maintenance
> nightmare as the number of GHES sources increases over time.
>
> This series tries to solve the problem of mapping a GHES source to
> a corresponding 'Generic Error Status Block' and related registers.
> However, we are missing access to this mapping since it only
> exists in the guest-patched HEST (i.e. in guest RAM only).
>
> The robust way to make it work would be for QEMU to get a pointer
> to the whole HEST table and then enumerate GHES sources and related
> error/ack registers directly from guest RAM (sidestepping layout
> change issues this way).
>
> What I'm proposing is to use bios_linker_loader_write_pointer()
> (only once) so that firmware can tell QEMU the address of the HEST table,
> in which one can find a GHES source and always-correct error/ack
> pointers (regardless of table[s] layout changes).
Ok, got it. Such a change was not easy, but I finally figured out how
to make it actually work.
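To make the migration problem concrete, here is a toy sketch of the
arithmetic, not QEMU code. It assumes a simplified blob layout in which N
8-byte error-block-address slots are followed by N 8-byte read-ack slots and
then N status blocks. The function name cper_offset() and the 4K "new" size
are hypothetical, chosen only for illustration; the 1K "old" size is the one
mentioned in the quoted text.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified layout of a hardware-errors blob:
 *   [n_sources x 8-byte error block address slots]
 *   [n_sources x 8-byte read ack slots]
 *   [n_sources x block_size status blocks]
 * Returns the offset of the status block for source 'idx', as a QEMU
 * binary with a compiled-in 'block_size' would compute it.
 */
static uint64_t cper_offset(unsigned n_sources, unsigned idx,
                            uint64_t block_size)
{
    uint64_t slots = 2ull * 8 * n_sources;   /* address slots + ack slots */

    return slots + (uint64_t)idx * block_size;
}
```

With two sources, a binary that assumes 1K blocks places the second block at
cper_offset(2, 1, 1024), while a binary that assumes 4K blocks computes
cper_offset(2, 1, 4096) for the same source. After migration the guest still
has the layout built by the old binary, so the second result points at the
wrong place.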
Tomorrow I'll address your comment on patch 5/10 about using raw data also
for the other parts of CPER (generic error status and generic error data).
If you want a sneak peek, I'm keeping the latest development
version here:
https://gitlab.com/mchehab_kernel/qemu/-/commits/qemu_submission?ref_type=heads
In particular, the patch changing from /etc/hardware_errors offset to
a HEST offset is at:
https://gitlab.com/mchehab_kernel/qemu/-/commit/9197d22de09df97ce3d6725cb21bd2114c2eb43c
It contains several cleanups to make the logic clearer and more robust.
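For readers who want the general idea without opening the commit, below is a
rough standalone sketch of looking up a GHES source by walking a HEST image.
It is not the code from the patch. The function names, the fake table and its
addresses are all made up, and the field offsets are my reading of the ACPI
HEST/GHESv2 layout, so please check them against the spec. The sketch only
understands GHESv2 entries and gives up on anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets assumed from the ACPI HEST / GHESv2 layout; verify against the spec. */
enum {
    HEST_COUNT_OFF     = 36, /* 4-byte error source count after the header */
    HEST_FIRST_SRC_OFF = 40, /* first error source structure */
    GHES_V2_TYPE       = 10,
    GHES_V2_LEN        = 92,
    GHES_ERR_GAS_OFF   = 20, /* Error Status Address (GAS) */
    GHES_ACK_GAS_OFF   = 64, /* Read Ack Register (GAS) */
    GAS_ADDR_OFF       = 4,  /* 64-bit Address field inside a GAS */
};

/* Little-endian accessors, so the sketch does not depend on host endianness. */
static uint64_t rd_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

static void wr_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Walk the error sources of a HEST image and return the addresses recorded
 * for 'source_id'. Returns 0 on success, -1 if the source is not found or
 * the table is truncated or contains a non-GHESv2 entry.
 */
static int ghes_find_source(const uint8_t *hest, size_t len, uint16_t source_id,
                            uint64_t *err_addr, uint64_t *ack_addr)
{
    if (len < HEST_FIRST_SRC_OFF) {
        return -1;
    }
    uint32_t count = (uint32_t)rd_le(hest + HEST_COUNT_OFF, 4);
    size_t off = HEST_FIRST_SRC_OFF;

    for (uint32_t i = 0; i < count; i++) {
        if (len - off < GHES_V2_LEN) {
            return -1;                       /* truncated table */
        }
        const uint8_t *src = hest + off;
        if (rd_le(src, 2) != GHES_V2_TYPE) {
            return -1;                       /* sketch handles GHESv2 only */
        }
        if (rd_le(src + 2, 2) == source_id) {
            *err_addr = rd_le(src + GHES_ERR_GAS_OFF + GAS_ADDR_OFF, 8);
            *ack_addr = rd_le(src + GHES_ACK_GAS_OFF + GAS_ADDR_OFF, 8);
            return 0;
        }
        off += GHES_V2_LEN;
    }
    return -1;
}

/* Build a fake two-source HEST for demonstration; all addresses are made up. */
static size_t build_fake_hest(uint8_t *buf, size_t cap)
{
    size_t len = HEST_FIRST_SRC_OFF + 2 * GHES_V2_LEN;

    assert(cap >= len);
    memset(buf, 0, len);
    memcpy(buf, "HEST", 4);
    wr_le(buf + 4, len, 4);                  /* table length */
    wr_le(buf + HEST_COUNT_OFF, 2, 4);
    for (unsigned i = 0; i < 2; i++) {
        uint8_t *src = buf + HEST_FIRST_SRC_OFF + i * GHES_V2_LEN;
        wr_le(src, GHES_V2_TYPE, 2);
        wr_le(src + 2, i, 2);                /* source id */
        wr_le(src + GHES_ERR_GAS_OFF + GAS_ADDR_OFF, 0x1000 + 0x400 * i, 8);
        wr_le(src + GHES_ACK_GAS_OFF + GAS_ADDR_OFF, 0x2000 + 8 * i, 8);
    }
    return len;
}
```

The point of the design is that the lookup reads whatever addresses are in
the table image, so it keeps working when the block size or the number of
sources differs from what the running binary would have generated.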
Thanks,
Mauro