On Fri, 19 Apr 2024 12:17:01 -0400
boris.ostrovsky@xxxxxxxxxx wrote:
On 4/17/24 9:58 AM, boris.ostrovsky@xxxxxxxxxx wrote:
I noticed that I was using qemu bits that are a few months old, and now
I am having trouble reproducing this on the latest bits. Let me see if I
can get this to fail with the latest bits first and then try to trace why
the processor is in this unexpected state.
Looks like 012b170173bc "system/qdev-monitor: move drain_call_rcu call
under if (!dev) in qmp_device_add()" is what makes the test stop failing.
I need to understand whether the lack of failures is a side effect of
timing changes that simply make the hotplug failure less likely, or
whether this is an actual (but seemingly unintentional) fix.
Agreed, we should find the culprit of the problem.
PS:
also, if you are using an AMD host, there was a regression in OVMF
where a vCPU that OSPM was already onlining was yanked from under
OSPM's feet by OVMF (which, depending on timing, could manifest as
a lost SIPI).
The edk2 commit that should fix it is:
https://github.com/tianocore/edk2/commit/1c19ccd5103b
Switching to an Intel host should at least rule that out
(or use the fixed edk2-ovmf-20240524-5.el10.noarch package from CentOS
if you are forced to use an AMD host).