Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

From: Daniel P. BerrangÃ
Date: Thu Sep 05 2019 - 05:23:26 EST


On Thu, Sep 05, 2019 at 10:20:39AM +0100, Stefan Hajnoczi wrote:
> On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote:
> > If an application tries to access memory that is not mapped, an error
> > ENOSYS, "load/store instruction decoding not implemented" may occur.
> > QEMU will hang with a register dump.
> >
> > Instead create a data abort that can be handled gracefully by the
> > application running in the virtual environment.
> >
> > Now the virtual machine can react to the event in the most appropriate
> > way - by recovering, by writing an informative log, or by rebooting.
> >
> > Signed-off-by: Heinrich Schuchardt <xypron.glpk@xxxxxx>
> > ---
> > virt/kvm/arm/mmio.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> > index a8a6a0c883f1..0cbed7d6a0f4 100644
> > --- a/virt/kvm/arm/mmio.c
> > +++ b/virt/kvm/arm/mmio.c
> > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > if (ret)
> > return ret;
> > } else {
> > - kvm_err("load/store instruction decoding not implemented\n");
> > - return -ENOSYS;
> > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> > + return 1;
>
> I see this more as a temporary debugging hack than something to merge.
>
> It sounds like in your case the guest environment provided good
> debugging information and you preferred it over debugging this from the
> host side. That's fine, but allowing the guest to continue running in
> the general case makes it much harder to track down the root cause of a
> problem because many guest CPU instructions may be executed after the
> original problem occurs. Other guest software may fail silently in
> weird ways. IMO it's best to fail early.

The current error message is quite limited in its usefulness - mostly
you have to be able to google the message and hope to hit a previous
report which explains the problem, or find someone on IRC who remembers
the problem, etc.

Could we put a text doc in the kernel tree explaining the problem in
enough detail that people can identify their next steps to resolve it,
and then make this error message tell people to read that text doc.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|