Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

From: Michael S. Tsirkin
Date: Tue Jan 07 2020 - 04:39:11 EST


On Tue, Jan 07, 2020 at 09:59:16AM +0100, Christian Borntraeger wrote:
>
>
> On 06.01.20 11:50, Michael S. Tsirkin wrote:
> > On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
> >> On 18.12.19 16:10, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
> >>>> Michael,
> >>>>
> >>>> with
> >>>> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
> >>>> vhost: use batched version by default
> >>>> plus
> >>>> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
> >>>> Revert "vhost/net: add an option to test new code"
> >>>> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
> >>>
> >>> I'll try.
> >>>
> >>>>
> >>>> I get random crashes in my s390 KVM guests after reboot.
> >>>> Reverting both patches (together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
> >>>> make it compile again) on top of linux-next-1218 makes the problem go away.
> >>>>
> >>>> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
> >>>> we have fixed the issues?
> >>>>
> >>>> Christian
> >>>>
> >>>
> >>> Will do, thanks for letting me know.
> >>
> >> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
> >> with a known-broken linux-next kernel also fixes the problem, so it is really the
> >> vhost changes.
> >
> > OK I'm back and trying to make it more bisectable.
> >
> > I pushed a new tag "batch-v2".
> > It's the same code, but with this tag a bisect should give more information.
>
> I get the following with this tag
>
> drivers/vhost/net.c: In function ‘vhost_net_tx_get_vq_desc’:
> drivers/vhost/net.c:574:7: error: implicit declaration of function ‘vhost_get_vq_desc_batch’; did you mean ‘vhost_get_vq_desc’? [-Werror=implicit-function-declaration]
> 574 | r = vhost_get_vq_desc_batch(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
> | ^~~~~~~~~~~~~~~~~~~~~~~
> | vhost_get_vq_desc
>

Not sure why, but I pushed the wrong commit. Sorry. It should be good now.

--
MST