Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

From: Michael S. Tsirkin
Date: Tue Apr 04 2017 - 10:24:46 EST


On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > >
> > > > > > can you try the patch below?
> > > > >
> > > > > > No more spinning kworker woes, but I still have a warning on hibernate,
> > > > > > threadirqs invariant. I'm also seeing intermittent post hibernate hang
> > > > > > funnies in virgin source +- this patch, and without threadirqs.
> > > > >
> > > > > [ 110.223953] WARNING: CPU: 5 PID: 452 at
> > > > > drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > >
> > > > > > -Mike
> > > >
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work so we
> > > > know guest is falling back on legacy interrupts.
> > >
> > > The warning persists.
> > >
> > > [ 137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261
> > > pci_irq_vector+0xb1/0xe0
> >
> > Can you post the rest of the backtrace? Is it still in the console?
>
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different. I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m. My stuff is unrelated, so this should be fine.
>
> [ 328.475988] ------------[ cut here ]------------
> [ 328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> [ 328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) virtio_balloon(E) crct10dif_pclmul(E)
> [ 328.476019] crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [ 328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: G E 4.11.0-default #20
> [ 328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [ 328.476041] Workqueue: events_unbound async_run_entry_fn
> [ 328.476042] Call Trace:
> [ 328.476056] ? dump_stack+0x5c/0x85
> [ 328.476058] ? __warn+0xc4/0xe0
> [ 328.476060] ? pci_pm_poweroff+0xf0/0xf0
> [ 328.476062] ? pci_irq_vector+0xb1/0xe0
> [ 328.476064] ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [ 328.476066] ? remove_common+0x60/0x80 [virtio_rng]
> [ 328.476067] ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [ 328.476068] ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [ 328.476069] ? pci_pm_freeze+0x59/0xe0
> [ 328.476070] ? dpm_run_callback+0x4d/0x170
> [ 328.476071] ? __device_suspend+0x11f/0x3b0
> [ 328.476072] ? pm_dev_dbg+0x70/0x70
> [ 328.476072] ? async_suspend+0x1a/0x90
> [ 328.476082] ? async_run_entry_fn+0x34/0x160
> [ 328.476083] ? process_one_work+0x164/0x430
> [ 328.476084] ? worker_thread+0x135/0x4d0
> [ 328.476085] ? kthread+0xff/0x140
> [ 328.476086] ? rescuer_thread+0x3c0/0x3c0
> [ 328.476087] ? kthread_park+0x80/0x80
> [ 328.476088] ? do_group_exit+0x39/0xa0
> [ 328.476090] ? ret_from_fork+0x26/0x40
> [ 328.476091] ---[ end trace a045c2118936902f ]---

Interesting, it's rng this time. I'll try that.

--
MST