Re: [REGRESSION] Failed network caused by: xhci: switch to pci_alloc_irq_vectors

From: Steven Rostedt
Date: Fri May 19 2017 - 06:09:06 EST


On Fri, 19 May 2017 07:42:23 +0200
Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, May 18, 2017 at 11:42:34PM -0400, Steven Rostedt wrote:
> >
> > One of the configs I use to test ftrace with (configs that have
> > caused failures in the past), has lots of irq issues and fails to
> > initialize the network of my box. I bisected the problem down to a
> > single commit, and when I revert that commit, my box boots without any
> > network or irq issues.
> >
> > Note, my other configs work fine on this box. I haven't investigated
> > which config option is also the culprit. But since it used to work with
> > this config, I want to report it.
>
> So what commit is causing the problem?

Ugh, I forgot to cut and paste the sha1. I thought I did, but I only cut
and pasted the commit's subject line into the subject of this email.

commit 77d45b4500967de674b8f75a9a91f58d57d5704d

>
> It looks like the ehci driver is having problems, but first, your
> interrupts are whack:

Could be. It's an old board.

>
> > irq 16: nobody cared (try booting with the "irqpoll" option)
> > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-rc1-test-dirty #24
> > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> > Call Trace:
> > <IRQ>
> > devtmpfs: mounted
> > dump_stack+0x9a/0xd6
> > __report_bad_irq+0x35/0xc0
> > note_interrupt+0x234/0x270
> > handle_irq_event_percpu+0x45/0x60
> > handle_irq_event+0x39/0x60
> > handle_fasteoi_irq+0x8f/0x160
> > handle_irq+0x6f/0x110
> > do_IRQ+0x46/0xd0
> > common_interrupt+0x93/0x93
> > RIP: 0010:native_safe_halt+0x6/0x10
> > RSP: 0000:ffffb54240cd7e90 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff7e
> > RAX: 0000000000000000 RBX: ffff8ea214498040 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffffb54240cd7e90 R08: 0000000000000001 R09: 0000000041129b0c
> > R10: ffffb54240cd7d68 R11: 0000000000000001 R12: 0000000000000002
> > R13: ffff8ea214498040 R14: 0000000000000000 R15: ffff8ea214498040
> > </IRQ>
> > default_idle+0x38/0x160
> > arch_cpu_idle+0xf/0x20
> > default_idle_call+0x28/0x50
> > do_idle+0x182/0x220
> > cpu_startup_entry+0x1d/0x20
> > start_secondary+0x132/0x160
> > secondary_startup_64+0x9f/0x9f
> > handlers:
> > [<ffffffff9a6421a0>] xhci_msi_irq
> > Disabling IRQ #16
>
> Have you tried taking the kernel's advice? :)

You mean the "irqpoll"? No. This works fine without that commit. Why
should I have to change?

>
> > ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 3
> > ehci-pci 0000:00:1a.0: debug port 2
> > ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
> > genirq: Flags mismatch irq 16. 00000080 (ehci_hcd:usb3) vs. 00000000 (xhci_hcd)
>
> What does that mean?

No idea ;-)
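
For reference, the two hex values in that genirq message are the IRQF_*
flags of the new request and of the handler already installed on the line;
0x80 is IRQF_SHARED. Below is a minimal sketch of the compatibility test,
assuming a simplified form of the check in __setup_irq() (the real one also
looks at per-cpu settings and the recorded trigger type). The helper name
irq_flags_compatible() is hypothetical; the flag values are the ones from
include/linux/interrupt.h.

```c
#include <stdbool.h>

/* Flag values as defined in include/linux/interrupt.h */
#define IRQF_TRIGGER_MASK 0x0000000fU
#define IRQF_SHARED       0x00000080U
#define IRQF_ONESHOT      0x00002000U

/*
 * Hypothetical helper: returns true if a new handler with new_flags may be
 * added to an IRQ line that already has a handler with old_flags.
 * Simplified from the check in __setup_irq(); not the kernel's actual code.
 */
static bool irq_flags_compatible(unsigned int old_flags, unsigned int new_flags)
{
	/* Both the existing and the new handler must ask for sharing. */
	if (!(old_flags & new_flags & IRQF_SHARED))
		return false;

	/* Both must agree on the trigger type. */
	if ((old_flags ^ new_flags) & IRQF_TRIGGER_MASK)
		return false;

	/* Both must agree on IRQF_ONESHOT. */
	if ((old_flags ^ new_flags) & IRQF_ONESHOT)
		return false;

	return true;
}
```

With the values from the log, irq_flags_compatible(0x00000000, 0x00000080)
returns false: the handler already on irq 16 (xhci_hcd) was registered
without IRQF_SHARED, so the shared request from ehci_hcd is refused.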

>
> > CPU: 0 PID: 307 Comm: modprobe Tainted: G E 4.12.0-rc1-test-dirty #24
> > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> > Call Trace:
> > dump_stack+0x9a/0xd6
> > __setup_irq+0x5d4/0x630
> > request_threaded_irq+0x10d/0x190
> > usb_add_hcd+0x658/0x970
> > ? for_each_companion+0x3e/0xb0
> > usb_hcd_pci_probe+0x3e4/0x490
> > ehci_pci_probe+0x36/0x40 [ehci_pci]
> > local_pci_probe+0x45/0xa0
> > ? pci_match_device+0xca/0x110
> > pci_device_probe+0xdb/0x130
> > driver_probe_device+0x2ed/0x480
> > __driver_attach+0xd5/0x100
> > ? driver_probe_device+0x480/0x480
> > bus_for_each_dev+0x62/0xa0
> > driver_attach+0x1e/0x20
> > bus_add_driver+0x1c6/0x290
> > driver_register+0x60/0xe0
> > __pci_register_driver+0x60/0x70
> > ? 0xffffffffc0346000
> > ehci_pci_init+0x6a/0x1000 [ehci_pci]
> > do_one_initcall+0x43/0x190
> > ? kmem_cache_alloc_trace+0x1be/0x200
> > do_init_module+0x7d/0x210
> > load_module+0x1891/0x1eb0
> > ? vmap_page_range_noflush+0x29b/0x370
> > ? show_coresize+0x30/0x30
> > SYSC_init_module+0x143/0x180
> > ? load_module+0x5/0x1eb0
> > ? SYSC_init_module+0x143/0x180
> > SyS_init_module+0xe/0x10
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> > RIP: 0033:0x3b918e0ffa
> > RSP: 002b:00007ffd11d575c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > RAX: ffffffffffffffda RBX: 000000000061f950 RCX: 0000003b918e0ffa
> > RDX: 000000000061f7d0 RSI: 00000000000036b0 RDI: 000000000062c9e0
> > RBP: 0000000000000000 R08: 0000000000630090 R09: 00007f019c07c700
> > R10: 00007ffd11d574f0 R11: 0000000000000246 R12: 0000000000626200
> > R13: 000000000061f930 R14: 0000000000000000 R15: 000000000061f420
> > ehci-pci 0000:00:1a.0: request interrupt 16 failed
>
> So ehci can't use the same irq line as xhci? No sharing allowed?
>
> But other configs on this same hardware work, can you do a diff of a
> working vs. not working?

I could probably run my config-bisect and see what it comes up with.

-- Steve