Re: pinctrl_cherryview: massive error messages (handle_bad_irq)

From: Mika Westerberg
Date: Thu Oct 20 2016 - 06:16:59 EST


+LinusW

On Wed, Oct 19, 2016 at 08:40:25PM +0000, Christian Steiner wrote:
> On 19.10.2016 21:22, Mika Westerberg wrote:
> > On Wed, Oct 19, 2016 at 06:32:57PM +0000, Christian Steiner wrote:
> >> Hello,
> >>
> >> since Kernel 4.8 I see a lot of the following error messages after
> >> resuming from suspend to RAM:
> >>
> >>> irq 117, desc: ffff88017a61e600, depth: 1, count: 0, unhandled: 0
> >>> ->handle_irq(): ffffffff8109b613, handle_bad_irq+0x0/0x1e0
> >>> ->irq_data.chip(): ffffffffa0020180, chv_pinctrl_exit+0x2d84/0x12 [pinctrl_cherryview]
> >>> ->action(): (null)
> >>> IRQ_NOPROBE set
> >>
> >> The messages appear a few thousand times and mess up the log. Despite
> >> the messages everything seems to work fine.
> >> I see the messages with Kernel 4.8.2 and 4.9-rc1. The do not appear with
> >> Kernel 4.7.8 or earlier.
> >
> > We changed the driver not to mask all IRQs when it starts. See commit
> > bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in
> > probe").
> >
> >> I have an Acer TravelMate B116-M laptop with an Intel Pentium N3700
> >> processor.
> >>
> >> Here are the full log messages of a suspend to RAM and resume cycle with
> >> Kernel 4.7.8:
> >>
> >>> [ 22.605418] PM: Syncing filesystems ... done.
> >>> [ 22.613640] PM: Preparing system for sleep (mem)
> >>> [ 22.615108] Freezing user space processes ... (elapsed 0.001 seconds) done.
> >>> [ 22.616930] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
> >>> [ 22.616956] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>> [ 22.618611] PM: Suspending system (mem)
> >>> [ 22.618640] Suspending console(s) (use no_console_suspend to debug)
> >>> [ 22.619773] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >>> [ 22.626173] sd 0:0:0:0: [sda] Stopping disk
> >>> [ 23.670076] PM: suspend of devices complete after 1051.008 msecs
> >>> [ 23.691058] PM: late suspend of devices complete after 20.971 msecs
> >>> [ 23.692104] r8169 0000:03:00.0: System wakeup enabled by ACPI
> >>> [ 23.693087] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
> >>> [ 23.711687] PM: noirq suspend of devices complete after 20.624 msecs
> >>> [ 23.711726] ACPI: Preparing to enter system sleep state S3
> >>> [ 23.712295] ACPI : EC: EC stopped
> >>> [ 23.712297] PM: Saving platform NVS memory
> >>> [ 23.712306] Disabling non-boot CPUs ...
> >>> [ 23.712590] Cannot set affinity for irq 191
> >>> [ 23.713647] smpboot: CPU 1 is now offline
> >>> [ 23.752643] smpboot: CPU 2 is now offline
> >>> [ 23.802930] smpboot: CPU 3 is now offline
> >>> [ 23.831827] ACPI: Low-level resume complete
> >>> [ 23.831938] ACPI : EC: EC started
> >>> [ 23.831939] PM: Restoring platform NVS memory
> >>> [ 23.832600] Enabling non-boot CPUs ...
> >>> [ 23.882819] x86: Booting SMP configuration:
> >>> [ 23.882821] smpboot: Booting Node 0 Processor 1 APIC 0x2
> >>> [ 23.885308] cache: parent cpu1 should not be sleeping
> >>> [ 23.885757] CPU1 is up
> >>> [ 23.932612] smpboot: Booting Node 0 Processor 2 APIC 0x4
> >>> [ 23.935171] cache: parent cpu2 should not be sleeping
> >>> [ 23.935674] CPU2 is up
> >>> [ 23.982749] smpboot: Booting Node 0 Processor 3 APIC 0x6
> >>> [ 23.985324] cache: parent cpu3 should not be sleeping
> >>> [ 23.985840] CPU3 is up
> >>> [ 23.986370] ACPI: Waking up from system sleep state S3
> >>> [ 24.012307] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
> >>> [ 24.023417] PM: noirq resume of devices complete after 20.911 msecs
> >>> [ 25.052523] PM: early resume of devices complete after 1029.024 msecs
> >>> [ 25.053082] r8169 0000:03:00.0: System wakeup disabled by ACPI
> >>> [ 25.053124] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
> >>> [ 25.054222] rtc_cmos 00:04: System wakeup disabled by ACPI
> >>> [ 25.055645] sd 0:0:0:0: [sda] Starting disk
> >>> [ 25.075031] r8169 0000:03:00.0 eth0: link down
> >>> [ 25.374179] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> >>> [ 25.377616] ata1.00: configured for UDMA/133
> >>> [ 26.025736] PM: resume of devices complete after 973.238 msecs
> >>> [ 26.026109] PM: Finishing wakeup.
> >>> [ 26.026111] Restarting tasks ... done.
> >>> [ 27.767611] r8169 0000:03:00.0 eth0: link up
> >>
> >> And this is how it looks with Kernel 4.8.2:
> >>
> >>> [ 46.928231] PM: Syncing filesystems ... done.
> >>> [ 46.930391] PM: Preparing system for sleep (mem)
> >>> [ 46.932002] Freezing user space processes ... (elapsed 0.001 seconds) done.
> >>> [ 46.933721] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
> >>> [ 46.933749] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>> [ 46.935457] PM: Suspending system (mem)
> >>> [ 46.935486] Suspending console(s) (use no_console_suspend to debug)
> >>> [ 46.936492] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >>> [ 46.936598] sd 0:0:0:0: [sda] Stopping disk
> >>> [ 48.026353] PM: suspend of devices complete after 1090.448 msecs
> >>> [ 48.057844] PM: late suspend of devices complete after 31.480 msecs
> >>> [ 48.059086] r8169 0000:03:00.0: System wakeup enabled by ACPI
> >>> [ 48.059240] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
> >>> [ 48.117234] PM: noirq suspend of devices complete after 59.386 msecs
> >>> [ 48.117269] ACPI: Preparing to enter system sleep state S3
> >>> [ 48.117836] ACPI : EC: EC stopped
> >>> [ 48.117837] PM: Saving platform NVS memory
> >>> [ 48.117846] Disabling non-boot CPUs ...
> >>> [ 48.118130] Cannot set affinity for irq 191
> >>> [ 48.119186] smpboot: CPU 1 is now offline
> >>> [ 48.179126] smpboot: CPU 2 is now offline
> >>> [ 48.229450] smpboot: CPU 3 is now offline
> >>> [ 48.278338] ACPI: Low-level resume complete
> >>> [ 48.278415] ACPI : EC: EC started
> >>> [ 48.278416] PM: Restoring platform NVS memory
> >>> [ 48.279036] irq 117, desc: ffff88017a61e600, depth: 1, count: 0, unhandled: 0
> >>> [ 48.279044] ->handle_irq(): ffffffff8109b613, handle_bad_irq+0x0/0x1e0
> >>> [ 48.279052] ->irq_data.chip(): ffffffffa0020180, chv_pinctrl_exit+0x2d84/0x12 [pinctrl_cherryview]
> >>> [ 48.279054] ->action(): (null)
> >>> [ 48.279055] IRQ_NOPROBE set
> >>> [ 48.279057] unexpected IRQ trap at vector 75
> >>>
> >> [...]
> >>>
> >>> [ 48.659033] irq 117, desc: ffff88017a61e600, depth: 1, count: 0, unhandled: 0
> >>> [ 48.659037] ->handle_irq(): ffffffff8109b613, handle_bad_irq+0x0/0x1e0
> >>> [ 48.659041] ->irq_data.chip(): ffffffffa0020180, chv_pinctrl_exit+0x2d84/0x12 [pinctrl_cherryview]
> >>> [ 48.659042] ->action(): (null)
> >>> [ 48.659043] IRQ_NOPROBE set
> >>> [ 49.698698] PM: early resume of devices complete after 1089.068 msecs
> >>> [ 49.699601] r8169 0000:03:00.0: System wakeup disabled by ACPI
> >>> [ 49.699633] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
> >>> [ 49.708346] sd 0:0:0:0: [sda] Starting disk
> >>> [ 49.708581] rtc_cmos 00:04: System wakeup disabled by ACPI
> >>> [ 49.731492] r8169 0000:03:00.0 eth0: link down
> >>> [ 50.039718] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> >>> [ 50.041544] ata1.00: configured for UDMA/133
> >>> [ 50.381833] PM: resume of devices complete after 683.151 msecs
> >>> [ 50.385539] PM: Finishing wakeup.
> >>> [ 50.385541] Restarting tasks ... done.
> >>
> >> Are the messages just informational or is this something serious that
> >> should be fixed?
> >
> > Basically it means the machine has an interrupt line active an nothing
> > is handling it. It is not serious but rather annoying.
>
> Thank you. Good to hear it is not critical. I would ignore the messages,
> but it is annoying that almost 4MB of log messages are generated on
> every resume after standby. Maybe it is possible to define a limit for
> the maximum amount of the same messages?

The kernel will disable that line after 100000 spuriuos interrupts but I
guess it is still rather annoying for the user.

> > I've seen this happening on proto hardware but it was fixed with
> > firmware update.
>
> Currently I am using BIOS v1.22. Recently Acer released v1.23. I could
> test if the BIOS update solves the problem.

Please try that first.

If that does not work, then I we may be able to mask all interrupts up
to nirqs - 1 but I first would like to check if the BIOS upgrade helps.

> >> If you need more information or if I should test a patch, please let me
> >> know.
> >
> > Can you send me output of /proc/interrupts and
> > /sys/kernel/debug/pinctrl/INT33FF:*/pins.
>
> Of course:
>
> > $ cat /proc/interrupts
> > CPU0 CPU1 CPU2 CPU3
> > 0: 1388 0 0 0 IO-APIC 2-edge timer
> > 1: 17 3 4 3 IO-APIC 1-edge i8042
> > 8: 1 0 0 0 IO-APIC 8-edge rtc0
> > 9: 28 23 22 23 IO-APIC 9-fasteoi acpi
> > 12: 76 36 36 36 IO-APIC 12-edge i8042
> > 18: 0 0 0 0 IO-APIC 18-fasteoi i801_smbus
> > 45: 13 17 9 13 IO-APIC 45-fasteoi mmc0
> > 47: 0 0 0 0 IO-APIC 47-fasteoi mmc1
> > 115: 7816 7632 7694 7549 PCI-MSI 311296-edge ahci[0000:00:13.0]
> > 116: 43793 44661 43670 44402 PCI-MSI 1572864-edge eth0
> > 117: 10759 0 0 0 chv-gpio 0
> > 191: 0 0 0 0 chv-gpio 50 80860F14:01 cd
> > 196: 35 37 40 41 PCI-MSI 1048576-edge iwlwifi
> > 197: 24359 22517 23054 22852 PCI-MSI 327680-edge xhci_hcd
> > 198: 174515 168035 165519 167706 PCI-MSI 32768-edge i915
> > 199: 20 18 18 16 PCI-MSI 442368-edge snd_hda_intel:card0
> > NMI: 0 0 0 0 Non-maskable interrupts
> > LOC: 409748 405619 425239 440549 Local timer interrupts
> > SPU: 0 0 0 0 Spurious interrupts
> > PMI: 0 0 0 0 Performance monitoring interrupts
> > IWI: 0 0 0 0 IRQ work interrupts
> > RTR: 0 0 0 0 APIC ICR read retries
> > RES: 74598 69435 68734 60007 Rescheduling interrupts
> > CAL: 23842 28885 29873 26859 Function call interrupts
> > TLB: 17407 23763 25060 21008 TLB shootdowns
> > TRM: 0 0 0 0 Thermal event interrupts
> > THR: 0 0 0 0 Threshold APIC interrupts
> > MCE: 0 0 0 0 Machine check exceptions
> > MCP: 45 44 44 44 Machine check polls
> > ERR: 0
> > MIS: 0
> > PIN: 0 0 0 0 Posted-interrupt notification event
> > PIW: 0 0 0 0 Posted-interrupt wakeup event
>
> > $ cat /sys/kernel/debug/pinctrl/INT33FF:*/pin
> > registered pins: 24
> > pin 0 (PMU_SLP_S3_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 1 (PMU_BATLOW_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 2 (SUS_STAT_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 3 (PMU_SLP_S0IX_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 4 (PMU_AC_PRESENT) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 5 (PMU_PLTRST_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 6 (PMU_SUSCLK) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 7 (PMU_SLP_LAN_B) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 8 (PMU_PWRBTN_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 9 (PMU_SLP_S4_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 10 (PMU_WAKE_B) mode 1 ctrl0 0x00c10001 ctrl1 0x04c00000
> > pin 11 (PMU_WAKE_LAN_B) GPIO ctrl0 0x00018300 ctrl1 0x04c00000
> > pin 15 (MF_ISH_GPIO_3) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 16 (MF_ISH_GPIO_7) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 17 (MF_ISH_I2C1_SCL) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 18 (MF_ISH_GPIO_1) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 19 (MF_ISH_GPIO_5) GPIO ctrl0 0x10008201 ctrl1 0x04c00004

It most probably is this one. Active high and the line is high so this
is generating interrupts all the time if not masked.

> > pin 20 (MF_ISH_GPIO_9) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 21 (MF_ISH_GPIO_0) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 22 (MF_ISH_GPIO_4) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 23 (MF_ISH_GPIO_8) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 24 (MF_ISH_GPIO_2) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 25 (MF_ISH_GPIO_6) GPIO ctrl0 0x00008201 ctrl1 0x04c00002
> > pin 26 (MF_ISH_I2C1_SDA) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > registered pins: 55
> > pin 0 (MF_PLT_CLK0) GPIO ctrl0 0x00008200 ctrl1 0x04c00000
> > pin 1 (PWM1) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 2 (MF_PLT_CLK1) GPIO ctrl0 0x00008200 ctrl1 0x04c00000
> > pin 3 (MF_PLT_CLK4) GPIO ctrl0 0x00008200 ctrl1 0x04c00000
> > pin 4 (MF_PLT_CLK3) GPIO ctrl0 0x00008200 ctrl1 0x04c00000
> > pin 5 (PWM0) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 6 (MF_PLT_CLK5) GPIO ctrl0 0x00008200 ctrl1 0x04c00000
> > pin 7 (MF_PLT_CLK2) GPIO ctrl0 0x00008201 ctrl1 0x04c00000
> > pin 15 (SDMMC2_D3_CD_B) mode 1 ctrl0 0x00010081 ctrl1 0x04c00000
> > pin 16 (SDMMC1_CLK) mode 1 ctrl0 0x00110080 ctrl1 0x04c00000
> > pin 17 (SDMMC1_D0) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 18 (SDMMC2_D1) mode 1 ctrl0 0x00010081 ctrl1 0x04c00000
> > pin 19 (SDMMC2_CLK) mode 1 ctrl0 0x00010080 ctrl1 0x04c00000
> > pin 20 (SDMMC1_D2) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 21 (SDMMC2_D2) mode 1 ctrl0 0x00010081 ctrl1 0x04c00000
> > pin 22 (SDMMC2_CMD) mode 1 ctrl0 0x00010081 ctrl1 0x04c00000
> > pin 23 (SDMMC1_CMD) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 24 (SDMMC1_D1) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 25 (SDMMC2_D0) mode 1 ctrl0 0x00010081 ctrl1 0x04c00000
> > pin 26 (SDMMC1_D3_CD_B) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 30 (SDMMC3_D1) mode 1 ctrl0 0x00910080 ctrl1 0x04c00000
> > pin 31 (SDMMC3_CLK) mode 1 ctrl0 0x00010080 ctrl1 0x04c00000
> > pin 32 (SDMMC3_D3) mode 1 ctrl0 0x00910080 ctrl1 0x04c00000
> > pin 33 (SDMMC3_D2) mode 1 ctrl0 0x00910080 ctrl1 0x04c00000
> > pin 34 (SDMMC3_CMD) mode 1 ctrl0 0x00910080 ctrl1 0x04c00000
> > pin 35 (SDMMC3_D0) mode 1 ctrl0 0x00910080 ctrl1 0x04c00000
> > pin 45 (MF_LPC_AD2) mode 1 ctrl0 0x00c10001 ctrl1 0x04c00000
> > pin 46 (LPC_CLKRUNB) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 47 (MF_LPC_AD0) mode 1 ctrl0 0x00c10001 ctrl1 0x04c00000
> > pin 48 (LPC_FRAMEB) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 49 (MF_LPC_CLKOUT1) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 50 (MF_LPC_AD3) mode 1 ctrl0 0x00c10001 ctrl1 0x04c00000
> > pin 51 (MF_LPC_CLKOUT0) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 52 (MF_LPC_AD1) mode 1 ctrl0 0x00c10001 ctrl1 0x04c00000
> > pin 60 (SPI1_MISO) mode 1 ctrl0 0x00910301 ctrl1 0x05c00000
> > pin 61 (SPI1_CSO_B) GPIO ctrl0 0x00018100 ctrl1 0x04c00000
> > pin 62 (SPI1_CLK) GPIO ctrl0 0x00018100 ctrl1 0x04c00000
> > pin 63 (MMC1_D6) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 64 (SPI1_MOSI) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 65 (MMC1_D5) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 66 (SPI1_CS1_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 67 (MMC1_D4_SD_WE) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 68 (MMC1_D7) mode 1 ctrl0 0x00910081 ctrl1 0x04c00000
> > pin 69 (MMC1_RCLK) mode 1 ctrl0 0x00010081 ctrl1 0x04c00000
> > pin 75 (USB_OC1_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 76 (PMU_RESETBUTTON_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 77 (GPIO_ALERT) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
> > pin 78 (SDMMC3_PWR_EN_B) mode 1 ctrl0 0x00110001 ctrl1 0x04c00000
> > pin 79 (ILB_SERIRQ) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 80 (USB_OC0_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
> > pin 81 (SDMMC3_CD_B) GPIO ctrl0 0x00918201 ctrl1 0x04c00003
> > pin 82 (SPKR) mode 1 ctrl0 0x00910300 ctrl1 0x05c00000
> > pin 83 (SUSPWRDNACK) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
> > pin 84 (SPARE_PIN) mode 1 ctrl0 0x00110300 ctrl1 0x05c00000
> > pin 85 (SDMMC3_1P8_EN) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
>
> Thanks,
> Christian
>
> PS: Sorry for double-posting on the previous post. It was a mistake.