IRQ problem with the AACRAID driver with debian wheezy kernel 3.2.60-1+deb7u3 x86_64

From: AndreaML
Date: Thu Sep 11 2014 - 05:53:10 EST


Hello

I'm writing here after a very long time spent trying to debug a problem i have
in an host i use for backup pourposes. My google-fu haven't helped in this case.
This problem is there *at least* from the start of the year (i don't remember
when it started)

This is a pc I assembled in a 19" rack case and put in my small server farm
to be connected to the tape libraries and to be used as a backup server
with rsync and bacula

This server is equipped with an adaptec 2100S storage raid controller with
4 sata disks for storage and 2 sata disks directly connected to the motherboard
in a md mirror configuration for the system to reside on. I plan to attach a
second 2100S i have around to get more disk space.

The os i have installed is debian wheezy completely updated as of today, with
kernel version "Linux frodo 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64 GNU/Linux"

The problem i have is that at least one time every day the kernel complains that it
receives an interrupt #18 with corresponds to the 2100S storage controller that wasn't
serviced by a driver.... as it stands and confirmed by lspci -nnvv this is attached to
aacraid (see below).

The problem is that when the kernel complains about irq #18, my system become
sloooowww... to the point that even a sync on the command line becomes unresponsive
and i need to recurse to the magic systreq trick (echo b > /proc/sysrc_trigger)
or the physical reset button to regain access to the system.

and i'm really really out of ideas on how to resolve this. can you folks
give me some hints?

and yes, i have tried the irqpoll option as suggested in kernel msg with no results.

following is the data i have, in compact form.

-- Andrea --


The motherboard bios and the 2100S bios are at the latest image available
from the manufacturers.

The kernel complains in this manner:
Sep 10 22:01:11 frodo kernel: [215125.931642] irq 18: nobody cared (try booting with the "irqpoll" option)
Sep 10 22:01:11 frodo kernel: [215125.931703] Pid: 0, comm: swapper/0 Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
Sep 10 22:01:11 frodo kernel: [215125.931706] Call Trace:
Sep 10 22:01:11 frodo kernel: [215125.931708] <IRQ> [<ffffffff81092bdd>] ? __report_bad_irq+0x2c/0xb5
Sep 10 22:01:11 frodo kernel: [215125.931720] [<ffffffff81092f9a>] ? note_interrupt+0x170/0x1f2
Sep 10 22:01:11 frodo kernel: [215125.931725] [<ffffffff8109135c>] ? handle_irq_event_percpu+0x15f/0x17d
Sep 10 22:01:11 frodo kernel: [215125.931729] [<ffffffff810913ae>] ? handle_irq_event+0x34/0x52
Sep 10 22:01:11 frodo kernel: [215125.931734] [<ffffffff8106c1ad>] ? arch_local_irq_save+0x11/0x17
Sep 10 22:01:11 frodo kernel: [215125.931738] [<ffffffff81093711>] ? handle_fasteoi_irq+0x7c/0xaf
Sep 10 22:01:11 frodo kernel: [215125.931744] [<ffffffff8100f9e1>] ? handle_irq+0x1d/0x21
Sep 10 22:01:11 frodo kernel: [215125.931748] [<ffffffff8100f5ba>] ? do_IRQ+0x42/0x98
Sep 10 22:01:11 frodo kernel: [215125.931753] [<ffffffff8135072e>] ? common_interrupt+0x6e/0x6e
Sep 10 22:01:11 frodo kernel: [215125.931755] <EOI> [<ffffffff81066875>] ? ktime_get+0x50/0x86
Sep 10 22:01:11 frodo kernel: [215125.931763] [<ffffffff811ef371>] ? intel_idle+0xea/0x119
Sep 10 22:01:11 frodo kernel: [215125.931767] [<ffffffff811ef350>] ? intel_idle+0xc9/0x119
Sep 10 22:01:11 frodo kernel: [215125.931771] [<ffffffff8127112d>] ? cpuidle_idle_call+0xec/0x179
Sep 10 22:01:11 frodo kernel: [215125.931775] [<ffffffff8100d242>] ? cpu_idle+0xa5/0xf2
Sep 10 22:01:11 frodo kernel: [215125.931780] [<ffffffff816abb36>] ? start_kernel+0x3b8/0x3c3
Sep 10 22:01:11 frodo kernel: [215125.931785] [<ffffffff816ab140>] ? early_idt_handlers+0x140/0x140
Sep 10 22:01:11 frodo kernel: [215125.931789] [<ffffffff816ab3c4>] ? x86_64_start_kernel+0x104/0x111
Sep 10 22:01:11 frodo kernel: [215125.931792] handlers:
Sep 10 22:01:11 frodo kernel: [215125.931824] [<ffffffffa013689c>] aac_rx_intr_message
Sep 10 22:01:11 frodo kernel: [215125.931864] Disabling IRQ #18

and after a while (obviously)
Sep 11 00:44:13 frodo kernel: [224892.536925] INFO: task rsync:7383 blocked for more than 120 seconds.
Sep 11 00:44:13 frodo kernel: [224892.536971] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 11 00:44:13 frodo kernel: [224892.537023] rsync D ffff88011ed93780 0 7383 7363 0x00000000
Sep 11 00:44:13 frodo kernel: [224892.537028] ffff8801169150c0 0000000000000082 ffffffff00000000 ffff880119f55100
Sep 11 00:44:13 frodo kernel: [224892.537033] 0000000000013780 ffff88002d907fd8 ffff88002d907fd8 ffff8801169150c0
Sep 11 00:44:13 frodo kernel: [224892.537038] ffff88002d907c74 0000000100000004 ffffffff8135049f ffff88010dca2210
Sep 11 00:44:13 frodo kernel: [224892.537042] Call Trace:
Sep 11 00:44:13 frodo kernel: [224892.537050] [<ffffffff8135049f>] ? _raw_spin_unlock_irqrestore+0xe/0xf
Sep 11 00:44:13 frodo kernel: [224892.537055] [<ffffffff8134fa44>] ? __mutex_lock_common.isra.5+0xff/0x164
Sep 11 00:44:13 frodo kernel: [224892.537059] [<ffffffff8134f932>] ? mutex_lock+0x1a/0x2d
Sep 11 00:44:13 frodo kernel: [224892.537063] [<ffffffff81103868>] ? walk_component+0x1f4/0x406
Sep 11 00:44:13 frodo kernel: [224892.537069] [<ffffffff810b6949>] ? generic_file_aio_read+0x570/0x5cf
Sep 11 00:44:13 frodo kernel: [224892.537072] [<ffffffff81104721>] ? path_lookupat+0x7c/0x2bd
Sep 11 00:44:13 frodo kernel: [224892.537076] [<ffffffff81036628>] ? should_resched+0x5/0x23
Sep 11 00:44:13 frodo kernel: [224892.537079] [<ffffffff8134f144>] ? _cond_resched+0x7/0x1c
Sep 11 00:44:13 frodo kernel: [224892.537083] [<ffffffff8110497e>] ? do_path_lookup+0x1c/0x87
Sep 11 00:44:13 frodo kernel: [224892.537087] [<ffffffff81106407>] ? user_path_at_empty+0x47/0x7b
Sep 11 00:44:13 frodo kernel: [224892.537091] [<ffffffff810fa820>] ? do_sync_read+0xb4/0xec
Sep 11 00:44:13 frodo kernel: [224892.537095] [<ffffffff810958f2>] ? force_quiescent_state+0x19/0x178
Sep 11 00:44:13 frodo kernel: [224892.537098] [<ffffffff810fe45a>] ? vfs_fstatat+0x32/0x60
Sep 11 00:44:13 frodo kernel: [224892.537101] [<ffffffff810fbc8a>] ? fput+0x17a/0x1a1
Sep 11 00:44:13 frodo kernel: [224892.537104] [<ffffffff810fe5bb>] ? sys_newlstat+0x12/0x2b
Sep 11 00:44:13 frodo kernel: [224892.537107] [<ffffffff81110b07>] ? mntput_no_expire+0x1e/0xc9
Sep 11 00:44:13 frodo kernel: [224892.537111] [<ffffffff810f9934>] ? filp_close+0x62/0x6a
Sep 11 00:44:13 frodo kernel: [224892.537114] [<ffffffff810f99ca>] ? sys_close+0x8e/0xcb
Sep 11 00:44:13 frodo kernel: [224892.537117] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b

--- kernel driver version
root@frodo:~# cat /sys/bus/pci/drivers/aacraid/module/version
1.1-7[28000]-ms

--- excerpt from hwinfo
str1: "ASUSTeK COMPUTER INC."
str2: "P8H61"
str2: "Intel"
str3: "Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz"

--- excerpt from lspci -nnvv
05:02.0 RAID bus controller [0104]: Adaptec AAC-RAID [9005:0285] (rev 01)
Subsystem: Adaptec AAR-2410SA PCI SATA 4ch (Jaguar II) [9005:0290]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32 (250ns min, 250ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
Expansion ROM at f7d30000 [disabled] [size=32K]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: aacraid



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/