Re: HDD problem, software bug, bios bug, or hardware ?

From: Adko Branil
Date: Fri Sep 07 2012 - 07:32:58 EST


After updating the BIOS, no more crashes have happened. I tested it many times under heavy HDD I/O load with many kernels (including CONFIG_PREEMPT kernels).

But now, if I enable the "Cool'n'Quiet" option in the BIOS, a CONFIG_PREEMPT_VOLUNTARY kernel booted with "nosmp" crashes during the boot process with a kernel panic, while a CONFIG_PREEMPT kernel without "nosmp" works fine. I think that is another story, though: it should not be related to the crashes I had with the old BIOS, and "nosmp" is probably the reason. (I have never changed the CPU frequency of this CPU at all.)

When "Cool'n'Quiet" is disabled, the system works fine with every kernel I tried, except that this warning message still appears in dmesg (if it is a problem at all). I am posting the message here, as seen in the "nosmp" case as well; the kernel is 3.5.2:

[    1.912494] =================================
[    1.912494] [ INFO: inconsistent lock state ]
[    1.912494] 3.5.2 #4 Not tainted
[    1.912494] ---------------------------------
[    1.912494] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    1.912494] swapper/0/1 [HC1[1]:SC1[1]:HE0:SE0] takes:
[    1.912494]  (&(&host->lock)->rlock){?.+...}, at: [<ffffffff818f4e47>] ata_bmdma_interrupt+0x27/0x1d0
[    1.912494] {HARDIRQ-ON-W} state was registered at:
[    1.912494]   [<ffffffff810998fb>] __lock_acquire+0x61b/0x1af0
[    1.912494]   [<ffffffff8109b31a>] lock_acquire+0x8a/0x110
[    1.912494]   [<ffffffff81b4d051>] _raw_spin_lock+0x31/0x40
[    1.912494]   [<ffffffff8190b3c5>] pdc_sata_hardreset+0x85/0x100
[    1.912494]   [<ffffffff818eabba>] ata_do_reset+0x3a/0x90
[    1.912494]   [<ffffffff818edd72>] ata_eh_reset+0x372/0xe00
[    1.912494]   [<ffffffff818eec25>] ata_eh_recover+0x2a5/0x13d0
[    1.912494]   [<ffffffff818f073d>] ata_do_eh+0x4d/0xb0
[    1.912494]   [<ffffffff818f33ba>] ata_sff_error_handler+0xca/0x120
[    1.912494]   [<ffffffff8190a9e4>] pdc_error_handler+0x24/0x30
[    1.912494]   [<ffffffff818f029c>] ata_scsi_port_error_handler+0x47c/0x800
[    1.912494]   [<ffffffff818f06be>] ata_scsi_error+0x9e/0xd0
[    1.912494]   [<ffffffff816732e8>] scsi_error_handler+0xf8/0x500
[    1.912494]   [<ffffffff810654fe>] kthread+0xae/0xc0
[    1.912494]   [<ffffffff81b4f5f4>] kernel_thread_helper+0x4/0x10
[    1.912494] irq event stamp: 661637
[    1.912494] hardirqs last  enabled at (661636): [<ffffffff81049ff1>] __do_softirq+0x71/0x1f0
[    1.912494] hardirqs last disabled at (661637): [<ffffffff81b4da67>] common_interrupt+0x67/0x6c
[    1.912494] softirqs last  enabled at (661610): [<ffffffff8104a0b4>] __do_softirq+0x134/0x1f0
[    1.912494] softirqs last disabled at (661635): [<ffffffff81b4f6ec>] call_softirq+0x1c/0x30
[    1.912494]
[    1.912494] other info that might help us debug this:
[    1.912494]  Possible unsafe locking scenario:
[    1.912494]
[    1.912494]        CPU0
[    1.912494]        ----
[    1.912494]   lock(&(&host->lock)->rlock);
[    1.912494]   <Interrupt>
[    1.912494]     lock(&(&host->lock)->rlock);
[    1.912494]
[    1.912494]  *** DEADLOCK ***
[    1.912494]
[    1.912494] 5 locks held by swapper/0/1:
[    1.912494]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81636b1b>] __driver_attach+0x5b/0xb0
[    1.912494]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81636b29>] __driver_attach+0x69/0xb0
[    1.912494]  #2:  (usb_bus_list_lock){+.+.+.}, at: [<ffffffff81954f95>] usb_add_hcd+0x295/0x6a0
[    1.912494]  #3:  (&__lockdep_no_validate__){......}, at: [<ffffffff816367ea>] device_attach+0x2a/0xc0
[    1.912494]  #4:  (&__lockdep_no_validate__){......}, at: [<ffffffff816367ea>] device_attach+0x2a/0xc0
[    1.912494]
[    1.912494] stack backtrace:
[    1.912494] Pid: 1, comm: swapper/0 Not tainted 3.5.2 #4
[    1.912494] Call Trace:
[    1.912494]  <IRQ>  [<ffffffff81b35961>] print_usage_bug+0x1f7/0x208
[    1.912494]  [<ffffffff8101001f>] ? save_stack_trace+0x2f/0x50
[    1.912494]  [<ffffffff81098730>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
[    1.912494]  [<ffffffff810992a2>] mark_lock+0x262/0x2a0
[    1.912494]  [<ffffffff810995b5>] ? __lock_acquire+0x2d5/0x1af0
[    1.912494]  [<ffffffff81099af3>] __lock_acquire+0x813/0x1af0
[    1.912494]  [<ffffffff810995b5>] ? __lock_acquire+0x2d5/0x1af0
[    1.912494]  [<ffffffff8109b31a>] lock_acquire+0x8a/0x110
[    1.912494]  [<ffffffff818f4e47>] ? ata_bmdma_interrupt+0x27/0x1d0
[    1.912494]  [<ffffffff81079258>] ? cpuacct_charge+0xa8/0xf0
[    1.912494]  [<ffffffff81b4d151>] _raw_spin_lock_irqsave+0x41/0x60
[    1.912494]  [<ffffffff818f4e47>] ? ata_bmdma_interrupt+0x27/0x1d0
[    1.912494]  [<ffffffff818f4e47>] ata_bmdma_interrupt+0x27/0x1d0
[    1.912494]  [<ffffffff810061f2>] ? mask_and_ack_8259A+0x32/0x110
[    1.912494]  [<ffffffff810c2d5d>] handle_irq_event_percpu+0x5d/0x1f0
[    1.912494]  [<ffffffff810c2f38>] handle_irq_event+0x48/0x70
[    1.912494]  [<ffffffff8100622e>] ? mask_and_ack_8259A+0x6e/0x110
[    1.912494]  [<ffffffff810c5a3e>] ? handle_level_irq+0x1e/0xc0
[    1.912494]  [<ffffffff810c5a91>] handle_level_irq+0x71/0xc0
[    1.912494]  [<ffffffff81003ce2>] handle_irq+0x22/0x40
[    1.912494]  [<ffffffff81b4fdaa>] do_IRQ+0x5a/0xe0
[    1.912494]  [<ffffffff81b4da6c>] common_interrupt+0x6c/0x6c
[    1.912494]  [<ffffffff810c2f43>] ? handle_irq_event+0x53/0x70
[    1.912494]  [<ffffffff81049ff9>] ? __do_softirq+0x79/0x1f0
[    1.912494]  [<ffffffff81b4f6ec>] call_softirq+0x1c/0x30
[    1.912494]  [<ffffffff81003d85>] do_softirq+0x85/0xc0
[    1.912494]  [<ffffffff8104a425>] irq_exit+0xb5/0xc0
[    1.912494]  [<ffffffff81b4fdb3>] do_IRQ+0x63/0xe0
[    1.912494]  [<ffffffff81b4da6c>] common_interrupt+0x6c/0x6c
[    1.912494]  <EOI>  [<ffffffff8104378b>] ? vprintk_emit+0x16b/0x4c0
[    1.912494]  [<ffffffff81b34815>] printk_emit+0x31/0x33
[    1.912494]  [<ffffffff81138f64>] ? kfree+0xd4/0x160
[    1.912494]  [<ffffffff81632c97>] __dev_printk+0x127/0x240
[    1.912494]  [<ffffffff81138f25>] ? kfree+0x95/0x160
[    1.912494]  [<ffffffff8195818f>] ? usb_control_msg+0xef/0x130
[    1.912494]  [<ffffffff8109bd25>] ? trace_hardirqs_on_caller+0x105/0x190
[    1.912494]  [<ffffffff8109bdbd>] ? trace_hardirqs_on+0xd/0x10
[    1.912494]  [<ffffffff81632e03>] _dev_info+0x53/0x60
[    1.912494]  [<ffffffff8195159a>] hub_probe+0x3ea/0x850
[    1.912494]  [<ffffffff81b4b3de>] ? mutex_unlock+0xe/0x10
[    1.912494]  [<ffffffff8195b494>] usb_probe_interface+0x184/0x230
[    1.912494]  [<ffffffff8163692e>] driver_probe_device+0x7e/0x210
[    1.912494]  [<ffffffff81636b70>] ? __driver_attach+0xb0/0xb0
[    1.912494]  [<ffffffff81636bbb>] __device_attach+0x4b/0x60
[    1.912494]  [<ffffffff81634a4e>] bus_for_each_drv+0x4e/0xa0
[    1.912494]  [<ffffffff81636867>] device_attach+0xa7/0xc0
[    1.912494]  [<ffffffff81635ce0>] bus_probe_device+0xb0/0xe0
[    1.912494]  [<ffffffff8163404d>] device_add+0x5cd/0x6a0
[    1.912494]  [<ffffffff8195977e>] usb_set_configuration+0x4be/0x710
[    1.912494]  [<ffffffff81962fb3>] generic_probe+0x43/0xa0
[    1.912494]  [<ffffffff8195b56f>] usb_probe_device+0x2f/0x60
[    1.912494]  [<ffffffff8163692e>] driver_probe_device+0x7e/0x210
[    1.912494]  [<ffffffff81636b70>] ? __driver_attach+0xb0/0xb0
[    1.912494]  [<ffffffff81636bbb>] __device_attach+0x4b/0x60
[    1.912494]  [<ffffffff81634a4e>] bus_for_each_drv+0x4e/0xa0
[    1.912494]  [<ffffffff81636867>] device_attach+0xa7/0xc0
[    1.912494]  [<ffffffff81635ce0>] bus_probe_device+0xb0/0xe0
[    1.912494]  [<ffffffff8163404d>] device_add+0x5cd/0x6a0
[    1.912494]  [<ffffffff81951bdc>] usb_new_device+0x1dc/0x2a0
[    1.912494]  [<ffffffff8195506b>] usb_add_hcd+0x36b/0x6a0
[    1.912494]  [<ffffffff819641a9>] usb_hcd_pci_probe+0x249/0x3c0
[    1.912494]  [<ffffffff815a629f>] pci_device_probe+0xaf/0x130
[    1.912494]  [<ffffffff8163692e>] driver_probe_device+0x7e/0x210
[    1.912494]  [<ffffffff81636b6b>] __driver_attach+0xab/0xb0
[    1.912494]  [<ffffffff81636ac0>] ? driver_probe_device+0x210/0x210
[    1.912494]  [<ffffffff81634af5>] bus_for_each_dev+0x55/0x90
[    1.912494]  [<ffffffff82162f4d>] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [<ffffffff8163629e>] driver_attach+0x1e/0x20
[    1.912494]  [<ffffffff81635ff8>] bus_add_driver+0x1a8/0x270
[    1.912494]  [<ffffffff82162f4d>] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [<ffffffff81637217>] driver_register+0x77/0x150
[    1.912494]  [<ffffffff82162f4d>] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [<ffffffff815a4fff>] __pci_register_driver+0x6f/0xe0
[    1.912494]  [<ffffffff82162f4d>] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [<ffffffff82162f4d>] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [<ffffffff82162fcd>] uhci_hcd_init+0x80/0xc3
[    1.912494]  [<ffffffff82162f4d>] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [<ffffffff810002a2>] do_one_initcall+0x122/0x170
[    1.912494]  [<ffffffff8212cd03>] kernel_init+0x139/0x1bd
[    1.912494]  [<ffffffff8212c5af>] ? do_early_param+0x8c/0x8c
[    1.912494]  [<ffffffff81b4f5f4>] kernel_thread_helper+0x4/0x10
[    1.912494]  [<ffffffff81b4db19>] ? retint_restore_args+0xe/0xe
[    1.912494]  [<ffffffff8212cbca>] ? start_kernel+0x3bc/0x3bc
[    1.912494]  [<ffffffff81b4f5f0>] ? gs_change+0xb/0xb
[    3.201635] ata4.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
[    3.209975] ata4.00: 390721968 sectors, multi 16: LBA48
[    3.218619] uhci_hcd 0000:00:10.1: UHCI Host Controller
[    3.227150] ata2: SATA link down (SStatus 0 SControl 0)
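
For anyone reading the report above: lockdep says host->lock was first taken with hardirqs enabled ({HARDIRQ-ON-W}) and later taken from hardirq context in ata_bmdma_interrupt ({IN-HARDIRQ-W}). To see which caller took the lock with interrupts enabled, the "state was registered at:" trace can be scanned for the first frame that is not part of the locking machinery. Below is a minimal sketch of that, not an established tool; the helper names (first_caller, registered_trace) and the LOCK_INTERNALS list are my own assumptions for illustration, and the sample is a few lines copied from the log above.

```python
import re

# Matches one symbolized frame such as
# "[    1.912494]   [<ffffffff8190b3c5>] pdc_sata_hardreset+0x85/0x100".
# Group 1 is the optional "? " marker for unreliable frames, group 2 the symbol.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Prefixes of frames that belong to the locking code itself.
# This list is an assumption for this report, not an exhaustive one.
LOCK_INTERNALS = ("__lock_acquire", "lock_acquire", "_raw_spin_lock")


def registered_trace(log):
    """Yield the frame lines listed under '... state was registered at:'."""
    in_trace = False
    for line in log.splitlines():
        if "state was registered at:" in line:
            in_trace = True
            continue
        if in_trace:
            if FRAME_RE.search(line) is None:
                break  # first non-frame line ends the trace
            yield line


def first_caller(lines):
    """Return the first reliable frame that is not a locking internal."""
    for line in lines:
        m = FRAME_RE.search(line)
        if m is None:
            continue
        unreliable, name = m.group(1), m.group(2)
        if unreliable or name.startswith(LOCK_INTERNALS):
            continue
        return name
    return None


sample = """\
[    1.912494] {HARDIRQ-ON-W} state was registered at:
[    1.912494]   [<ffffffff810998fb>] __lock_acquire+0x61b/0x1af0
[    1.912494]   [<ffffffff8109b31a>] lock_acquire+0x8a/0x110
[    1.912494]   [<ffffffff81b4d051>] _raw_spin_lock+0x31/0x40
[    1.912494]   [<ffffffff8190b3c5>] pdc_sata_hardreset+0x85/0x100
[    1.912494]   [<ffffffff818eabba>] ata_do_reset+0x3a/0x90
[    1.912494] irq event stamp: 661637
"""

print(first_caller(registered_trace(sample)))
```

On the sample this points at pdc_sata_hardreset, which reaches the lock through _raw_spin_lock, while the interrupt handler reaches it through _raw_spin_lock_irqsave. Whether that difference is the actual bug is for the libata maintainers to judge.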



Thanks.

Best regards
Adko.
