Re: skd: disable discard support

From: Mike Snitzer
Date: Wed Feb 12 2014 - 17:22:54 EST


On Wed, Feb 12 2014 at 5:19pm -0500,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

> On Wed, Feb 12 2014 at 5:18pm -0500,
> Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>
> > The skd driver has never handled discards reliably.
> >
> > The kernel will BUG as a result of issuing discards to the skd device.
> > Disable the skd driver's discard support until it is proven reliable.
>
> Here is the first BUG I recently saw:

And here is a second one:

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 10
CPU: 10 PID: 0 Comm: swapper/10 Tainted: G W O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
0000000000000000 ffff88033fd47bb8 ffffffff8153f180 000000000000fffa
ffffffff817d8778 ffff88033fd47c38 ffffffff8153ef0d 0000000000000010
ffff88033fd47c48 ffff88033fd47be8 0000000000000000 0000000000000000
Call Trace:
<NMI> [<ffffffff8153f180>] dump_stack+0x49/0x61
[<ffffffff8153ef0d>] panic+0xbb/0x1d5
[<ffffffff810e8761>] watchdog_overflow_callback+0xb1/0xc0
[<ffffffff8111e9b8>] __perf_event_overflow+0x98/0x220
[<ffffffff8111f2a4>] perf_event_overflow+0x14/0x20
[<ffffffff8102012e>] intel_pmu_handle_irq+0x1de/0x3c0
[<ffffffff8115f931>] ? unmap_kernel_range_noflush+0x11/0x20
[<ffffffff8131a5c5>] ? ghes_copy_tofrom_phys+0xe5/0x200
[<ffffffff81544e84>] perf_event_nmi_handler+0x34/0x60
[<ffffffff8154464a>] nmi_handle+0x8a/0x170
[<ffffffff81544848>] default_do_nmi+0x68/0x210
[<ffffffff81544a80>] do_nmi+0x90/0xe0
[<ffffffff81543ca7>] end_repeat_nmi+0x1e/0x2e
[<ffffffffa06ef7a0>] ? skd_timer_tick_not_online+0x330/0x330 [skd]
[<ffffffff815432a1>] ? _raw_spin_lock_irqsave+0x21/0x30
[<ffffffff815432a1>] ? _raw_spin_lock_irqsave+0x21/0x30
[<ffffffff815432a1>] ? _raw_spin_lock_irqsave+0x21/0x30
<<EOE>> <IRQ> [<ffffffffa06ef7d9>] skd_timer_tick+0x39/0x1e0 [skd]
[<ffffffff81069480>] ? __queue_work+0x360/0x360
[<ffffffffa06ef7a0>] ? skd_timer_tick_not_online+0x330/0x330 [skd]
[<ffffffff8105a318>] call_timer_fn+0x48/0x120
[<ffffffff8105aef5>] run_timer_softirq+0x225/0x290
[<ffffffffa06ef7a0>] ? skd_timer_tick_not_online+0x330/0x330 [skd]
[<ffffffff8105365c>] __do_softirq+0xfc/0x2b0
[<ffffffff810bc09f>] ? tick_do_update_jiffies64+0x9f/0xd0
[<ffffffff8105390d>] irq_exit+0xbd/0xd0
[<ffffffff8154dbea>] smp_apic_timer_interrupt+0x4a/0x5a
[<ffffffff8154c8ca>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff8144d710>] ? cpuidle_enter_state+0xa0/0xd0
[<ffffffff8144d6cb>] ? cpuidle_enter_state+0x5b/0xd0
[<ffffffff8144d887>] cpuidle_idle_call+0xc7/0x160
[<ffffffff8100cf5e>] arch_cpu_idle+0xe/0x30
[<ffffffff810a696a>] cpu_idle_loop+0x9a/0x240
[<ffffffff810b9e64>] ? clockevents_register_device+0xc4/0x130
[<ffffffff810a6b33>] cpu_startup_entry+0x23/0x30
[<ffffffff81032d5a>] start_secondary+0x7a/0x80
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
------------[ cut here ]------------
WARNING: CPU: 10 PID: 0 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5f/0x70()
Modules linked in: skd(O) dm_thin_pool(O) dm_bio_prison(O) dm_persistent_data(O) dm_bufio(O) libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe 8021q libfc garp stp scsi_transport_fc llc scsi_tgt sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses enclosure sg acpi_cpufreq ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix sd_mod crc_t10dif crct10dif_common dm_mirror dm_region_hash dm_log dm_mod megaraid_sas [last unloaded: skd]
CPU: 10 PID: 0 Comm: swapper/10 Tainted: G W O 3.14.0-rc1.snitm+ #4
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
000000000000007c ffff88033fd478c0 ffffffff8153f180 000000000000007c
0000000000000000 ffff88033fd47900 ffffffff8104e9bc ffff88033fd52c40
ffff88033fc52c40 0000000000000002 ffff88033fd52c40 ffff8803329be250
Call Trace:
<NMI> [<ffffffff8153f180>] dump_stack+0x49/0x61
[<ffffffff8104e9bc>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8104ea0a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8103141f>] native_smp_send_reschedule+0x5f/0x70
[<ffffffff81087e3e>] trigger_load_balance+0x15e/0x200
[<ffffffff8107ccf7>] scheduler_tick+0xa7/0xe0
[<ffffffff8105a031>] update_process_times+0x61/0x80
[<ffffffff8131863c>] ? apei_exec_write_register_value+0x1c/0x20
[<ffffffff810bbfb9>] tick_sched_handle+0x39/0x80
[<ffffffff810bc1e4>] tick_sched_timer+0x54/0x90
[<ffffffff810743be>] __run_hrtimer+0x7e/0x1c0
[<ffffffff810bc190>] ? tick_nohz_handler+0xc0/0xc0
[<ffffffff810747ae>] hrtimer_interrupt+0x10e/0x260
[<ffffffff8103489b>] local_apic_timer_interrupt+0x3b/0x60
[<ffffffff8154dbe5>] smp_apic_timer_interrupt+0x45/0x5a
[<ffffffff8154c8ca>] apic_timer_interrupt+0x6a/0x70
[<ffffffff8153efe4>] ? panic+0x192/0x1d5
[<ffffffff8153ef42>] ? panic+0xf0/0x1d5
[<ffffffff810e8761>] watchdog_overflow_callback+0xb1/0xc0
[<ffffffff8111e9b8>] __perf_event_overflow+0x98/0x220
[<ffffffff8111f2a4>] perf_event_overflow+0x14/0x20
[<ffffffff8102012e>] intel_pmu_handle_irq+0x1de/0x3c0
[<ffffffff8115f931>] ? unmap_kernel_range_noflush+0x11/0x20
[<ffffffff8131a5c5>] ? ghes_copy_tofrom_phys+0xe5/0x200
[<ffffffff81544e84>] perf_event_nmi_handler+0x34/0x60
[<ffffffff8154464a>] nmi_handle+0x8a/0x170
[<ffffffff81544848>] default_do_nmi+0x68/0x210
[<ffffffff81544a80>] do_nmi+0x90/0xe0
[<ffffffff81543ca7>] end_repeat_nmi+0x1e/0x2e
[<ffffffffa06ef7a0>] ? skd_timer_tick_not_online+0x330/0x330 [skd]
[<ffffffff815432a1>] ? _raw_spin_lock_irqsave+0x21/0x30
[<ffffffff815432a1>] ? _raw_spin_lock_irqsave+0x21/0x30
[<ffffffff815432a1>] ? _raw_spin_lock_irqsave+0x21/0x30
<<EOE>> <IRQ> [<ffffffffa06ef7d9>] skd_timer_tick+0x39/0x1e0 [skd]
[<ffffffff81069480>] ? __queue_work+0x360/0x360
[<ffffffffa06ef7a0>] ? skd_timer_tick_not_online+0x330/0x330 [skd]
[<ffffffff8105a318>] call_timer_fn+0x48/0x120
[<ffffffff8105aef5>] run_timer_softirq+0x225/0x290
[<ffffffffa06ef7a0>] ? skd_timer_tick_not_online+0x330/0x330 [skd]
[<ffffffff8105365c>] __do_softirq+0xfc/0x2b0
[<ffffffff810bc09f>] ? tick_do_update_jiffies64+0x9f/0xd0
[<ffffffff8105390d>] irq_exit+0xbd/0xd0
[<ffffffff8154dbea>] smp_apic_timer_interrupt+0x4a/0x5a
[<ffffffff8154c8ca>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff8144d710>] ? cpuidle_enter_state+0xa0/0xd0
[<ffffffff8144d6cb>] ? cpuidle_enter_state+0x5b/0xd0
[<ffffffff8144d887>] cpuidle_idle_call+0xc7/0x160
[<ffffffff8100cf5e>] arch_cpu_idle+0xe/0x30
[<ffffffff810a696a>] cpu_idle_loop+0x9a/0x240
[<ffffffff810b9e64>] ? clockevents_register_device+0xc4/0x130
[<ffffffff810a6b33>] cpu_startup_entry+0x23/0x30
[<ffffffff81032d5a>] start_secondary+0x7a/0x80
---[ end trace 72a22a0dddd989d3 ]---