lockdep and preemptoff tracer are fighting again.

From: Steven Rostedt
Date: Thu Jan 22 2009 - 15:40:35 EST




Hey guys, I can consistently hit this bug when running the preempt tracer:

------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2899 check_flags+0x154/0x18b()
Hardware name: Precision WorkStation 470
Modules linked in: radeon drm autofs4 hidp rfcomm l2cap bluetooth sunrpc nf_conn
track_netbios_ns ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_fi
lter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sb
s sbshc battery ac snd_intel8x0 snd_ac97_codec sg ac97_bus snd_seq_dummy snd_seq
_oss snd_seq_midi_event floppy snd_seq snd_seq_device snd_pcm_oss ide_cd_mod snd
_mixer_oss cdrom snd_pcm e1000 serio_raw snd_timer snd i2c_i801 button soundcore
ata_generic i2c_core iTCO_wdt snd_page_alloc e752x_edac iTCO_vendor_support shp
chp edac_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3855, comm: sshd Not tainted 2.6.29-rc2-tip #366
Call Trace:
[<ffffffff80245e9f>] warn_slowpath+0xd8/0xf7
[<ffffffff80297154>] ? ring_buffer_unlock_commit+0x24/0xa3
[<ffffffff80299501>] ? trace_function+0xad/0xbc
[<ffffffff8025c1ff>] ? remove_wait_queue+0x4d/0x52
[<ffffffff8029e5dc>] ? trace_preempt_on+0x113/0x130
[<ffffffff8029e4ba>] ? check_critical_timing+0x12e/0x13d
[<ffffffff8025c1ff>] ? remove_wait_queue+0x4d/0x52
[<ffffffff8029f75b>] ? stack_trace_call+0x249/0x25d
[<ffffffff802da06e>] ? fput+0x4/0x1c
[<ffffffff802e7edc>] ? free_poll_entry+0x26/0x2a
[<ffffffff802da06e>] ? fput+0x4/0x1c
[<ffffffff8020c2d6>] ? ftrace_call+0x5/0x2b
[<ffffffff8029f75b>] ? stack_trace_call+0x249/0x25d
[<ffffffff80543dec>] ? _spin_lock_irqsave+0xb/0x59
[<ffffffff802699bf>] check_flags+0x154/0x18b
[<ffffffff8026de66>] lock_acquire+0x41/0xa9
[<ffffffff80543dfd>] ? _spin_lock_irqsave+0x1c/0x59
[<ffffffff80543e27>] _spin_lock_irqsave+0x46/0x59
[<ffffffff8029519c>] ? ring_buffer_reset_cpu+0x31/0x6b
[<ffffffff8029519c>] ring_buffer_reset_cpu+0x31/0x6b
[<ffffffff80299ec6>] tracing_reset+0x46/0x9b
[<ffffffff8029e33f>] trace_preempt_off+0x100/0x14d
[<ffffffff8024b491>] ? local_bh_disable+0x12/0x14
[<ffffffff8024b44f>] ? __local_bh_disable+0xc0/0xf0
[<ffffffff8024b491>] ? local_bh_disable+0x12/0x14
[<ffffffff80543b95>] ? _spin_lock_bh+0x16/0x4c
[<ffffffff80546df1>] add_preempt_count+0x12d/0x132
[<ffffffff8024b44f>] __local_bh_disable+0xc0/0xf0
[<ffffffff8024b491>] local_bh_disable+0x12/0x14
[<ffffffff80543b95>] _spin_lock_bh+0x16/0x4c
[<ffffffff804ab49a>] lock_sock_nested+0x28/0xe5
[<ffffffff80292c90>] ? ftrace_list_func+0x24/0x39
[<ffffffff8020c2d6>] ? ftrace_call+0x5/0x2b
[<ffffffff804eff87>] tcp_sendmsg+0x27/0xac2
[<ffffffff803556c7>] ? cap_socket_sendmsg+0x4/0xd
[<ffffffff80292c90>] ? ftrace_list_func+0x24/0x39
[<ffffffff8020c2d6>] ? ftrace_call+0x5/0x2b
[<ffffffff804a82b0>] sock_aio_write+0x109/0x11d
[<ffffffff8029f75b>] ? stack_trace_call+0x249/0x25d
[<ffffffff8020c2d6>] ? ftrace_call+0x5/0x2b
[<ffffffff802d8881>] do_sync_write+0xf0/0x137
[<ffffffff8025c002>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff8020c2d6>] ? ftrace_call+0x5/0x2b
[<ffffffff803553ca>] ? cap_file_permission+0x9/0xd
[<ffffffff80353c88>] ? security_file_permission+0x16/0x18
[<ffffffff802d921c>] vfs_write+0x103/0x17d
[<ffffffff802d978f>] sys_write+0x4e/0x8c
[<ffffffff8020c64b>] system_call_fastpath+0x16/0x1b
---[ end trace 713cc9df66b54d6e ]---


The cause is simple. The following happens:

local_bh_disable is called, which calls __local_bh_disable which does a
add_preempt_count(SOFTIRQ_OFFSET).

Thus, add_preempt_count adds the SOFTIRQ_OFFSET to the preempt_count of
current, and then calls trace_preempt_off.

This goes into the preempt tracer which calls start_critical_timing, and
this will reset the ring buffer for the CPU, because this is the start of
the trace.

ring_buffer_reset_cpu() calls spin_lock_irqsave() which eventually calls
spin_acquire which is lock_acquire in lockdep.

lock_acquire calls check_flags which performs this check:

if (!hardirq_count()) {
if (softirq_count())
DEBUG_LOCKS_WARN_ON(current->softirqs_enabled);
else
DEBUG_LOCKS_WARN_ON(!current->softirqs_enabled);
}

With this:

#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)


The hardirq_count returns false, but the softirq_count returns true and
softirqs_enalbed is also true. The problem lies in local_bh_disable:

static void __local_bh_disable(unsigned long ip)
{
unsigned long flags;

WARN_ON_ONCE(in_irq());

raw_local_irq_save(flags);
add_preempt_count(SOFTIRQ_OFFSET); <-- here softirq_count is true
/*
* Were softirqs turned off above:
*/
if (softirq_count() == SOFTIRQ_OFFSET)
trace_softirqs_off(ip); <-- here softirqs_enabled is false
raw_local_irq_restore(flags);
}

If we call into lockdep between softirq_count == true and
softirqs_enabled == false, we hit the WARN_ON.

The trace_softirqs_off() sets softirs_enabled to false. But because the
tracer calls into lockdep between the two, we hit this warning.

If we try to swap the trace_softirqs_off with the add_preempt_count we hit
another warning thatch checks to make sure softirq_count is true in the
trace_softirqs_off code.

We need a way to have lockdep and the preempt tracer to be able to talk to
each other and let it know that it should not fail here.

Any ideas?

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/