Re: irq_fpu_usable() is unreliable

From: Ingo Molnar
Date: Fri Nov 27 2015 - 03:47:10 EST



* Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:

> Intel 3820QM, but inside VMWare Workstation 12.
>
> > Third, could you post such a problematic stack trace?
>
> Sure: https://paste.kde.org/pfhhdchs9/7mmtvb

So it's:

[ 187.194226] CPU: 0 PID: 1165 Comm: iperf3 Tainted: G O 4.2.3-1-ARCH #1
[ 187.194229] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 187.194231] 0000000000000000 0000000062ca03ad ffff88003b82f0d0 ffffffff8156c0ca
[ 187.194233] ffff88003bfa0dc0 0000000000000090 ffff88003b82f260 ffffffffa03fc27e
[ 187.194234] 0000000000000010 ffff88003be05300 0000000000000000 ffff88003b82f3e0
[ 187.194235] Call Trace:
[ 187.194244] [<ffffffff8156c0ca>] dump_stack+0x4c/0x6e
[ 187.194248] [<ffffffffa03fc27e>] chacha20_avx+0x23e/0x250 [wireguard]
[ 187.194253] [<ffffffff8101de03>] ? nommu_map_page+0x43/0x80
[ 187.194257] [<ffffffffa0344161>] ? e1000_xmit_frame+0xdf1/0x11c0 [e1000]
[ 187.194259] [<ffffffffa03fbe6e>] ? poly1305_update_asm+0x11e/0x1b0 [wireguard]
[ 187.194260] [<ffffffffa03fcd0d>] chacha20_finish+0x3d/0x60 [wireguard]
[ 187.194262] [<ffffffffa03f8eae>] chacha20poly1305_encrypt_finish+0x2e/0xf0 [wireguard]
[ 187.194263] [<ffffffffa03efa32>] noise_message_encrypt+0x162/0x180 [wireguard]
[ 187.194269] [<ffffffff811b60e5>] ? __kmalloc_node_track_caller+0x35/0x2e0
[ 187.194274] [<ffffffff81460af7>] ? __alloc_skb+0x87/0x210
[ 187.194275] [<ffffffff81460a11>] ? __kmalloc_reserve.isra.5+0x31/0x90
[ 187.194276] [<ffffffff81460acb>] ? __alloc_skb+0x5b/0x210
[ 187.194278] [<ffffffff81460b0b>] ? __alloc_skb+0x9b/0x210
[ 187.194279] [<ffffffffa03f2a65>] noise_message_create_data+0x55/0x80 [wireguard]
[ 187.194280] [<ffffffffa03e9708>] packet_send_queue+0x1f8/0x4d0 [wireguard]
[ 187.194285] [<ffffffff810a8219>] ? dequeue_entity+0x149/0x690
[ 187.194287] [<ffffffff810a9051>] ? put_prev_entity+0x31/0x420
[ 187.194289] [<ffffffff810146ec>] ? __switch_to+0x25c/0x4a0
[ 187.194291] [<ffffffff81099ce2>] ? finish_task_switch+0x62/0x1b0
[ 187.194292] [<ffffffff8156d500>] ? __schedule+0x340/0xa00
[ 187.194296] [<ffffffff810ddf19>] ? hrtimer_try_to_cancel+0x29/0x120
[ 187.194298] [<ffffffff810b4464>] ? add_wait_queue+0x44/0x50
[ 187.194299] [<ffffffff811b60e5>] ? __kmalloc_node_track_caller+0x35/0x2e0
[ 187.194302] [<ffffffff811e33ce>] ? __pollwait+0x7e/0xe0
[ 187.194303] [<ffffffff81460af7>] ? __alloc_skb+0x87/0x210
[ 187.194304] [<ffffffff81460a11>] ? __kmalloc_reserve.isra.5+0x31/0x90
[ 187.194305] [<ffffffffa03e861f>] xmit+0x8f/0xe0 [wireguard]
[ 187.194308] [<ffffffff8147588f>] dev_hard_start_xmit+0x24f/0x3f0
[ 187.194309] [<ffffffff814753be>] ? validate_xmit_skb.isra.34.part.35+0x1e/0x2a0
[ 187.194310] [<ffffffff81476042>] __dev_queue_xmit+0x4d2/0x540
[ 187.194311] [<ffffffff814760c3>] dev_queue_xmit_sk+0x13/0x20
[ 187.194313] [<ffffffff8147d9c2>] neigh_direct_output+0x12/0x20
[ 187.194315] [<ffffffff814b1756>] ip_finish_output2+0x1b6/0x3c0
[ 187.194317] [<ffffffff814b309e>] ? __ip_append_data.isra.3+0x6ae/0xac0
[ 187.194317] [<ffffffff814b376c>] ip_finish_output+0x13c/0x1d0
[ 187.194318] [<ffffffff814b3b75>] ip_output+0x75/0xe0
[ 187.194319] [<ffffffff814b468d>] ? ip_make_skb+0x10d/0x130
[ 187.194320] [<ffffffff814b1381>] ip_local_out_sk+0x31/0x40
[ 187.194321] [<ffffffff814b44ea>] ip_send_skb+0x1a/0x50
[ 187.194323] [<ffffffff814dc221>] udp_send_skb+0x151/0x280
[ 187.194325] [<ffffffff814dd7f5>] udp_sendmsg+0x305/0x9d0
[ 187.194327] [<ffffffff8157115e>] ? _raw_spin_unlock_bh+0xe/0x10
[ 187.194328] [<ffffffff814e8daf>] inet_sendmsg+0x7f/0xb0
[ 187.194329] [<ffffffff81457227>] sock_sendmsg+0x17/0x30
[ 187.194330] [<ffffffff814572c5>] sock_write_iter+0x85/0xf0
[ 187.194332] [<ffffffff811d028c>] __vfs_write+0xcc/0x100
[ 187.194333] [<ffffffff811d0b04>] vfs_write+0xa4/0x1a0
[ 187.194334] [<ffffffff811d1815>] SyS_write+0x55/0xc0
[ 187.194335] [<ffffffff8157162e>] entry_SYSCALL_64_fastpath+0x12/0x71

So this does not seem to be a very complex stack trace: we are trying to use the
FPU from a regular process, on a regular system call path. No interrupts, no
kernel threads, no complications.
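
For reference, irq_fpu_usable() in kernels of this era (arch/x86/kernel/fpu/core.c)
is roughly:

	bool irq_fpu_usable(void)
	{
		return !in_interrupt() ||
			interrupted_user_mode() ||
			interrupted_kernel_fpu_idle();
	}

so in plain process context the !in_interrupt() clause alone should make it return
true on this call path.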

We possibly context switched recently:

[ 187.194285] [<ffffffff810a8219>] ? dequeue_entity+0x149/0x690
[ 187.194287] [<ffffffff810a9051>] ? put_prev_entity+0x31/0x420
[ 187.194289] [<ffffffff810146ec>] ? __switch_to+0x25c/0x4a0
[ 187.194291] [<ffffffff81099ce2>] ? finish_task_switch+0x62/0x1b0
[ 187.194292] [<ffffffff8156d500>] ? __schedule+0x340/0xa00

but that's all that I can see in the trace.
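
Assuming the module follows the usual guard pattern around SIMD code, it would look
something like the sketch below (the chacha20_avx()/chacha20_generic() names here
are just placeholders for whatever the module really calls, not the actual
wireguard code):

	#include <linux/types.h>
	#include <asm/fpu/api.h>	/* irq_fpu_usable(), kernel_fpu_begin/end() */

	static void chacha20_avx(u8 *dst, const u8 *src, size_t len);		/* SIMD path */
	static void chacha20_generic(u8 *dst, const u8 *src, size_t len);	/* scalar fallback */

	static void chacha20_block(u8 *dst, const u8 *src, size_t len)
	{
		if (irq_fpu_usable()) {
			/*
			 * kernel_fpu_begin() also disables preemption, so no
			 * context switch can land between here and kernel_fpu_end().
			 */
			kernel_fpu_begin();
			chacha20_avx(dst, src, len);
			kernel_fpu_end();
		} else {
			chacha20_generic(dst, src, len);
		}
	}

With that pattern the context switch visible in the trace would have happened
before kernel_fpu_begin(), and should be harmless: __switch_to() saves and
restores the task's FPU state as part of the switch.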

So as a first step I'd try Linus's very latest kernel, to make sure it's not a bug
that has been fixed in the meantime. If it still occurs, report it to the VMware
virtualization folks as well. Maybe it's some host kernel activity that changes the
state of the FPU. I don't know ...

Thanks,

Ingo