Re: [git pull] vfs pile 1 (splice)

From: Linus Torvalds
Date: Sun Oct 09 2016 - 02:06:06 EST


On Fri, Oct 7, 2016 at 3:20 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> splice stuff.

Hmm. I've now gotten two oopses today, both at __kmalloc+0xc3/0x1f0,
which seems to be the

*(void **)(object + s->offset);

in get_freepointer(). Because it started happening today, I'm inclined
to blame mainly stuff I merged late yesterday.
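
For readers who don't have the allocator in their head: a free object in
SLUB holds the pointer to the next free object inside itself, at s->offset,
and that quoted expression is the read of that link. Below is a minimal
user-space sketch of the idea, not the kernel code. Everything prefixed
toy_ is a made-up name for illustration, and the bogus value is simply the
RAX from the second oops reused as an example of a clobbered link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct kmem_cache: only what this sketch needs. */
struct toy_cache {
    size_t size;     /* object size in bytes */
    size_t offset;   /* where the free pointer lives inside a free object */
    void *freelist;  /* first free object, or NULL when exhausted */
};

/* Same shape as the quoted expression: read the link stored in a free object. */
static void *toy_get_freepointer(const struct toy_cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}

static void toy_set_freepointer(const struct toy_cache *s, void *object, void *next)
{
    *(void **)((char *)object + s->offset) = next;
}

/*
 * Carve 'count' objects of 'size' bytes out of 'slab' and chain them.
 * Assumes 'slab' is pointer-aligned and 'size' is a multiple of sizeof(void *).
 */
static void toy_cache_init(struct toy_cache *s, void *slab, size_t size, size_t count)
{
    assert(size >= sizeof(void *) && size % sizeof(void *) == 0 && count > 0);
    s->size = size;
    s->offset = 0;
    s->freelist = slab;
    for (size_t i = 0; i < count; i++) {
        char *obj = (char *)slab + i * size;
        toy_set_freepointer(s, obj, i + 1 < count ? (void *)(obj + size) : NULL);
    }
}

/*
 * Pop the head of the freelist. If the head is garbage, the read inside
 * toy_get_freepointer() is the access that faults.
 */
static void *toy_alloc(struct toy_cache *s)
{
    void *object = s->freelist;

    if (object)
        s->freelist = toy_get_freepointer(s, object);
    return object;
}
```

In this toy, a stray write into a free object replaces its link, the next
toy_alloc() still succeeds but installs the bad value as the list head, and
the allocation after that dereferences it. That is one way a crash can show
up in an allocation that has nothing to do with the code that did the
damage, though the traces alone don't prove that is what happened here.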

I'm pretty sure that 4.8.0-09134-g4c1fad64eff4 is all good, in
particular, while the problems definitely happen with
4.8.0-11288-gb66484cd7470.

Much of the stuff yesterday was non-x86 architectures (the ARM soc
stuff, avr32, parisc and power), so the main suspects are

- Andrew's series
- Al's splice stuff
- Ted's ext4 changes
- Jens' block layer changes

Yes, there are other things that came in between there, not just the
architecture things, but they seem much less likely to trigger for me.

The traces don't really give me any ideas. They look like this:

BUG: unable to handle kernel paging request at ffff9db749d0c000
IP: [<ffffffffb320cbe3>] __kmalloc+0xc3/0x1f0
PGD 426098067
PUD 426099067
PMD 344b1a067
PTE 0

Oops: 0000 [#1] SMP
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter
xt_conntrack ebtable_nat ebtable_broute bridge st
acpi_als pinctrl_sunrisepoint tpm_tis pinctrl_intel kfifo_buf
tpm_tis_core tpm industrialio acpi_pad nfsd auth_rpcgss nfs_acl lockd
grace sunrpc dm_crypt i915 i2c_algo_bit drm_kms_helper
crct10dif_pclmul crc32_pclm
CPU: 0 PID: 3091 Comm: collect2 Tainted: G O
4.8.0-11288-gb66484cd7470-dirty #4
Hardware name: System manufacturer System Product Name/Z170-K, BIOS
1803 05/06/2016
task: ffff8ee43dbad940 task.stack: ffff9db749ee4000
RIP: 0010:[<ffffffffb320cbe3>] [<ffffffffb320cbe3>] __kmalloc+0xc3/0x1f0
RSP: 0018:ffff9db749ee7b80 EFLAGS: 00010246
RAX: ffff9db749d0c000 RBX: 00000000024000c0 RCX: 0000000000000000
RDX: 00000000000034f7 RSI: 0000000000000000 RDI: 000000000001b620
RBP: ffff9db749ee7bb0 R08: ffff8ee4b6c1b620 R09: ffff8ee475810b3f
R10: ffff9db749d0c000 R11: ffff8ee488a16240 R12: 00000000024000c0
R13: 0000000000000044 R14: ffff8ee4a60037c0 R15: ffff8ee4a60037c0
FS: 00007f3f8b10f740(0000) GS:ffff8ee4b6c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9db749d0c000 CR3: 00000003a188a000 CR4: 00000000003406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffffffffb3258f68 ffff9db749ee7c40 0000000000000024 ffff8ee3f5810b40
7974697275636573 ffffffffb3c99760 ffff9db749ee7bd8 ffffffffb3258f68
ffff9db749ee7c40 ffff8ee4a1cb7378 ffff8ee4a1cb7360 ffff9db749ee7c10
Call Trace:
[<ffffffffb3258f68>] ? simple_xattr_alloc+0x28/0x60
[<ffffffffb3258f68>] simple_xattr_alloc+0x28/0x60
[<ffffffffb31bec60>] shmem_initxattrs+0x90/0xd0
[<ffffffffb333e60a>] security_inode_init_security+0x11a/0x160
[<ffffffffb31bebd0>] ? shmem_fh_to_dentry+0x60/0x60
[<ffffffffb31c00e2>] shmem_mknod+0x62/0xd0
[<ffffffffb31c0418>] shmem_create+0x18/0x20
[<ffffffffb324110a>] path_openat+0x128a/0x13c0
[<ffffffffb3242541>] do_filp_open+0x91/0x100
[<ffffffffb325051f>] ? __alloc_fd+0x3f/0x170
[<ffffffffb322fe10>] do_sys_open+0x130/0x220
[<ffffffffb322ff1e>] SyS_open+0x1e/0x20
[<ffffffffb379df20>] entry_SYSCALL_64_fastpath+0x13/0x94
Code: 49 83 78 10 00 4d 8b 10 0f 84 ce 00 00 00 4d 85 d2 0f 84 c5 00
00 00 49 63 47 20 49 8b 3f 4c 01 d0 40 f6 c7 0f 0f 85 1a 01 00 00 <48>
8b 18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74
RIP [<ffffffffb320cbe3>] __kmalloc+0xc3/0x1f0

and

general protection fault: 0000 [#1] SMP
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge st
acpi_als pinctrl_sunrisepoint kfifo_buf pinctrl_intel industrialio
tpm_tis tpm_tis_core tpm acpi_pad nfsd auth_rpcgss nfs_acl lockd grace
sunrpc dm_crypt i915 crct10dif_pclmul crc32_pclmul crc32c_intel
i2c_algo_bit
CPU: 5 PID: 3649 Comm: make Not tainted 4.8.0-11290-g13510890a847-dirty #3
Hardware name: System manufacturer System Product Name/Z170-K, BIOS
1803 05/06/2016
task: ffff8e3738188000 task.stack: ffffabe649e88000
RIP: 0010:[<ffffffff8720cd63>] [<ffffffff8720cd63>] __kmalloc+0xc3/0x1f0
RSP: 0018:ffffabe649e8bc38 EFLAGS: 00010246
RAX: 1e7acd36f90e784c RBX: 00000000024080c0 RCX: ffff8e36e78631f4
RDX: 000000000000284a RSI: 0000000000000000 RDI: 000000000001b620
RBP: ffffabe649e8bc68 R08: ffff8e3776d5b620 R09: 0000000084200088
R10: 1e7acd36f90e784c R11: 0000000069636574 R12: 00000000024080c0
R13: 000000000000004b R14: ffff8e37660037c0 R15: ffff8e37660037c0
FS: 00007f020d92a740(0000) GS:ffff8e3776d40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffed15ee080 CR3: 00000003f815a000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffffffff872bbb8e ffff8e36faebc818 ffff8e3717123100 00000000b0b3acc0
000000000ce85fb7 ffff8e36e78631f4 ffffabe649e8bca8 ffffffff872bbb8e
ffffabe649e8bcf0 ffff8e36faebc818 ffffabe649e8bd80 ffff8e36e78631fc
Call Trace:
[<ffffffff872bbb8e>] ? ext4_htree_store_dirent+0x3e/0x120
[<ffffffff872bbb8e>] ext4_htree_store_dirent+0x3e/0x120
[<ffffffff872cd427>] htree_dirblock_to_tree+0xc7/0x1c0
[<ffffffff872ce572>] ext4_htree_fill_tree+0xb2/0x320
[<ffffffff871e0da1>] ? special_mapping_fault+0x31/0xa0
[<ffffffff872bb900>] ext4_readdir+0x660/0x890
[<ffffffff8734620d>] ? __inode_security_revalidate+0x4d/0x70
[<ffffffff87245f22>] iterate_dir+0x172/0x1a0
[<ffffffff87246398>] SyS_getdents+0x98/0x120
[<ffffffff87246120>] ? fillonedir+0xc0/0xc0
[<ffffffff8779e0a0>] entry_SYSCALL_64_fastpath+0x13/0x94
Code: 49 83 78 10 00 4d 8b 10 0f 84 ce 00 00 00 4d 85 d2 0f 84 c5 00
00 00 49 63 47 20 49 8b 3f 4c 01 d0 40 f6 c7 0f 0f 85 1a 01 00 00 <48>
8b 18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74
RIP [<ffffffff8720cd63>] __kmalloc+0xc3/0x1f0
RSP <ffffabe649e8bc38>
---[ end trace 843edceadb3bd424 ]---

so in both cases it was filesystem stuff, but I'm not sure how much of
a pattern that is.

The trapping instruction is just a

mov (%rax),%rbx

and as you can see rax is garbage.
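
The bytes marked <48> 8b 18 in the Code: lines can be checked against that
reading by hand. The sketch below decodes just this one instruction form
and tests whether an address is canonical. The helper names are invented
for illustration, and the canonical check assumes 48-bit virtual addresses
(4-level paging, which matches the PGD/PUD/PMD/PTE dump in the first trace).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte. Low-register numbering: 0=rax 1=rcx 2=rdx 3=rbx. */
struct modrm {
    unsigned mod;  /* addressing mode, bits 7..6 */
    unsigned reg;  /* register operand, bits 5..3 */
    unsigned rm;   /* register/memory operand, bits 2..0 */
};

static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { byte >> 6, (byte >> 3) & 7u, byte & 7u };
    return m;
}

/*
 * True if 'bytes' is REX.W (0x48) + opcode 0x8b + a ModRM byte describing a
 * plain [reg] memory source, i.e. "mov (%src),%dst" with no displacement.
 * Only covers the low eight registers; anything else returns false.
 */
static bool is_mov_load64(const uint8_t bytes[3], struct modrm *out)
{
    assert(bytes && out);
    if (bytes[0] != 0x48 || bytes[1] != 0x8b)
        return false;
    *out = decode_modrm(bytes[2]);
    /* With mod 00, rm 100 means a SIB byte follows and rm 101 means RIP-relative. */
    return out->mod == 0 && out->rm != 4 && out->rm != 5;
}

/* Canonical under 48-bit addressing: bits 63..47 are all zeros or all ones. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits that must agree */

    return top == 0 || top == 0x1ffff;
}
```

For 48 8b 18 this gives mod 0, reg 3 (rbx), rm 0 (rax), which is the mov
shown above. The two RAX values differ in kind: ffff9db749d0c000 is
canonical but had no mapping (PTE 0), which is consistent with the page
fault in the first trace, while 1e7acd36f90e784c is not canonical at all,
which is consistent with the general protection fault in the second.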

I guess I'll need to just run with slab debugging on, but I wanted to
bring this to people's attention in case it rings a bell for somebody.
I haven't been merging anything today, partly because of this.

The problem *may* go back further, but I did run 4c1fad64eff4 for a
while without any sign of this.

Linus