Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220

From: Eryu Guan
Date: Mon Oct 30 2017 - 03:44:40 EST


Hi Fengguang,

On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote:
> CC fsdevel.
>
> On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > Hi Linus,
> >
> > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> >
> > They hit the RC release mainly due to various imperfections in 0day's
> > auto bisection. So I manually list them here and CC the likely easy to
> > debug ones to the corresponding maintainers in the followup emails.
> >
> > boot_successes: 4700
> > boot_failures: 247
>
> [...]
>
> > WARNING:at_fs/direct-io.c:#dio_complete: 7
> > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3
> > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1
>
> The first warning happens on btrfs and is bisected to this commit.
> The other 2 warnings happen on xfs.

I noticed that the warnings are triggered by generic/095 and
generic/208, they're known to generate such warnings and I think these
warnings are kind of 'known issue', please see comments above
_filter_aiodio_dmesg() in fstests/common/filter.

Please make sure your local fstests contains the following 3 commits:

ca93e26865ab common: move _filter_xfs_dmesg() to common/filter
5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg()
228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs

we filtered out such warnings in fstests on purpose so the affected
tests won't fail because of such dmesg warnings.

Thanks,
Eryu

>
> commit 332391a9935da939319e473b4680e173df75afcf
> Author: Lukas Czerner <lczerner@xxxxxxxxxx>
> AuthorDate: Thu Sep 21 08:16:29 2017 -0600
> Commit: Jens Axboe <axboe@xxxxxxxxx>
> CommitDate: Mon Sep 25 08:56:05 2017 -0600
>
> fs: Fix page cache inconsistency when mixing buffered and AIO DIO
>
> Currently when mixing buffered reads and asynchronous direct writes it
> is possible to end up with the situation where we have stale data in the
> page cache while the new data is already written to disk. This is
> permanent until the affected pages are flushed away. Despite the fact
> that mixing buffered and direct IO is ill-advised it does pose a thread
> for a data integrity, is unexpected and should be fixed.
>
> Fix this by deferring completion of asynchronous direct writes to a
> process context in the case that there are mapped pages to be found in
> the inode. Later before the completion in dio_complete() invalidate
> the pages in question. This ensures that after the completion the pages
> in the written area are either unmapped, or populated with up-to-date
> data. Also do the same for the iomap case which uses
> iomap_dio_complete() instead.
>
> This has a side effect of deferring the completion to a process context
> for every AIO DIO that happens on inode that has pages mapped. However
> since the consensus is that this is ill-advised practice the performance
> implication should not be a problem.
>
> This was based on proposal from Jeff Moyer, thanks!
>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
> Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
> fs/direct-io.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> fs/iomap.c | 29 ++++++++++++++++-------------
> mm/filemap.c | 6 ++++++
> 3 files changed, 64 insertions(+), 18 deletions(-)
>
> The call traces are:
>
> [ 334.461955] BTRFS info (device vdb): has skinny extents
> [ 334.463231] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 334.469746] BTRFS info (device vdb): creating UUID tree
> [ 336.190840] ------------[ cut here ]------------
> [ 336.193338] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220
> [ 336.196925] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer snd soundcore serio_raw parport_pc pcspkr parport floppy ata_piix libata i2c_piix4 ip_tables
> [ 336.203746] CPU: 0 PID: 6379 Comm: kworker/0:0 Not tainted 4.14.0-rc6 #1
> [ 336.204799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 336.207480] Workqueue: dio/vdb dio_aio_complete_work
> [ 336.208373] task: ffff880079094a80 task.stack: ffffc90000950000
> [ 336.210495] RIP: 0010:dio_complete+0x1d4/0x220
> [ 336.211288] RSP: 0018:ffffc90000953e20 EFLAGS: 00010286
> [ 336.213145] RAX: 00000000fffffff0 RBX: ffff8800aae8c780 RCX: ffffc90000953c70
> [ 336.214232] RDX: 0000000080000000 RSI: 00000000000002b4 RDI: ffffffff81cb9ccb
> [ 336.216605] RBP: ffffc90000953e48 R08: 0000000000000000 R09: ffff880079c8b6c0
> [ 336.217726] R10: 0000000000000074 R11: 0000000000000000 R12: 0000000000002000
> [ 336.220141] R13: 0000000000002000 R14: 0000000000000003 R15: 0000000000074000
> [ 336.221232] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [ 336.223915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 336.224885] CR2: 000055e9eae6b750 CR3: 00000000752c7000 CR4: 00000000000006f0
> [ 336.227202] Call Trace:
> [ 336.227750] dio_aio_complete_work+0x19/0x20
> [ 336.228536] process_one_work+0x198/0x3e0
> [ 336.230396] worker_thread+0x4e/0x3e0
> [ 336.231064] kthread+0x114/0x150
> [ 336.232489] ? process_one_work+0x3e0/0x3e0
> [ 336.233250] ? kthread_create_on_node+0x40/0x40
> [ 336.234929] ? kthread_create_on_node+0x40/0x40
> [ 336.235754] ret_from_fork+0x25/0x30
> [ 336.237322] Code: 8b 78 30 48 83 7f 58 00 0f 84 e9 fe ff ff 4b 8d 54 27 ff 4c 89 fe 48 c1 fe 0c 48 c1 fa 0c e8 64 58 f3 ff 85 c0 0f 84 cc fe ff ff <0f> ff e9 c5 fe ff ff 8b 50 20 f6 c2 10 0f 84 e5 fe ff ff 48 8b
> [ 336.240031] ---[ end trace 3d29f0720cd16ad3 ]---
> [ 378.043462] generic/095 [failed, exit status 1] - output mismatch (see /lkp/benchmarks/xfstests/results//generic/095.out.bad)
> [ 378.043465]
>
>
> [ 121.160532] run fstests generic/117 at 2017-10-25 05:27:29
> [ 123.275347] XFS (vdd): Mounting V5 Filesystem
> [ 123.280579] XFS (vdd): Ending clean mount
> [ 163.043220] XFS (vdd): Unmounting Filesystem
> [ 163.258872] generic/117 42s
> [ 163.258875]
> [ 164.009187] run fstests generic/208 at 2017-10-25 05:28:12
> [ 164.344046] ------------[ cut here ]------------
> [ 164.345164] WARNING: CPU: 1 PID: 86 at fs/iomap.c:755 iomap_dio_complete+0xe5/0x100
> [ 164.347193] Modules linked in: xfs dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi ppdev snd_pc
> m snd_timer snd soundcore parport_pc pcspkr serio_raw floppy parport ata_piix i2c_piix4 libata ip_tables
> [ 164.351698] CPU: 1 PID: 86 Comm: kworker/1:2 Not tainted 4.14.0-rc6 #1
> [ 164.353111] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 164.361199] Workqueue: dio/vda iomap_dio_complete_work
> [ 164.362418] task: ffff88007f8b8000 task.stack: ffffc90000a4c000
> [ 164.363666] RIP: 0010:iomap_dio_complete+0xe5/0x100
> [ 164.364923] RSP: 0018:ffffc90000a4fe08 EFLAGS: 00010286
> [ 164.366065] RAX: 00000000fffffff0 RBX: ffff88007a5fba80 RCX: ffffc90000a4fc58
> [ 164.367281] RDX: 0000000080000000 RSI: 00000000000002b4 RDI: ffffffff81cb9ccb
> [ 164.368330] RBP: ffffc90000a4fe30 R08: 0000000000000000 R09: ffff8800750eeb58
> [ 164.369503] R10: 0000000000001ffe R11: 0000000000000000 R12: 0000000002000000
> [ 164.370573] R13: ffff8800b8d27780 R14: ffff8800b8869058 R15: 0000000000000000
> [ 164.371674] FS: 0000000000000000(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
> [ 164.373136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 164.387036] CR2: 00007f5ee028e000 CR3: 000000007a1ff000 CR4: 00000000000006e0
> [ 164.388119] Call Trace:
> [ 164.388581] iomap_dio_complete_work+0x25/0x80
> [ 164.389385] process_one_work+0x198/0x3e0
> [ 164.390054] worker_thread+0x4e/0x3e0
> [ 164.390699] kthread+0x114/0x150
> [ 164.391341] ? process_one_work+0x3e0/0x3e0
> [ 164.392038] ? kthread_create_on_node+0x40/0x40
> [ 164.392922] ret_from_fork+0x25/0x30
> [ 164.393506] Code: e0 eb 9f 48 8b 77 18 e9 78 ff ff ff 4c 89 fa 48 03 53 18 4c 89 fe 48 c1 fe 0c 48 83 ea 01 48 c1 fa 0c e8 0f 93 f0 ff 85 c
> 0 74 93 <0f> ff eb 8f 4c 63 67 28 e9 54 ff ff ff 0f 1f 40 00 66 2e 0f 1f
> [ 164.396264] ---[ end trace 77984c98cc2335de ]---
> [ 364.297438] generic/208 200s
> [ 364.297442]
> [ 365.772872] XFS (vda): Unmounting Filesystem
> [ 366.507546] XFS (vda): Mounting V5 Filesystem
> [ 366.519283] XFS (vda): Ending clean mount
>
>
> [ 370.264428] run fstests generic/095 at 2017-10-25 05:38:44
> [ 370.392041] XFS (vdd): Unmounting Filesystem
> [ 372.641826] XFS (vdd): Mounting V5 Filesystem
> [ 372.648446] XFS (vdd): Ending clean mount
> [ 373.277786] ------------[ cut here ]------------
> [ 373.279820] ------------[ cut here ]------------
> [ 373.279826] WARNING: CPU: 2 PID: 29166 at fs/iomap.c:1016 iomap_dio_rw+0x346/0x360
> [ 373.279827] Modules linked in: xfs dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer snd soundcore serio_raw pcspkr i2c_piix4 ata_piix parport_pc parport floppy libata ip_tables
> [ 373.279842] CPU: 2 PID: 29166 Comm: fio Not tainted 4.14.0-rc6 #1
> [ 373.279843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 373.279844] task: ffff88008bd34a80 task.stack: ffffc900065f8000
> [ 373.279846] RIP: 0010:iomap_dio_rw+0x346/0x360
> [ 373.279847] RSP: 0000:ffffc900065fbd40 EFLAGS: 00010286
> [ 373.279848] RAX: 00000000fffffff0 RBX: ffffc900065fbe98 RCX: ffffc900065fbb90
> [ 373.279848] RDX: 0000000080000000 RSI: 00000000000002b4 RDI: ffffffff81cb9ccb
> [ 373.279849] RBP: ffffc900065fbdd0 R08: 0000000000000000 R09: 00007f6fe0948000
> [ 373.279850] R10: ffff88013ffd2000 R11: 0000000000000400 R12: 000000000005f400
> [ 373.279850] R13: ffffc900065fbe70 R14: ffff88007a6fb7e0 R15: 0000000000000011
> [ 373.279852] FS: 00007f6fb3ce1700(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> [ 373.279852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 373.279853] CR2: 00007f6fe088ac00 CR3: 0000000078eb7000 CR4: 00000000000006e0
> [ 373.279856] Call Trace:
> [ 373.279913] ? xfs_file_aio_write_checks+0x137/0x190 [xfs]
> [ 373.279933] xfs_file_dio_aio_write+0xec/0x260 [xfs]
> [ 373.279952] ? xfs_file_dio_aio_write+0xec/0x260 [xfs]
> [ 373.279970] xfs_file_write_iter+0x98/0x150 [xfs]
> [ 373.279972] __vfs_write+0xd3/0x150
> [ 373.279974] vfs_write+0xc7/0x1b0
> [ 373.279975] SyS_pwrite64+0x76/0x90
> [ 373.279977] entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 373.279979] RIP: 0033:0x7f6fcc3019d3
> [ 373.279979] RSP: 002b:00007f6fb3cdeb20 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
> [ 373.279980] RAX: ffffffffffffffda RBX: 00007f6fb3ce1700 RCX: 00007f6fcc3019d3
> [ 373.279981] RDX: 0000000000000400 RSI: 000055e925caac00 RDI: 0000000000000004
> [ 373.279982] RBP: 00007f6fe095fee0 R08: 0000000000000000 R09: 00007f6fb3ce1700
> [ 373.279982] R10: 000000000005f400 R11: 0000000000000293 R12: 0000000000000000
> [ 373.279983] R13: 00007f6fe095fe2f R14: 0000000000000003 R15: 00007f6fe095fee0
> [ 373.279984] Code: 21 00 00 f6 58 2b 81 49 c7 44 24 08 00 00 00 00 4c 89
> f7 e8 ed ec ff ff e9 13 fe ff ff 48 8d 7d a0 e8 2f a9 13 00 e9 33 ff ff ff
> <0f> ff e9 6a fe ff ff 48 c7 c0 f4 ff ff ff e9 f2 fd ff ff 90 90 [
> 373.279998] ---[ end trace 01e002a8aa01ae25 ]---
> [ 373.334858] WARNING: CPU: 3 PID: 29158 at fs/iomap.c:755 iomap_dio_complete+0xe5/0x100
> [ 373.336817] Modules linked in: xfs dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi ppdev snd_pcm snd_timer snd soundcore serio_raw pcspkr i2c_piix4 ata_piix parport_pc parport floppy libata ip_tables
> [ 373.341916] CPU: 3 PID: 29158 Comm: fio Tainted: G W 4.14.0-rc6 #1
> [ 373.344151] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 373.346702] task: ffff88008bd32540 task.stack: ffffc900065c8000
> [ 373.348512] RIP: 0010:iomap_dio_complete+0xe5/0x100
> [ 373.349999] RSP: 0018:ffffc900065cbd08 EFLAGS: 00010286
> [ 373.351604] RAX: 00000000fffffff0 RBX: ffff88007a5611e0 RCX: ffffc900065cbb58
> [ 373.353690] RDX: 0000000080000000 RSI: 00000000000002b4 RDI: ffffffff81cb9ccb
> [ 373.355693] RBP: ffffc900065cbd30 R08: 0000000000000000 R09: 00007f6fe08cb000
> [ 373.357775] R10: ffffc900065cb8c8 R11: 0000000000000207 R12: 0000000000000400
> [ 373.359837] R13: ffffc900065cbe70 R14: ffff8800787657d8 R15: 0000000000002400
> [ 373.361878] FS: 00007f6fe095aa80(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
> [ 373.364333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 373.366071] CR2: 00007f6fcc38c1ee CR3: 000000007949a000 CR4: 00000000000006e0
> [ 373.368170] Call Trace:
> [ 373.369165] ? iomap_dio_rw+0x2a6/0x360
> [ 373.370467] iomap_dio_rw+0x333/0x360
> [ 373.371768] xfs_file_dio_aio_write+0xec/0x260 [xfs]
> [ 373.373069] ? xfs_file_dio_aio_write+0xec/0x260 [xfs]
> [ 373.374410] xfs_file_write_iter+0x98/0x150 [xfs]
> [ 373.375820] __vfs_write+0xd3/0x150
> [ 373.377096] vfs_write+0xc7/0x1b0
> [ 373.378436] SyS_pwrite64+0x76/0x90
> [ 373.379897] entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 373.381488] RIP: 0033:0x7f6fcc3019d3
> [ 373.382781] RSP: 002b:00007f6fe0957ea0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
> [ 373.385064] RAX: ffffffffffffffda RBX: 000055e925cb07f0 RCX: 00007f6fcc3019d3
> [ 373.386913] RDX: 0000000000000400 RSI: 000055e925ca9800 RDI: 0000000000000005
> [ 373.388911] RBP: 0000000000000005 R08: 0000000000000000 R09: 00007f6fe095aa80
> [ 373.390588] R10: 0000000000002400 R11: 0000000000000293 R12: 00007f6fe095aa80
> [ 373.392296] R13: 00007f6fe0957ec0 R14: 0000000000000003 R15: 000055e925caf360
> [ 373.394229] Code: e0 eb 9f 48 8b 77 18 e9 78 ff ff ff 4c 89 fa 48 03 53
> 18 4c 89 fe 48 c1 fe 0c 48 83 ea 01 48 c1 fa 0c e8 0f 93 f0 ff 85 c0 74 93
> <0f> ff eb 8f 4c 63 67 28 e9 54 ff ff ff 0f 1f 40 00 66 2e 0f 1f [
> 373.398812] ---[ end trace 01e002a8aa01ae26 ]---
> [ 392.887639] XFS (vdd): Unmounting Filesystem
> [ 392.985715] generic/095 [failed, exit status 1] - output mismatch (see /lkp/benchmarks/xfstests/results//generic/095.out.bad)
>
> Thanks,
> Fengguang