Re: workqueue threads ->journal_info buggery

From: Jan Kara
Date: Tue Sep 05 2017 - 07:26:54 EST


Hello,

On Tue 05-09-17 11:51:44, Nikolay Borisov wrote:
> I've hit the following problems under memory-heavy workload conditions:
>
> The first is a BUG_ON: J_ASSERT(journal_current_handle() == handle);
>
> [ 64.261793] kernel BUG at fs/jbd2/transaction.c:1644!
> [ 64.263894] invalid opcode: 0000 [#1] SMP
> [ 64.266187] Modules linked in:
> [ 64.267472] CPU: 1 PID: 542 Comm: kworker/u12:6 Not tainted 4.12.0-nbor #135
> [ 64.269941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [ 64.272374] Workqueue: writeback wb_workfn (flush-254:0)
> [ 64.273862] task: ffff88001c37b880 task.stack: ffff880018ac8000
> [ 64.275580] RIP: 0010:jbd2_journal_stop+0x375/0x4d0
> [ 64.276704] RSP: 0000:ffff880018acb990 EFLAGS: 00010286
> [ 64.278708] RAX: ffff88001c37b880 RBX: ffff88001e83c000 RCX: ffff88001c4f8800
> [ 64.280499] RDX: ffff88001e83c000 RSI: 0000000000000b26 RDI: ffff88001e83c000
> [ 64.282262] RBP: ffff880018acba10 R08: ffff880019ec5888 R09: 0000000000000000
> [ 64.284111] R10: 0000000000000000 R11: ffffffff81283f8f R12: ffff880018a1a140
> [ 64.285553] R13: ffff88001c4f8800 R14: ffff88001c47d000 R15: ffff880018aa01f0
> [ 64.286337] FS: 0000000000000000(0000) GS:ffff88001fc40000(0000) knlGS:0000000000000000
> [ 64.287671] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 64.288568] CR2: 0000000000421ac0 CR3: 000000001ae83000 CR4: 00000000000006a0
> [ 64.289468] Call Trace:
> [ 64.289748] ? __ext4_journal_get_write_access+0x67/0xc0
> [ 64.290330] ? ext4_writepages+0xec6/0x1200
> [ 64.290786] __ext4_journal_stop+0x3c/0xa0
> [ 64.291233] ext4_writepages+0x8b2/0x1200
> [ 64.291682] ? writeback_sb_inodes+0x11f/0x5c0
> [ 64.292174] do_writepages+0x1c/0x80
> [ 64.292572] ? do_writepages+0x1c/0x80
> [ 64.292985] __writeback_single_inode+0x61/0x760
> [ 64.293575] writeback_sb_inodes+0x28d/0x5c0
> [ 64.294192] __writeback_inodes_wb+0x92/0xc0
> [ 64.294777] wb_writeback+0x3e9/0x560
> [ 64.295241] wb_workfn+0x9a/0x5d0
> [ 64.295977] ? wb_workfn+0x9a/0x5d0
> [ 64.296788] ? process_one_work+0x15c/0x620
> [ 64.297971] process_one_work+0x1d9/0x620
> [ 64.298969] worker_thread+0x4e/0x3b0
> [ 64.299684] kthread+0x113/0x150
> [ 64.300287] ? process_one_work+0x620/0x620
> [ 64.301145] ? kthread_create_on_node+0x40/0x40
> [ 64.301953] ret_from_fork+0x2a/0x40
> [ 64.302572] Code: dd ff 41 8b 45 60 85 c0 0f 84 29 fe ff ff 49 8d bd 00 01 00 00 31 c9 ba 01 00 00 00 be 03 00 00 00 e8 90 c1 dd ff e9 0c fe ff ff <0f> 0b 44 89 fe 4c 89 ef e8 ce 83 00 00 89 45 c4 e9 18 fe ff ff
> [ 64.305997] RIP: jbd2_journal_stop+0x375/0x4d0 RSP: ffff880018acb990
> [ 64.307037] ---[ end trace ec3f7cbd6e733faf ]---
>
> I consulted with Jan; his opinion is that this happens because ->journal_info
> in workqueue threads gets modified while the work is running.

Sorry, this was a false alarm. Nikolay eventually also hit traces that were
not from workqueue code, and we tracked the problem down to his btrfs
swapfile patches, which were overwriting current->journal_info in the
swapout path...
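
For anyone finding this thread later, here is a minimal userspace sketch of
that failure mode. It is a model only: struct task, current_task,
buggy_swapout() and careful_swapout() are made-up names, not the jbd2 or
btrfs code, and it assumes the overwrite happens while a handle is still
active on the same task. journal_stop() returns -1 at the point where the
kernel would trip J_ASSERT(journal_current_handle() == handle).

```c
/* Userspace model only; none of this is the real jbd2 or btrfs code. */
#include <assert.h>
#include <stddef.h>

struct handle { int id; };

/* Stand-in for task_struct: one per-task slot shared by all users. */
struct task { void *journal_info; };

static struct task current_task;

static struct handle *journal_current_handle(void)
{
    return current_task.journal_info;
}

static void journal_start(struct handle *h)
{
    current_task.journal_info = h;
}

/* Returns 0 on success, -1 where the kernel would hit the J_ASSERT. */
static int journal_stop(struct handle *h)
{
    if (journal_current_handle() != h)
        return -1;
    current_task.journal_info = NULL;
    return 0;
}

/* Hypothetical buggy path: reuses the slot and never restores it. */
static void buggy_swapout(void *cookie)
{
    current_task.journal_info = cookie;
}

/* Hypothetical well-behaved path: saves and restores the slot. */
static void careful_swapout(void *cookie)
{
    void *saved = current_task.journal_info;

    current_task.journal_info = cookie;
    /* ... the path's own work would go here ... */
    current_task.journal_info = saved;
}

/* Start a handle, run one of the nested paths, then stop the handle. */
int run_scenario(int buggy)
{
    struct handle h = { 1 };
    int cookie;

    journal_start(&h);
    if (buggy)
        buggy_swapout(&cookie);
    else
        careful_swapout(&cookie);
    return journal_stop(&h);
}
```

In this model run_scenario(0) returns 0 and run_scenario(1) returns -1,
which is the same mismatch the assertion in the trace above reports.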

Honza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR