WARNING: at fs/fs-writeback.c when unplugging an SD card after system suspend/resume

From: Dong Aisheng
Date: Wed Dec 03 2014 - 22:46:40 EST


Hi all,

We hit a filesystem issue when upgrading the stable kernel from 3.10.31 to
3.10.53, and found that it is caused by commit bf0972039 (quoted below),
which was introduced in 3.10.53.
With this patch applied, removing an SD card that has a mounted filesystem
after a system suspend/resume cycle triggers the WARNING shown below.
If the patch is reverted, the WARNING does not appear.

I also tried the latest linux-next tree, and it has the same issue.

The patch appears to fix a potential system crash.
We're not sure whether this WARNING is expected and reasonable, or a bug,
since it never appeared before this patch.

Can someone explain it?

Steps to reproduce:
root@imx6qdlsolo:~# mmc2: mmc_rescan_try_freq: trying to init card at 400000 Hz
mmc2: Problem setting current limit!
mmc2: new ultra high speed DDR50 SDHC card at address aaaa
mmcblk2: mmc2:aaaa SL32G 29.7 GiB
mmcblk2: p1 p2
wm8962 3-001a: Failed to get supply 'DCVDD': -517
wm8962 3-001a: Failed to request supplies: -517
i2c 3-001a: Driver wm8962 requests probe deferral
kjournald starting. Commit interval 5 seconds
EXT3-fs (mmcblk2p2): using internal journal
EXT3-fs (mmcblk2p2): recovery complete
EXT3-fs (mmcblk2p2): mounted filesystem with ordered data mode
FAT-fs (mmcblk2p1): Volume was not properly unmounted. Some data may
be corrupt. Please run fsck.

root@imx6qdlsolo:~#
root@imx6qdlsolo:~# echo mem > /sys/power/state
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Suspending console(s) (use no_console_suspend to debug)
PM: suspend of devices complete after 45.436 msecs
PM: suspend devices took 0.050 seconds
PM: late suspend of devices complete after 0.599 msecs
PM: noirq suspend of devices complete after 0.704 msecs
Disabling non-boot CPUs ...
Turn off M/F mix!
PM: noirq resume of devices complete after 0.380 msecs
PM: early resume of devices complete after 0.498 msecs
imx-sdma 20ec000.sdma: loaded firmware 1.1
mmc2: Problem setting current limit!
PM: resume of devices complete after 409.704 msecs
PM: resume devices took 0.410 seconds
Restarting tasks ... done.
root@imx6qdlsolo:~#
root@imx6qdlsolo:~# libphy: 2188000.ethernet:01 - Link is Up - 100/Full
mmc2: card aaaa removed
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:1196 __mark_inode_dirty+0x1d0/0x1d4()
bdi-block not registered
Modules linked in:
CPU: 0 PID: 927 Comm: umount Not tainted 3.10.53-02602-g89aa41e #751
[<80013b00>] (unwind_backtrace+0x0/0xf4) from [<80011524>]
(show_stack+0x10/0x14)
[<80011524>] (show_stack+0x10/0x14) from [<8002c290>]
(warn_slowpath_common+0x54/0x6c)
[<8002c290>] (warn_slowpath_common+0x54/0x6c) from [<8002c2d8>]
(warn_slowpath_fmt+0x30/0x40)
[<8002c2d8>] (warn_slowpath_fmt+0x30/0x40) from [<800e8bbc>]
(__mark_inode_dirty+0x1d0/0x1d4)
[<800e8bbc>] (__mark_inode_dirty+0x1d0/0x1d4) from [<80131ba8>]
(ext3_put_super+0x20c/0x23c)
[<80131ba8>] (ext3_put_super+0x20c/0x23c) from [<800c88e0>]
(generic_shutdown_super+0x58/0xc4)
[<800c88e0>] (generic_shutdown_super+0x58/0xc4) from [<800c8b14>]
(kill_block_super+0x18/0x68)
[<800c8b14>] (kill_block_super+0x18/0x68) from [<800c8e60>]
(deactivate_locked_super+0x48/0x64)
[<800c8e60>] (deactivate_locked_super+0x48/0x64) from [<800e271c>]
(SyS_umount+0x94/0x38c)
[<800e271c>] (SyS_umount+0x94/0x38c) from [<8000e080>]
(ret_fast_syscall+0x0/0x30)
---[ end trace a52c980ef229d9da ]---
EXT3-fs (mmcblk2p2): I/O error while writing superblock

Caused by following commit:
commit bf0972039ddc483a9cb79edae73076c635876568
Author: Jan Kara <jack@xxxxxxx>
Date: Thu Apr 3 14:46:23 2014 -0700

bdi: avoid oops on device removal

commit 5acda9d12dcf1ad0d9a5a2a7c646de3472fa7555 upstream.

After commit 839a8e8660b6 ("writeback: replace custom worker pool
implementation with unbound workqueue") when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL.

This can happen because even though bdi_unregister() cancels all pending
flushing work, nothing really prevents new ones from being queued from
balance_dirty_pages() or other places.

Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.

Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977

Reviewed-by: Tejun Heo <tj@xxxxxxxxxx>
Signed-off-by: Jan Kara <jack@xxxxxxx>
Cc: Derek Basehore <dbasehore@xxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

Regards
Dong Aisheng