Re: WARNING: at fs/fs-writeback.c when plug out SD card after system suspend/resume

From: Jan Kara
Date: Thu Dec 04 2014 - 07:41:48 EST


On Thu 04-12-14 11:43:17, Dong Aisheng wrote:
> Hi ALL,
>
> We hit a filesystem issue when upgrading the stable kernel from 3.10.31 to
> 3.10.53, and found that it is caused by the following commit, bf0972039,
> which was introduced in 3.10.53.
> With this patch applied, unplugging an SD card after a system
> suspend/resume triggers the following WARNING if the SD card has a
> filesystem mounted. With the patch reverted, the WARNING does not appear.
>
> I also tried the latest linux-next tree, and it has the same issue.
>
> The patch appears to fix a potential system crash.
> We're not sure whether this WARNING is expected and reasonable, or a bug,
> because there was no such WARNING before this patch.
>
> Can someone explain it?
The warning happens because the bdi disappeared from under the filesystem
(likely it was even freed) but the filesystem still has references to it.
Previously, we were just silently using freed memory; now we warn about it
because we clear the BDI_registered bit before freeing the bdi.
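
To make the mechanism concrete, here is a minimal user-space sketch in C.
It is not kernel code: model_bdi, MODEL_BDI_REGISTERED and all the
model_* functions are hypothetical names that only mimic the idea of
clearing a "registered" bit on unregistration and warning when an inode is
dirtied afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space model, NOT the kernel's struct backing_dev_info. */
#define MODEL_BDI_REGISTERED (1u << 0)

struct model_bdi {
    unsigned int state;   /* bit flags; only MODEL_BDI_REGISTERED is used */
    const char *name;
};

static void model_bdi_register(struct model_bdi *bdi)
{
    assert(bdi != NULL);
    bdi->state |= MODEL_BDI_REGISTERED;
}

/* Clear the bit so later users can tell the device is gone. */
static void model_bdi_unregister(struct model_bdi *bdi)
{
    assert(bdi != NULL);
    bdi->state &= ~MODEL_BDI_REGISTERED;
}

/* Returns true if dirtying is allowed; warns and returns false otherwise. */
static bool model_mark_inode_dirty(const struct model_bdi *bdi)
{
    assert(bdi != NULL);
    if (!(bdi->state & MODEL_BDI_REGISTERED)) {
        fprintf(stderr, "WARNING: bdi-%s not registered\n", bdi->name);
        return false;
    }
    return true;
}

/* Register, dirty, unregister (card removed), dirty again; count warnings. */
static int model_run_scenario(void)
{
    struct model_bdi bdi = { .state = 0, .name = "block" };
    int warnings = 0;

    model_bdi_register(&bdi);
    if (!model_mark_inode_dirty(&bdi))
        warnings++;               /* registered: no warning */

    model_bdi_unregister(&bdi);
    if (!model_mark_inode_dirty(&bdi))
        warnings++;               /* unregistered: warning is printed */

    return warnings;
}
```

Calling model_run_scenario() from a main() warns only for the second
dirtying attempt, the one made after the model bdi was unregistered. The
model keeps the structure on the stack, so unlike the real case nothing is
freed; it only shows why the check turns silent misuse into a visible
warning.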

So for now the best advice I can give you is: Don't remove a device from
under a mounted filesystem (even when the system is suspended). It may
easily crash your machine.

We should fix the bdi lifetime issues by making the bdi live as long as the
filesystem on top of it, but someone has to find time to do that...
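
One way to picture "live as long as the filesystem on top of it" is plain
reference counting. The following is only a user-space sketch under that
assumption, not a proposal for the actual kernel change: lifetime_bdi and
its helpers are hypothetical names, and freed_counter exists purely so the
moment of freeing can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a bdi. */
struct lifetime_bdi {
    int refcount;
    int *freed_counter;   /* incremented when the object is freed */
};

/* Allocate with one reference, held by the "device" in this model. */
static struct lifetime_bdi *lifetime_bdi_alloc(int *freed_counter)
{
    struct lifetime_bdi *bdi = malloc(sizeof(*bdi));

    if (bdi == NULL)
        return NULL;
    bdi->refcount = 1;
    bdi->freed_counter = freed_counter;
    return bdi;
}

/* Take an extra reference, e.g. on behalf of a mounted filesystem. */
static struct lifetime_bdi *lifetime_bdi_get(struct lifetime_bdi *bdi)
{
    assert(bdi->refcount > 0);
    bdi->refcount++;
    return bdi;
}

/* Drop a reference; the object is freed only when the last one goes. */
static void lifetime_bdi_put(struct lifetime_bdi *bdi)
{
    assert(bdi->refcount > 0);
    if (--bdi->refcount == 0) {
        (*bdi->freed_counter)++;
        free(bdi);
    }
}
```

In this model the mount would call lifetime_bdi_get() and umount would call
lifetime_bdi_put(), so removing the device (dropping its own reference)
while the filesystem is still mounted leaves the object valid until umount.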

Honza

> Steps to reproduce:
> root@imx6qdlsolo:~# mmc2: mmc_rescan_try_freq: trying to init card at 400000 Hz
> mmc2: Problem setting current limit!
> mmc2: new ultra high speed DDR50 SDHC card at address aaaa
> mmcblk2: mmc2:aaaa SL32G 29.7 GiB
> mmcblk2: p1 p2
> wm8962 3-001a: Failed to get supply 'DCVDD': -517
> wm8962 3-001a: Failed to request supplies: -517
> i2c 3-001a: Driver wm8962 requests probe deferral
> kjournald starting. Commit interval 5 seconds
> EXT3-fs (mmcblk2p2): using internal journal
> EXT3-fs (mmcblk2p2): recovery complete
> EXT3-fs (mmcblk2p2): mounted filesystem with ordered data mode
> FAT-fs (mmcblk2p1): Volume was not properly unmounted. Some data may
> be corrupt. Please run fsck.
>
> root@imx6qdlsolo:~#
> root@imx6qdlsolo:~# echo mem > /sys/power/state
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> Suspending console(s) (use no_console_suspend to debug)
> PM: suspend of devices complete after 45.436 msecs
> PM: suspend devices took 0.050 seconds
> PM: late suspend of devices complete after 0.599 msecs
> PM: noirq suspend of devices complete after 0.704 msecs
> Disabling non-boot CPUs ...
> Turn off M/F mix!
> PM: noirq resume of devices complete after 0.380 msecs
> PM: early resume of devices complete after 0.498 msecs
> imx-sdma 20ec000.sdma: loaded firmware 1.1
> mmc2: Problem setting current limit!
> PM: resume of devices complete after 409.704 msecs
> PM: resume devices took 0.410 seconds
> Restarting tasks ... done.
> root@imx6qdlsolo:~#
> root@imx6qdlsolo:~# libphy: 2188000.ethernet:01 - Link is Up - 100/Full
> mmc2: card aaaa removed
> ------------[ cut here ]------------
> WARNING: at fs/fs-writeback.c:1196 __mark_inode_dirty+0x1d0/0x1d4()
> bdi-block not registered
> Modules linked in:
> CPU: 0 PID: 927 Comm: umount Not tainted 3.10.53-02602-g89aa41e #751
> [<80013b00>] (unwind_backtrace+0x0/0xf4) from [<80011524>]
> (show_stack+0x10/0x14)
> [<80011524>] (show_stack+0x10/0x14) from [<8002c290>]
> (warn_slowpath_common+0x54/0x6c)
> [<8002c290>] (warn_slowpath_common+0x54/0x6c) from [<8002c2d8>]
> (warn_slowpath_fmt+0x30/0x40)
> [<8002c2d8>] (warn_slowpath_fmt+0x30/0x40) from [<800e8bbc>]
> (__mark_inode_dirty+0x1d0/0x1d4)
> [<800e8bbc>] (__mark_inode_dirty+0x1d0/0x1d4) from [<80131ba8>]
> (ext3_put_super+0x20c/0x23c)
> [<80131ba8>] (ext3_put_super+0x20c/0x23c) from [<800c88e0>]
> (generic_shutdown_super+0x58/0xc4)
> [<800c88e0>] (generic_shutdown_super+0x58/0xc4) from [<800c8b14>]
> (kill_block_super+0x18/0x68)
> [<800c8b14>] (kill_block_super+0x18/0x68) from [<800c8e60>]
> (deactivate_locked_super+0x48/0x64)
> [<800c8e60>] (deactivate_locked_super+0x48/0x64) from [<800e271c>]
> (SyS_umount+0x94/0x38c)
> [<800e271c>] (SyS_umount+0x94/0x38c) from [<8000e080>]
> (ret_fast_syscall+0x0/0x30)
> ---[ end trace a52c980ef229d9da ]---
> EXT3-fs (mmcblk2p2): I/O error while writing superblock
>
> Caused by the following commit:
> commit bf0972039ddc483a9cb79edae73076c635876568
> Author: Jan Kara <jack@xxxxxxx>
> Date: Thu Apr 3 14:46:23 2014 -0700
>
> bdi: avoid oops on device removal
>
> commit 5acda9d12dcf1ad0d9a5a2a7c646de3472fa7555 upstream.
>
> After commit 839a8e8660b6 ("writeback: replace custom worker pool
> implementation with unbound workqueue") when device is removed while we
> are writing to it we crash in bdi_writeback_workfn() ->
> set_worker_desc() because bdi->dev is NULL.
>
> This can happen because even though bdi_unregister() cancels all pending
> flushing work, nothing really prevents new ones from being queued from
> balance_dirty_pages() or other places.
>
> Fix the problem by clearing BDI_registered bit in bdi_unregister() and
> checking it before scheduling of any flushing work.
>
> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
>
> Reviewed-by: Tejun Heo <tj@xxxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> Cc: Derek Basehore <dbasehore@xxxxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> Regards
> Dong Aisheng
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR