Re: [PATCH] MMC: fix hang if card was removed during suspend and unsafe resume was enabled
From: Maxim Levitsky
Date: Fri Feb 05 2010 - 09:19:32 EST
On Fri, 2010-02-05 at 06:13 -0800, Andrew Morton wrote:
> On Fri, 05 Feb 2010 10:31:42 +0200 Maxim Levitsky <maximlevitsky@xxxxxxxxx> wrote:
>
> > On Thu, 2010-02-04 at 16:09 -0800, Andrew Morton wrote:
> > > On Fri, 5 Feb 2010 01:18:15 +0200 Maxim Levitsky <maximlevitsky@xxxxxxxxx> wrote:
> > >
> > > > Currently, removal of the card leads to del_disk being called indirectly by the mmc core.
> > > > This function expects userspace to be running, which it isn't when .resume is called.
> > > >
> > > > Fix that by removing the code in mmc_resume_host that did this. This is possible
> > > > because the card detection logic will kick in later and remove the card.
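A minimal userspace model of the change, written only from the description above. The old resume path removes a missing card synchronously while tasks are still frozen. The patched path leaves the removal to card detection. Every struct and function name here is hypothetical and none of it is kernel API. The assumption that detection runs only after tasks are thawed belongs to the model and is not a claim about the real mmc workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. Every name below is hypothetical and merely
 * mirrors the roles described in the patch; none of it is kernel API.
 */
struct model_host {
	bool card_present;     /* card physically in the slot? */
	bool card_registered;  /* block device still registered? */
	bool tasks_frozen;     /* true while the .resume callback runs */
	bool detect_pending;   /* card-detect work queued for later */
	int  frozen_removals;  /* removals attempted while tasks are frozen */
};

/* Stands in for mmc_remove_card() -> ... -> del_gendisk(). */
void model_remove_card(struct model_host *h)
{
	if (h->tasks_frozen)
		h->frozen_removals++;   /* the case that hangs */
	h->card_registered = false;
}

/* Old resume path: notices the missing card and removes it at once. */
void model_resume_old(struct model_host *h)
{
	if (!h->card_present && h->card_registered)
		model_remove_card(h);
}

/* Patched resume path (as described): only queue card detection. */
void model_resume_new(struct model_host *h)
{
	h->detect_pending = true;
}

/* Card-detect work; assumption: in this model it runs only after thaw. */
void model_run_detect(struct model_host *h)
{
	assert(!h->tasks_frozen);
	if (h->detect_pending && !h->card_present && h->card_registered)
		model_remove_card(h);
	h->detect_pending = false;
}
```

The old path counts a card pulled during suspend as a frozen-time removal. The new path removes the same card later, once tasks are running again.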
> > >
> > > I don't really understand. The above implies that to trigger this bug,
> > > one needs to physically remove the card during a resume operation, i.e.
> > > a human-vs-computer race. Sounds unlikely?
> > >
> > > So... exactly what steps does the user need to take to trigger this?
> >
> > Sorry for describing this poorly.
> > The steps are:
> >
> > -> Have a kernel built with CONFIG_MMC_UNSAFE_RESUME=y
> > -> Insert an MMC/SD card
> > -> Suspend/hibernate the system
> > -> While the system is hibernated/suspended, pull the card out
> > -> Resume the system
> > -> Hang
> >
> >
> > If CONFIG_MMC_UNSAFE_RESUME is set, the mmc core allows the user to
> > suspend/resume the card normally, assuming he won't swap the card or
> > modify it in another system. The former case is actually handled quite
> > well.
> >
> > If CONFIG_MMC_UNSAFE_RESUME isn't set, the core removes the card during
> > suspend, and I now think (and will test) that this will still hang the
> > system, this time on suspend.
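The two configurations above can be summarised as a small decision function. This is a hypothetical userspace sketch and the enum and function names are made up. The freezing comment reflects the usual suspend ordering: tasks are frozen before devices suspend and thawed after they resume. Whether the suspend-time removal actually hangs is still an untested suspicion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: in which phase of a suspend/resume cycle does the
 * MMC core tear the card down, depending on CONFIG_MMC_UNSAFE_RESUME? */
enum removal_phase {
	REMOVE_NEVER,      /* card stays registered across the cycle */
	REMOVE_AT_SUSPEND, /* removed in the suspend path */
	REMOVE_AT_RESUME,  /* removed in the resume path */
};

enum removal_phase model_removal_phase(bool unsafe_resume, bool card_pulled)
{
	if (!unsafe_resume)
		return REMOVE_AT_SUSPEND;  /* removed whether pulled or not */
	return card_pulled ? REMOVE_AT_RESUME : REMOVE_NEVER;
}

/* Tasks are frozen before devices suspend and thawed only after they
 * resume, so a removal done in either callback runs with tasks frozen. */
bool model_removal_while_frozen(enum removal_phase p)
{
	assert(p == REMOVE_NEVER || p == REMOVE_AT_SUSPEND ||
	       p == REMOVE_AT_RESUME);
	return p != REMOVE_NEVER;
}
```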
> >
> > Maybe we can make del_disk behave well when called with userspace frozen?
> > After all, if it is called at that point, the hardware is very likely
> > absent, so there is no point in syncing (which I think triggers the hang)...
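Here is a hypothetical sketch of that idea as a userspace model, not a change to the real del_gendisk(). The removal helper syncs only when the media is present and tasks are running; otherwise it drops the dirty data. struct model_disk and model_del_disk() are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: a disk-removal helper that skips the final sync
 * when it cannot complete (tasks frozen) or cannot help (media gone).
 * Not the real del_gendisk(); all names are made up.
 */
struct model_disk {
	bool media_present;  /* hardware still attached? */
	int  dirty_pages;    /* data still waiting for writeback */
	int  dropped_pages;  /* data discarded instead of written */
};

/* Returns true if a sync was performed before tearing the disk down. */
bool model_del_disk(struct model_disk *d, bool tasks_frozen)
{
	bool do_sync = d->media_present && !tasks_frozen;

	if (!do_sync)
		d->dropped_pages += d->dirty_pages;  /* cannot or need not write */
	d->dirty_pages = 0;                      /* written back or dropped */
	assert(d->dirty_pages == 0);
	return do_sync;
}
```

The model makes the trade-off visible. If the card is still present but tasks are frozen, dirty data is dropped rather than written. That is acceptable only if, as argued above, the hardware is almost always gone at that point.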
> >
>
> There is no del_disk in the kernel. Let's be more specific (and
> accurate!) about the hang. I assume it's
> mmc_remove_card->device_del->kobject_uevent?
Sorry!
I was referring to del_gendisk. The trace shows the hang is not in
kobject_uevent but in the sync that del_gendisk triggers:
<4>[15241.042047] [<ffffffff8106620a>] ? prepare_to_wait+0x2a/0x90
<4>[15241.042159] [<ffffffff810790bd>] ? trace_hardirqs_on+0xd/0x10
<4>[15241.042271] [<ffffffff8140db12>] ? _raw_spin_unlock_irqrestore+0x42/0x80
<4>[15241.042386] [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
<4>[15241.042496] [<ffffffff8112a39e>] bdi_sched_wait+0xe/0x20
<4>[15241.042606] [<ffffffff8140af6f>] __wait_on_bit+0x5f/0x90
<4>[15241.042714] [<ffffffff8112a390>] ? bdi_sched_wait+0x0/0x20
<4>[15241.042824] [<ffffffff8140b018>] out_of_line_wait_on_bit+0x78/0x90
<4>[15241.042935] [<ffffffff81065fd0>] ? wake_bit_function+0x0/0x40
<4>[15241.043045] [<ffffffff8112a2d3>] ? bdi_queue_work+0xa3/0xe0
<4>[15241.043155] [<ffffffff8112a37f>] bdi_sync_writeback+0x6f/0x80
<4>[15241.043265] [<ffffffff8112a3d2>] sync_inodes_sb+0x22/0x120
<4>[15241.043375] [<ffffffff8112f1d2>] __sync_filesystem+0x82/0x90
<4>[15241.043485] [<ffffffff8112f3db>] sync_filesystem+0x4b/0x70
<4>[15241.043594] [<ffffffff811391de>] fsync_bdev+0x2e/0x60
<4>[15241.043704] [<ffffffff812226be>] invalidate_partition+0x2e/0x50
<4>[15241.043816] [<ffffffff8116b92f>] del_gendisk+0x3f/0x140
<4>[15241.043926] [<ffffffffa00c0233>] mmc_blk_remove+0x33/0x60 [mmc_block]
<4>[15241.044043] [<ffffffff81338977>] mmc_bus_remove+0x17/0x20
<4>[15241.044152] [<ffffffff812ce746>] __device_release_driver+0x66/0xc0
<4>[15241.044264] [<ffffffff812ce89d>] device_release_driver+0x2d/0x40
<4>[15241.044375] [<ffffffff812cd9b5>] bus_remove_device+0xb5/0x120
<4>[15241.044486] [<ffffffff812cb46f>] device_del+0x12f/0x1a0
<4>[15241.044593] [<ffffffff81338a5b>] mmc_remove_card+0x5b/0x90
<4>[15241.044702] [<ffffffff8133ac27>] mmc_sd_remove+0x27/0x50
<4>[15241.044811] [<ffffffff81337d8c>] mmc_resume_host+0x10c/0x140
<4>[15241.044929] [<ffffffffa00850e9>] sdhci_resume_host+0x69/0xa0 [sdhci]
<4>[15241.045044] [<ffffffffa0bdc39e>] sdhci_pci_resume+0x8e/0xb0 [sdhci_pci]
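One way to read the trace: bdi_sync_writeback() queues work for the writeback flusher thread, then sleeps in bdi_sched_wait() until that work is marked done. Below is a single-threaded userspace model of that handshake. It assumes, without proving it here, that the flusher cannot make progress while tasks are frozen. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the wait at the top of the trace: work is queued
 * for the flusher, then the caller waits until the flusher completes it.
 * Assumption: the flusher cannot run while tasks are frozen.
 */
struct model_work {
	bool done;
};

struct model_flusher {
	bool frozen;               /* parked by the freezer */
	struct model_work *queued; /* one pending item, for brevity */
};

void model_queue_work(struct model_flusher *f, struct model_work *w)
{
	assert(f->queued == NULL);
	w->done = false;
	f->queued = w;
}

/* One chance for the flusher thread to run. */
void model_flusher_tick(struct model_flusher *f)
{
	if (f->frozen || f->queued == NULL)
		return;                /* a frozen thread makes no progress */
	f->queued->done = true;
	f->queued = NULL;
}

/*
 * The kernel waits without a bound; the model gives up after max_ticks so
 * the hang shows up as a false return instead of a real hang.
 */
bool model_sync_and_wait(struct model_flusher *f, int max_ticks)
{
	struct model_work w;

	model_queue_work(f, &w);
	for (int i = 0; i < max_ticks; i++) {
		model_flusher_tick(f);
		if (w.done)
			return true;
	}
	f->queued = NULL;  /* do not leave a pointer to this stack frame */
	return false;
}
```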
>
> Yes, I'd have thought that it would be a good idea for the
> kobject_uevent code (or lower, in call_usermodehelper) to take avoiding
> action if userspace is frozen. However such action would probably
> involve doing a WARN_ON() too, so we'd still need MMC changes to avoid
> that.
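Here is a hypothetical userspace sketch of that suggestion. The helper-spawning path checks whether userspace is frozen, warns in the spirit of WARN_ON(), and returns an error instead of waiting. The -EBUSY return value and all model_* names are assumptions made for illustration. This is not the real kobject_uevent()/call_usermodehelper() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch: refuse to spawn a usermode helper while userspace
 * is frozen, and warn. model_warn_on() stands in for WARN_ON(); nothing
 * here is kernel API.
 */
static int model_warnings;

bool model_warn_on(bool cond, const char *what)
{
	if (cond) {
		model_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;   /* like WARN_ON(), usable inside an if () */
}

int model_call_usermodehelper(bool userspace_frozen, int *spawned)
{
	assert(spawned != NULL);
	if (model_warn_on(userspace_frozen, "usermode helper while frozen"))
		return -EBUSY;   /* avoiding action: do not wait on a frozen task */
	(*spawned)++;        /* helper would be started here */
	return 0;
}
```

Even with such a guard, the MMC removal path would still trip the warning on every resume with a pulled card. That is why MMC changes would be needed either way.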
>
>
Best regards,
Maxim Levitsky
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/