Re: WARNING: CPU: 0 PID: 1271 at drivers/mmc/core/core.c:991 mmc_release_host+0xa0/0xa8

From: Shawn Lin
Date: Fri Aug 12 2016 - 04:30:34 EST

Next message: tip-bot for Denys Vlasenko: "[tip:perf/urgent] uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions"
Previous message: tip-bot for Kan Liang: "[tip:perf/urgent] perf/x86/intel/uncore: Fix uncore num_counters"
In reply to: Jaehoon Chung: "Re: WARNING: CPU: 0 PID: 1271 at drivers/mmc/core/core.c:991 mmc_release_host+0xa0/0xa8"
Next in thread: John Stultz: "Re: WARNING: CPU: 0 PID: 1271 at drivers/mmc/core/core.c:991 mmc_release_host+0xa0/0xa8"

On 2016/8/12 16:01, Jaehoon Chung wrote:

On 08/12/2016 04:13 PM, John Stultz wrote:

On Thu, Aug 4, 2016 at 9:52 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:

Hey Ulf,
Since moving my HiKey branch to pre-v4.8-rc1 (Linus's HEAD), I've
been seeing the following warning occasionally. Usually after seeing
it, the system will refuse to reboot (system does the "Emergency
remount complete" but then just sits there, and if I ctrl-c I can use
the shell fine but many commands will get me stuck).

Anyway, if you have any ideas...

thanks
-john

[ 24.154245] ------------[ cut here ]------------
[ 24.158903] WARNING: CPU: 2 PID: 1273 at drivers/mmc/core/core.c:991 mmc_release_host+0xa0/0xa8
[ 24.167605]
[ 24.169104] CPU: 2 PID: 1273 Comm: mmcqd/0 Not tainted 4.7.0-11945-gb30f1d6-dirty #706
[ 24.177024] Hardware name: HiKey Development Board (DT)
[ 24.182253] task: ffffffc0793d8c80 task.stack: ffffffc078c48000
[ 24.188178] PC is at mmc_release_host+0xa0/0xa8
[ 24.192725] LR is at mmc_put_card+0x18/0x3c
[ 24.196917] pc : [<ffffff80086c2550>] lr : [<ffffff80086c31f4>] pstate: 80000145
[ 24.204317] sp : ffffffc078c4bd20
[ 24.207636] x29: ffffffc078c4bd20 x28: 0000000000000000
[ 24.212975] x27: 0000000000000000 x26: ffffffc077903420
[ 24.216220] x25: ffffffc078788028 x24: ffffffc0787e8800
[ 24.216232] x23: ffffffc078788000 x22: 0000000000000000
[ 24.216243] x21: 0000000000000000 x20: ffffffc078788018
[ 24.216254] x19: ffffffc0787e8800 x18: 0000000000000000
[ 24.216265] x17: 0000000000000000 x16: 0000000000000000
[ 24.216276] x15: 0000000000000000 x14: ffffffc078789430
[ 24.216288] x13: 000000000000002f x12: 000000000000b853
[ 24.216299] x11: ffffffc077903420 x10: 0000000000000860
[ 24.216310] x9 : ffffffc078c48000 x8 : ffffffc0793d9540
[ 24.216322] x7 : 0000000000d3f8c7 x6 : 0000000000002bd0
[ 24.216333] x5 : 00000000021458fa x4 : 00ffffffffffffff
[ 24.216344] x3 : 00000000d0555555 x2 : ffffffc078c4bd5c
[ 24.216355] x1 : 0000000000000000 x0 : 0000000000000000
[ 24.216366]
[ 24.216372] ---[ end trace 74dade4766b71d8d ]---
[ 24.216377] Call trace:
[ 24.216386] Exception stack(0xffffffc078c4bb50 to 0xffffffc078c4bc80)
[ 24.216394] bb40: ffffffc0787e8800 0000008000000000
[ 24.216403] bb60: ffffffc078c4bd20 ffffff80086c2550 ffffff8008ca6000 ffffffc0784fb200
[ 24.216411] bb80: ffffffc07bf4b7e8 0000000000000008 ffffffc0793d8d00 ffffff8008c82780
[ 24.216420] bba0: ffffffc078c4bbe0 ffffff800843576c ffffffc078c4bbf0 ffffff800843576c
[ 24.216429] bbc0: ffffffc078c4bcc0 ffffffc078c4bc78 ffffffc078c4bc10 ffffff800843576c
[ 24.216437] bbe0: ffffffc078c4bce0 ffffffc078c4bc98 0000000000000000 0000000000000000
[ 24.216445] bc00: ffffffc078c4bd5c 00000000d0555555 00ffffffffffffff 00000000021458fa
[ 24.216452] bc20: 0000000000002bd0 0000000000d3f8c7 ffffffc0793d9540 ffffffc078c48000
[ 24.216460] bc40: 0000000000000860 ffffffc077903420 000000000000b853 000000000000002f
[ 24.216467] bc60: ffffffc078789430 0000000000000000 0000000000000000 0000000000000000
[ 24.216479] [<ffffff80086c2550>] mmc_release_host+0xa0/0xa8
[ 24.216486] [<ffffff80086c31f4>] mmc_put_card+0x18/0x3c
[ 24.216497] [<ffffff80086d30e4>] mmc_blk_issue_rq+0x11c/0x4a4
[ 24.216506] [<ffffff80086d3e44>] mmc_queue_thread+0x98/0x158
[ 24.216517] [<ffffff80080cfd7c>] kthread+0xd0/0xe4
[ 24.216527] [<ffffff8008082e90>] ret_from_fork+0x10/0x40

Hey Ulf,
I *think* I've narrowed this down to
6024e16654c1e1a2475e848d735963d05a12dba9 ("mmc: dw_mmc: set to
MMC_CAP_ERASE by default"). It's fairly sporadic, so I may be seeing
this as a false positive, but after reverting that patch I've
seemingly stopped seeing the issue.

Hmm, I don't think so. I *guess* it's not related to commit 6024e16654.

Before mmc_put_card() is called, is a discard request issued?

if ((!req && !(mq->flags & MMC_QUEUE_NEW_REQUEST)) ||
    (cmd_flags & MMC_REQ_SPECIAL_MASK))

Which condition was hit?
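To make the question concrete, the quoted check (which, judging from the call trace, sits in mmc_blk_issue_rq() just before mmc_put_card()) can be modelled in isolation. This is a minimal, hypothetical reduction: the `HYP_*` flag values and `struct hyp_queue` are stand-ins, not the kernel's definitions, and the special mask is assumed to cover discard and flush. It only reports which half of the condition would lead to mmc_put_card().

```c
#include <assert.h>   /* assert() for the sanity checks run against this model */
#include <stddef.h>   /* NULL */

/* Hypothetical stand-in flag values; the real kernel values differ. */
#define HYP_REQ_DISCARD        (1u << 0)
#define HYP_REQ_FLUSH          (1u << 1)
#define HYP_REQ_SPECIAL_MASK   (HYP_REQ_DISCARD | HYP_REQ_FLUSH)
#define HYP_QUEUE_NEW_REQUEST  (1u << 0)

/* Hypothetical, simplified stand-in for the MMC queue state. */
struct hyp_queue {
	unsigned int flags;
};

enum release_reason {
	RELEASE_NONE = 0,    /* host stays claimed */
	RELEASE_IDLE = 1,    /* no request and no new request pending */
	RELEASE_SPECIAL = 2, /* a special (discard/flush) request was handled */
};

/* Same shape as the quoted condition, split so the caller can see
 * which branch fired. When req is NULL, cmd_flags is expected to be 0,
 * so the two branches cannot both apply. */
static enum release_reason should_put_card(const void *req,
					   const struct hyp_queue *mq,
					   unsigned int cmd_flags)
{
	if (!req && !(mq->flags & HYP_QUEUE_NEW_REQUEST))
		return RELEASE_IDLE;
	if (cmd_flags & HYP_REQ_SPECIAL_MASK)
		return RELEASE_SPECIAL;
	return RELEASE_NONE;
}
```

Instrumenting the real code path the same way (logging which side of the `||` fired before each mmc_put_card()) would answer the question directly on the HiKey board.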

If the special-request condition is met, mrq_pre and mrq_cur are both
NULL after the queue is scheduled, and for this special request
host->claimed is released. The next request peeked from the block
layer goes through mmc_get_card() again, which means we should never
hit this WARN when releasing the host. So it would be interesting to
dig out what is actually happening there...
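The argument above is about claim/release balance: every mmc_put_card() should be preceded by an mmc_get_card(). Below is a deliberately simplified, single-threaded model of that bookkeeping, assuming the WARN at core.c:991 is the `WARN_ON(!host->claimed)` check at the top of mmc_release_host(). The `hyp_*` names are hypothetical, not kernel APIs. It shows that releasing right after a special request and then claiming again stays balanced, while one extra release trips the warning.

```c
#include <assert.h>   /* assert() for the sanity checks run against this model */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the host claim state. The real code also takes
 * host->lock, sleeps on a wait queue and records the claimer. */
struct hyp_host {
	bool claimed;
	int claim_cnt;
	int warnings;   /* how many times the WARN would have fired */
};

/* Models mmc_get_card() -> claim: nested claims just bump the count. */
static void hyp_claim_host(struct hyp_host *h)
{
	h->claimed = true;
	h->claim_cnt++;
}

/* Models mmc_put_card() -> release. */
static void hyp_release_host(struct hyp_host *h)
{
	if (!h->claimed) {
		/* Where the kernel would WARN. This model stops here;
		 * the real function carries on after the warning. */
		h->warnings++;
		fprintf(stderr, "WARN: releasing a host that is not claimed\n");
		return;
	}
	if (--h->claim_cnt == 0)
		h->claimed = false;   /* last reference dropped */
}
```

Under this model, the WARN in the trace implies mmc_put_card() ran once more than mmc_get_card() on some path, which is the imbalance worth looking for.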

But at least for dw_mmc-rockchip, we have been using this feature
(erase/trim/discard) for years and have never seen this. Anyway, from
my reading of the code, I don't think this commit is the cause.

Please look at this regression report I saw.

https://lkml.org/lkml/2016/8/11/130

Anyway, I'll do some further testing tomorrow w/ that removed. Usually
I see the issue 1-2 times an hour, so if I go the day w/o a problem
I'll let you know.
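As a rough check on how much a clean day would prove, assume (an assumption, not something measured in this thread) that the warnings arrive independently at a rate of at least one per hour, i.e. as a Poisson process. The chance of seeing no warning over an illustrative 8-hour day, even if the revert changed nothing, is then:

```latex
P(\text{no warning in } T \text{ hours}) = e^{-\lambda T}
  \le e^{-8} \approx 3.4 \times 10^{-4}
  \qquad (\lambda \ge 1\,\mathrm{h}^{-1},\; T = 8\,\mathrm{h})
```

So a full day without the warning would be fairly strong, though not conclusive, evidence that the reverted commit is involved.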

Zhangfei/Guodong: Any ideas as to why ERASE might cause trouble on HiKey?

Did you try sending the erase command directly, e.g. with fstrim or
something similar? Does it occur on every boot?
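One way to trigger erase/discard on demand, as suggested above, is what fstrim(8) does: ask a mounted filesystem to discard its free space with the FITRIM ioctl. A minimal C sketch follows; `trim_mount()` is a hypothetical helper name, it needs root (CAP_SYS_ADMIN) and a filesystem that supports FITRIM, and which eMMC-backed mount point to pass is up to you. Unlike a raw discard on the block device, FITRIM only targets free space.

```c
#include <assert.h>     /* assert() for the sanity checks run against this sketch */
#include <fcntl.h>
#include <limits.h>     /* ULLONG_MAX */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */

/* Discard all free space of the filesystem mounted at `mountpoint`.
 * Returns 0 on success (storing the byte count the kernel reports in
 * *trimmed when it is non-NULL) or -1 on error. */
static int trim_mount(const char *mountpoint, unsigned long long *trimmed)
{
	struct fstrim_range range;
	int fd, ret;

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0) {
		perror(mountpoint);
		return -1;
	}

	memset(&range, 0, sizeof(range));
	range.start = 0;
	range.len = ULLONG_MAX;   /* cover the whole filesystem */
	range.minlen = 0;         /* let the filesystem pick the minimum extent */

	ret = ioctl(fd, FITRIM, &range);
	if (ret < 0)
		perror("FITRIM");
	else if (trimmed)
		*trimmed = range.len; /* kernel writes back the bytes trimmed */

	close(fd);
	return ret < 0 ? -1 : 0;
}
```

Running this (or fstrim itself) in a loop while watching dmesg would show whether discard requests alone reproduce the mmc_release_host warning.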

Best Regards,
Jaehoon Chung

thanks
-john

--
Best Regards
Shawn Lin
