Re: [<PATCH v1> 4/9] mmc: core: fix SD card request queue refcount underflow during shutdown

From: Greg KH
Date: Wed Dec 18 2019 - 03:33:19 EST


On Mon, Dec 16, 2019 at 06:50:37PM -0800, Bao D. Nguyen wrote:
> From: Can Guo <cang@xxxxxxxxxxxxxx>
>
> During system shutdown, the kernel calls the mmc shutdown function to stop
> the request queue and clean it up, which puts the request queue's kobject
> once. Normally, when the SD card is removed, the mmc_blk_remove routine
> releases all resources and kobjects related to the disk and request queue
> by dropping their kref counts to 0. But if the SD card is removed after
> its shutdown function has been called, the kref count underflow below is
> reported, because the shutdown function already dropped the count once
> during request queue cleanup. This change fixes it by skipping request
> queue cleanup in mmc_cleanup_queue() if the queue has already been marked
> as dead.
>
> [ 166.187211] refcount_t: underflow; use-after-free.
> [ 166.187277] ------------[ cut here ]------------
> [ 166.187321] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 166.187542] Workqueue: events_freezable mmc_rescan
> [ 166.187558] task: ffffffe673b96680 task.stack: ffffff8008418000
> [ 166.187579] pc : refcount_sub_and_test+0x64/0x78
> [ 166.187593] lr : refcount_sub_and_test+0x64/0x78
> [ 166.187605] sp : ffffff800841ba20 pstate : 60c00145
> [ 166.188319] Call trace:
> [ 166.188331] refcount_sub_and_test+0x64/0x78
> [ 166.188343] refcount_dec_and_test+0x18/0x24
> [ 166.188355] kobject_put+0x5c/0x74
> [ 166.188374] blk_put_queue+0x1c/0x28
> [ 166.188388] disk_release+0x70/0x90
> [ 166.188402] device_release+0x38/0x90
> [ 166.188429] kobject_cleanup+0xc4/0x1c0
> [ 166.188441] kobject_put+0x68/0x74
> [ 166.188455] put_disk+0x20/0x2c
> [ 166.188467] mmc_blk_put+0x9c/0xdc
> [ 166.188480] mmc_blk_remove_req+0x110/0x120
> [ 166.188493] mmc_blk_remove+0x14c/0x22c
> [ 166.188505] mmc_bus_remove+0x24/0x34
> [ 166.188517] device_release_driver_internal+0x13c/0x1e0
> [ 166.188528] device_release_driver+0x24/0x30
> [ 166.188540] bus_remove_device+0xdc/0x120
> [ 166.188552] device_del+0x1e0/0x284
> [ 166.188564] mmc_remove_card+0x68/0x7c
> [ 166.188577] mmc_sd_remove+0x24/0x48
> [ 166.188588] mmc_sd_detect+0x120/0x1a4
> [ 166.188600] mmc_rescan+0xf4/0x384
> [ 166.188613] process_one_work+0x1c0/0x3d4
> [ 166.188625] worker_thread+0x224/0x344
> [ 166.188637] kthread+0x120/0x130
> [ 166.188649] ret_from_fork+0x10/0x18.
>
> Signed-off-by: Can Guo <cang@xxxxxxxxxxxxxx>
> Signed-off-by: Sayali Lokhande <sayalil@xxxxxxxxxxxxxx>
> Signed-off-by: Bao D. Nguyen <nguyenb@xxxxxxxxxxxxxx>
> ---
> drivers/mmc/core/queue.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 9edc086..846557b 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -506,7 +506,8 @@ void mmc_cleanup_queue(struct mmc_queue *mq)
>  	if (blk_queue_quiesced(q))
>  		blk_mq_unquiesce_queue(q);
>
> -	blk_cleanup_queue(q);
> +	if (likely(!blk_queue_dead(q)))
> +		blk_cleanup_queue(q);

Unless you can measure the performance impact, never use unlikely/likely
in kernel code. The compiler and CPU will always do much better over
time than you can.

That being said, what will clean up the queue if it is not "dead" at this
point in time, later on? Isn't this a leak?

thanks,

greg k-h
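
For readers following the thread, the double-put described in the quoted
commit message can be modelled in userspace. This is only a rough sketch,
not kernel code: struct toy_queue, toy_put(), toy_cleanup() and
toy_shutdown_then_remove() are hypothetical names, and the starting count
of 2 (one reference for the driver, one for the disk) is an assumption made
for illustration. It shows the sequence shutdown cleanup, removal cleanup,
disk release, with and without the "skip if dead" check.

```c
#include <stdbool.h>

/* Hypothetical userspace model; this is NOT the kernel kobject/kref API. */
struct toy_queue {
	int refs;        /* outstanding references */
	bool dead;       /* set once cleanup has run */
	int underflows;  /* puts attempted while refs was already 0 */
};

/* Drop one reference, recording an underflow instead of going negative. */
static void toy_put(struct toy_queue *q)
{
	if (q->refs == 0)
		q->underflows++;
	else
		q->refs--;
}

/* Mark the queue dead and drop one reference, optionally skipping a dead queue. */
static void toy_cleanup(struct toy_queue *q, bool skip_if_dead)
{
	if (skip_if_dead && q->dead)
		return;
	q->dead = true;
	toy_put(q);
}

/* Replay the sequence from the commit message and return the final state. */
struct toy_queue toy_shutdown_then_remove(bool skip_if_dead)
{
	/* Assumption: one reference held by the driver, one by the disk. */
	struct toy_queue q = { .refs = 2, .dead = false, .underflows = 0 };

	toy_cleanup(&q, skip_if_dead);  /* shutdown path cleans up the queue */
	toy_cleanup(&q, skip_if_dead);  /* card removal cleans it up again */
	toy_put(&q);                    /* disk release drops the disk's reference */
	return q;
}
```

In this model, calling toy_shutdown_then_remove(false) ends with one
recorded underflow, while toy_shutdown_then_remove(true) ends with none and
a count of 0. It says nothing about whether a queue that is not dead at
this point can be leaked on some other path.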