Re: [PATCH] mmc: core: Don't allocate IDA for OF aliases

From: Stephen Boyd
Date: Fri Apr 30 2021 - 18:19:46 EST


Quoting Ulf Hansson (2021-04-16 16:07:11)
> On Fri, 16 Apr 2021 at 19:50, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> >
> > Ok. Can you share the KASAN reports? The memory allocated for this class
> > object and associated index from the IDA will be unique for each call
> > the mmc_alloc_host() so I don't see how KASAN could be noticing
> > something unless a reference to the host is leaking out without the
> > proper get_device() call being made to keep the underlying memory from
> > being freed.
>
> Removing the host, also means removing the card. Although, as I said,
> I need some more time to think of the best solution.
>
> Here's a report, triggered with some manual hacks and by using the
> mmc-utils usesland tool.

Thanks! I'm just getting back to this mail on a Friday afternoon!

>
> /mmc status get /dev/mmcblk1
> [ 95.905913] DEBUG: mmc_blk_open: Let's sleep for 10s..
> [ 98.806639] mmc1: card 0007 removed
> [ 105.972139] BUG: KASAN: use-after-free in mmc_blk_get+0x58/0xb8
> [ 105.979144] Read of size 4 at addr ffff00000a394a28 by task mmc/180
> [ 105.984945]
> [ 105.991209] CPU: 2 PID: 180 Comm: mmc Not tainted
> 5.10.0-rc4-00069-gcc758c8c7127-dirty #5
> [ 105.992943] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 106.001010] Call trace:
> [ 106.007799] dump_backtrace+0x0/0x2b4
> [ 106.009965] show_stack+0x18/0x6c
> [ 106.013779] dump_stack+0xfc/0x168
> [ 106.017083] print_address_description.constprop.0+0x6c/0x488
> [ 106.020384] kasan_report+0x118/0x210
> [ 106.026193] __asan_load4+0x94/0xd0
> [ 106.029844] mmc_blk_get+0x58/0xb8
> [ 106.033141] mmc_blk_open+0x7c/0xdc
> [ 106.036614] __blkdev_get+0x3b4/0x964
> [ 106.039996] blkdev_get+0x64/0x100
> [ 106.043815] blkdev_open+0xe8/0x104
> [ 106.047114] do_dentry_open+0x234/0x61c
> [ 106.050498] vfs_open+0x54/0x64
> [ 106.054324] path_openat+0xe04/0x1584
> [ 106.057450] do_filp_open+0xe8/0x1e4
> [ 106.061263] do_sys_openat2+0x120/0x230
> [ 106.064911] __arm64_sys_openat+0xf0/0x15c
> [ 106.068477] el0_svc_common.constprop.0+0xac/0x234
> [ 106.072639] do_el0_svc+0x84/0xa0
> [ 106.077410] el0_sync_handler+0x264/0x270
> [ 106.080790] el0_sync+0x174/0x180
> [ 106.084768]
> [ 106.088070] Allocated by task 33:
> [ 106.089647] stack_trace_save+0x9c/0xdc
> [ 106.092858] kasan_save_stack+0x28/0x60
> [ 106.096506] __kasan_kmalloc.constprop.0+0xc8/0xf0
> [ 106.100325] kasan_kmalloc+0x10/0x20
> [ 106.105183] mmc_blk_alloc_req+0x94/0x4b0

I see that this is a different IDA managed object, for mmc_blk_ida.
Presumably the object allocated is md?

md = kzalloc(sizeof(struct mmc_blk_data), GFP_KERNEL);

Is that the line?

> [ 106.108913] mmc_blk_probe+0x2d4/0xaa4
> [ 106.112829] mmc_bus_probe+0x34/0x4c
> [ 106.116471] really_probe+0x148/0x6e0
> [ 106.120202] driver_probe_device+0x78/0xec
> [ 106.123764] __device_attach_driver+0x108/0x16c
> [ 106.127754] bus_for_each_drv+0xf4/0x15c
> [ 106.132180] __device_attach+0x168/0x240
> [ 106.136349] device_initial_probe+0x14/0x20
> [ 106.140253] bus_probe_device+0xec/0x100
> [ 106.144167] device_add+0x55c/0xaf0
> [ 106.148332] mmc_add_card+0x288/0x380
> [ 106.151540] mmc_attach_sd+0x18c/0x22c
> [ 106.155363] mmc_rescan+0x444/0x4f0
> [ 106.159014] process_one_work+0x3b8/0x650
> [ 106.162396] worker_thread+0xa0/0x724
> [ 106.166556] kthread+0x218/0x220
> [ 106.170201] ret_from_fork+0x10/0x38
> [ 106.173482]
> [ 106.177045] Freed by task 33:
> [ 106.178533] stack_trace_save+0x9c/0xdc
> [ 106.181399] kasan_save_stack+0x28/0x60
> [ 106.185045] kasan_set_track+0x28/0x40
> [ 106.188868] kasan_set_free_info+0x24/0x4c
> [ 106.192684] __kasan_slab_free+0x100/0x180
> [ 106.196764] kasan_slab_free+0x14/0x20
> [ 106.200838] kfree+0xb8/0x46c
> [ 106.204583] mmc_blk_put+0xe4/0x11c

Looking at mmc_blk_put() and mmc_blk_get() I see


static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
{
...
md = disk->private_data;
if (md && md->usage == 0)
md = NULL;
...
}

static void mmc_blk_put(struct mmc_blk_data *md)
{
...
if (md->usage == 0) {
int devidx = mmc_get_devidx(md->disk);
blk_put_queue(md->queue.queue);
ida_simple_remove(&mmc_blk_ida, devidx);
put_disk(md->disk);
kfree(md);
}
...
}

and notice that mmc_blk_get() takes a gendisk but mmc_blk_put() takes a
mmc_blk_data. That's already weird, but then I notice that 'md' comes
from disk->private_data in mmc_blk_get() and in mmc_blk_put() we kfree
'md' if usage drops to 0. The storage for md is inside
disk->private_data, according to mmc_blk_alloc_req()

md->disk->private_data = md

so 'md' points to itself through gendisk private_data. Alright.

Either way, KASAN is telling us that 'md' got freed in mmc_blk_put() but
the gendisk is still around and valid, because it's held alive via some
kobject_get(). When we go to blkdev_open() we have that gendisk that has
private_data pointing to the freed 'md'.

This is a classic dangling pointer bug.

Given that mmc_blk_get() is checking for disk->private_data being
non-NULL, I looked to see where that is assigned to NULL, but I don't
see it. Is it ever set to NULL? We could set private_data to NULL after
freeing it, but that feels hacky. The md->usage code looks like a kref
design open-coded; probably replace that with a kref that does the code
in the if (md->usage == 0) path on the kref release function and then we
wouldn't need a mutex around these get/put APIs.

Does this patch fix it?

---8<----
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index d666e24fbe0e..c7939e8fe76f 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -203,6 +203,7 @@ static void mmc_blk_put(struct mmc_blk_data *md)
int devidx = mmc_get_devidx(md->disk);
blk_put_queue(md->queue.queue);
ida_simple_remove(&mmc_blk_ida, devidx);
+ md->disk->private_data = NULL;
put_disk(md->disk);
kfree(md);
}

> [ 106.207624] mmc_blk_remove_req.part.0+0x6c/0xe4
> [ 106.210914] mmc_blk_remove+0x368/0x370
> [ 106.215778] mmc_bus_remove+0x34/0x50
> [ 106.219336] __device_release_driver+0x228/0x31c
> [ 106.223155] device_release_driver+0x2c/0x44
> [ 106.227840] bus_remove_device+0x1e4/0x200
> [ 106.232100] device_del+0x2b0/0x770
> [ 106.236005] mmc_remove_card+0xf0/0x150
> [ 106.239383] mmc_sd_detect+0x9c/0x150
> [ 106.243207] mmc_rescan+0x110/0x4f0
> [ 106.247032] process_one_work+0x3b8/0x650
> [ 106.250329] worker_thread+0xa0/0x724
> [ 106.254488] kthread+0x218/0x220
> [ 106.258134] ret_from_fork+0x10/0x38
> [ 106.261416]
> [ 106.264986] The buggy address belongs to the object at ffff00000a394800
> [ 106.264986] which belongs to the cache kmalloc-1k of size 1024
> [ 106.266485] The buggy address is located 552 bytes inside of
> [ 106.266485] 1024-byte region [ffff00000a394800, ffff00000a394c00)
> [ 106.278710] The buggy address belongs to the page:
> [ 106.290520] page:00000000ff84ed53 refcount:1 mapcount:0
> mapping:0000000000000000 index:0x0 pfn:0x8a390
> [ 106.295381] head:00000000ff84ed53 order:3 compound_mapcount:0
> compound_pincount:0
> [ 106.304669] flags: 0x3fffc0000010200(slab|head)
> [ 106.312234] raw: 03fffc0000010200 dead000000000100 dead000000000122
> ffff000009f03800
> [ 106.316571] raw: 0000000000000000 0000000000100010 00000001ffffffff
> 0000000000000000
> [ 106.324537] page dumped because: kasan: bad access detected
> [ 106.332254]
> [ 106.337543] Memory state around the buggy address:
> [ 106.339302] ffff00000a394900: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 106.343907] ffff00000a394980: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 106.351111] >ffff00000a394a00: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 106.358303] ^
> [ 106.365515] ffff00000a394a80: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb
> [ 106.369948] ffff00000a394b00: fb fb fb fb fb fb fb fb fb fb fb fb
> fb fb fb fb

If you're interested in the kref patch it may also clean it up a bit. I
left the mutex in place, but otherwise now the refcount is an atomic
variable, so mmc_blk_put() can happen in parallel to mmc_blk_get() and
the release part can run later. This lets mmc_blk_get() start failing
earlier and returning NULL when the last user has called mmc_blk_put()
but the disk is still hanging around. No idea if this works though.

----8<----
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index d666e24fbe0e..8a300cc2c8be 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -27,6 +27,7 @@
#include <linux/errno.h>
#include <linux/hdreg.h>
#include <linux/kdev_t.h>
+#include <linux/kref.h>
#include <linux/blkdev.h>
#include <linux/cdev.h>
#include <linux/mutex.h>
@@ -110,7 +111,7 @@ struct mmc_blk_data {
#define MMC_BLK_CMD23 (1 << 0) /* Can do SET_BLOCK_COUNT for multiblock */
#define MMC_BLK_REL_WR (1 << 1) /* MMC Reliable write support */

- unsigned int usage;
+ struct kref kref;
unsigned int read_only;
unsigned int part_type;
unsigned int reset_done;
@@ -180,10 +181,8 @@ static struct mmc_blk_data *mmc_blk_get(struct
gendisk *disk)

mutex_lock(&open_lock);
md = disk->private_data;
- if (md && md->usage == 0)
+ if (md && !kref_get_unless_zero(&md->kref))
md = NULL;
- if (md)
- md->usage++;
mutex_unlock(&open_lock);

return md;
@@ -195,20 +194,26 @@ static inline int mmc_get_devidx(struct gendisk *disk)
return devidx;
}

-static void mmc_blk_put(struct mmc_blk_data *md)
+static void mmc_blk_kref_release(struct kref *ref)
{
+ struct mmc_blk_data *md = container_of(ref, struct mmc_blk_data, kref);
+ int devidx;
+
mutex_lock(&open_lock);
- md->usage--;
- if (md->usage == 0) {
- int devidx = mmc_get_devidx(md->disk);
- blk_put_queue(md->queue.queue);
- ida_simple_remove(&mmc_blk_ida, devidx);
- put_disk(md->disk);
- kfree(md);
- }
+ devidx = mmc_get_devidx(md->disk);
+ blk_put_queue(md->queue.queue);
+ ida_simple_remove(&mmc_blk_ida, devidx);
+ md->disk->private_data = NULL;
+ put_disk(md->disk);
+ kfree(md);
mutex_unlock(&open_lock);
}

+static void mmc_blk_put(struct mmc_blk_data *md)
+{
+ kref_put(&md->kref, mmc_blk_kref_release);
+}
+
static ssize_t power_ro_lock_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -2300,7 +2305,7 @@ static struct mmc_blk_data
*mmc_blk_alloc_req(struct mmc_card *card,

INIT_LIST_HEAD(&md->part);
INIT_LIST_HEAD(&md->rpmbs);
- md->usage = 1;
+ kref_init(&md->kref);

ret = mmc_init_queue(&md->queue, card);
if (ret)