Re: [PATCH] io_uring: free allocated io_memory once

From: Jens Axboe
Date: Tue Apr 30 2019 - 12:22:36 EST


On 4/30/19 10:20 AM, Mark Rutland wrote:
> If io_allocate_scq_urings() fails to allocate an sq_* region, it will
> call io_mem_free() for any previously allocated regions, but leave
> dangling pointers to these regions in the ctx. Any regions which have
> not yet been allocated are left NULL. Note that when returning
> -EOVERFLOW, the previously allocated sq_ring is not freed, which appears
> to be an unintentional leak.
>
> When io_allocate_scq_urings() fails, io_uring_create() will call
> io_ring_ctx_wait_and_kill(), which calls io_mem_free() on all the sq_*
> regions, assuming the pointers are valid and not NULL.
>
> This can result in pages being freed multiple times, which has been
> observed to corrupt the page state, leading to subsequent fun. This can
> also result in virt_to_page() on NULL, resulting in the use of bogus
> page addresses, and yet more subsequent fun. The latter can be detected
> with CONFIG_DEBUG_VIRTUAL on arm64.
>
> Adding a cleanup path to io_allocate_scq_urings() complicates the logic,
> so let's leave it to io_ring_ctx_free() to consistently free these
> pointers, and simplify the io_allocate_scq_urings() error paths.
>
> Full splats from before this patch below. Note that the pointer logged
> by the DEBUG_VIRTUAL "non-linear address" warning has been hashed, and
> is actually NULL.
>
> [ 26.098129] page:ffff80000e949a00 count:0 mapcount:-128 mapping:0000000000000000 index:0x0
> [ 26.102976] flags: 0x63fffc000000()
> [ 26.104373] raw: 000063fffc000000 ffff80000e86c188 ffff80000ea3df08 0000000000000000
> [ 26.108917] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
> [ 26.137235] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> [ 26.143960] ------------[ cut here ]------------
> [ 26.146020] kernel BUG at include/linux/mm.h:547!
> [ 26.147586] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 26.149163] Modules linked in:
> [ 26.150287] Process syz-executor.21 (pid: 20204, stack limit = 0x000000000e9cefeb)
> [ 26.153307] CPU: 2 PID: 20204 Comm: syz-executor.21 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #18
> [ 26.156566] Hardware name: linux,dummy-virt (DT)
> [ 26.158089] pstate: 40400005 (nZcv daif +PAN -UAO)
> [ 26.159869] pc : io_mem_free+0x9c/0xa8
> [ 26.161436] lr : io_mem_free+0x9c/0xa8
> [ 26.162720] sp : ffff000013003d60
> [ 26.164048] x29: ffff000013003d60 x28: ffff800025048040
> [ 26.165804] x27: 0000000000000000 x26: ffff800025048040
> [ 26.167352] x25: 00000000000000c0 x24: ffff0000112c2820
> [ 26.169682] x23: 0000000000000000 x22: 0000000020000080
> [ 26.171899] x21: ffff80002143b418 x20: ffff80002143b400
> [ 26.174236] x19: ffff80002143b280 x18: 0000000000000000
> [ 26.176607] x17: 0000000000000000 x16: 0000000000000000
> [ 26.178997] x15: 0000000000000000 x14: 0000000000000000
> [ 26.181508] x13: 00009178a5e077b2 x12: 0000000000000001
> [ 26.183863] x11: 0000000000000000 x10: 0000000000000980
> [ 26.186437] x9 : ffff000013003a80 x8 : ffff800025048a20
> [ 26.189006] x7 : ffff8000250481c0 x6 : ffff80002ffe9118
> [ 26.191359] x5 : ffff80002ffe9118 x4 : 0000000000000000
> [ 26.193863] x3 : ffff80002ffefe98 x2 : 44c06ddd107d1f00
> [ 26.196642] x1 : 0000000000000000 x0 : 000000000000003e
> [ 26.198892] Call trace:
> [ 26.199893] io_mem_free+0x9c/0xa8
> [ 26.201155] io_ring_ctx_wait_and_kill+0xec/0x180
> [ 26.202688] io_uring_setup+0x6c4/0x6f0
> [ 26.204091] __arm64_sys_io_uring_setup+0x18/0x20
> [ 26.205576] el0_svc_common.constprop.0+0x7c/0xe8
> [ 26.207186] el0_svc_handler+0x28/0x78
> [ 26.208389] el0_svc+0x8/0xc
> [ 26.209408] Code: aa0203e0 d0006861 9133a021 97fcdc3c (d4210000)
> [ 26.211995] ---[ end trace bdb81cd43a21e50d ]---
>
> [ 81.770626] ------------[ cut here ]------------
> [ 81.825015] virt_to_phys used for non-linear address: 000000000d42f2c7 ( (null))
> [ 81.827860] WARNING: CPU: 1 PID: 30171 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68
> [ 81.831202] Modules linked in:
> [ 81.832212] CPU: 1 PID: 30171 Comm: syz-executor.20 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #19
> [ 81.835616] Hardware name: linux,dummy-virt (DT)
> [ 81.836863] pstate: 60400005 (nZCv daif +PAN -UAO)
> [ 81.838727] pc : __virt_to_phys+0x48/0x68
> [ 81.840572] lr : __virt_to_phys+0x48/0x68
> [ 81.842264] sp : ffff80002cf67c70
> [ 81.843858] x29: ffff80002cf67c70 x28: ffff800014358e18
> [ 81.846463] x27: 0000000000000000 x26: 0000000020000080
> [ 81.849148] x25: 0000000000000000 x24: ffff80001bb01f40
> [ 81.851986] x23: ffff200011db06c8 x22: ffff2000127e3c60
> [ 81.854351] x21: ffff800014358cc0 x20: ffff800014358d98
> [ 81.856711] x19: 0000000000000000 x18: 0000000000000000
> [ 81.859132] x17: 0000000000000000 x16: 0000000000000000
> [ 81.861586] x15: 0000000000000000 x14: 0000000000000000
> [ 81.863905] x13: 0000000000000000 x12: ffff1000037603e9
> [ 81.866226] x11: 1ffff000037603e8 x10: 0000000000000980
> [ 81.868776] x9 : ffff80002cf67840 x8 : ffff80001bb02920
> [ 81.873272] x7 : ffff1000037603e9 x6 : ffff80001bb01f47
> [ 81.875266] x5 : ffff1000037603e9 x4 : dfff200000000000
> [ 81.876875] x3 : ffff200010087528 x2 : ffff1000059ecf58
> [ 81.878751] x1 : 44c06ddd107d1f00 x0 : 0000000000000000
> [ 81.880453] Call trace:
> [ 81.881164] __virt_to_phys+0x48/0x68
> [ 81.882919] io_mem_free+0x18/0x110
> [ 81.886585] io_ring_ctx_wait_and_kill+0x13c/0x1f0
> [ 81.891212] io_uring_setup+0xa60/0xad0
> [ 81.892881] __arm64_sys_io_uring_setup+0x2c/0x38
> [ 81.894398] el0_svc_common.constprop.0+0xac/0x150
> [ 81.896306] el0_svc_handler+0x34/0x88
> [ 81.897744] el0_svc+0x8/0xc
> [ 81.898715] ---[ end trace b4a703802243cbba ]---
>
> Fixes: 2b188cc1bb857a9d ("Add io_uring IO interface")
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: linux-block@xxxxxxxxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxx
> ---
> fs/io_uring.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 25fc8cb56fc5..5228e9b41708 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2552,9 +2552,12 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
> sock_release(ctx->ring_sock);
> #endif
>
> - io_mem_free(ctx->sq_ring);
> - io_mem_free(ctx->sq_sqes);
> - io_mem_free(ctx->cq_ring);
> + if (ctx->sq_ring)
> + io_mem_free(ctx->sq_ring);
> + if (ctx->sq_sqes)
> + io_mem_free(ctx->sq_sqes);
> + if (ctx->cq_ring)
> + io_mem_free(ctx->cq_ring);

Please just make io_mem_free() handle a NULL pointer so we don't need
these checks.

--
Jens Axboe