Re: [PATCH v2 0/2] uio: Prevent kernel oops on UIO device remove with open fds

From: Greg KH
Date: Mon May 14 2018 - 10:17:24 EST


On Mon, May 14, 2018 at 01:32:21PM +1200, Hamish Martin wrote:
> If a UIO device is removed while a user space app has an open file
> descriptor to that device's /dev/uio* file, a kernel oops can occur when
> the file descriptor is ultimately closed. The oops is triggered by
> dereferencing either the uio_listener struct's 'dev' pointer, or at the
> next level, when dereferencing a stale 'info' pointer on that dev.
>
> To resolve this we now increment the reference count for the uio_device
> and prevent the destruction of the uio_device until all open file
> descriptors are closed.
> A further consequence of resolving the stale pointers to the uio_device
> is that it exposes an issue with the independent life cycles of the
> uio_device and its related uio_info. The uio_info struct is allocated by
> the user of the uio subsystem and connected to a uio_device at
> registration time (see __uio_register_device()). When the device
> corresponding to that uio_info is unregistered and the memory for the
> uio_info is freed, the uio_device is left with a stale pointer which may
> still be used in the file system ops (uio_poll(), uio_read(), etc)
> To resolve this second issue, we now lock alteration or access of the
> 'info' member of the uio_device struct and alter the accessing code to
> handle that pointer being null.
>
> This patch series contains two patches. The first is a cosmetic change
> to the return paths from uio_write to facilitate easier review of the
> second patch. The second patch contains the change to prevent destruction
> of the uio_device while file descriptors to it remain open and the
> additional locking to prevent the use of a stale 'info' pointer.
>
> This patch series is heavily based on the work done by Brian Russell
> (formerly of Brocade) in May 2015. His last version of an attempt to fix
> this is seen here:
> https://patchwork.kernel.org/patch/6057431/
> My new addition is the locking around use of the info pointer. It is my
> proposed solution to Brian's last comment about what he had left
> unfinished:
> "It needs a bit more work. uio_info needs to live as long as the
> corresponding uio_device. Since they seem to always be 1:1,
> uio_info could embedded within uio_device (but then all the users
> of uio need changed) or uio_info could be a refcounted object."
>
> For further info here is an example of the kernel oops this patch series
> is trying to resolve. Output is from a 4.17.0-rc1 kernel with a test app
> opening a uio device and doing select with the fd, then removing the UIO
> device while the select is still happening:
>
> Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d63
> Faulting instruction address: 0xc000000000605c98
> Oops: Kernel access of bad area, sig: 11 [#1]
> BE PREEMPT SMP NR_CPUS=8 CoreNet Generic
> Modules linked in:
> CPU: 6 PID: 282 Comm: uio_tester Not tainted 4.17.0-rc1-at1+ #8
> NIP: c000000000605c98 LR: c000000000211d8c CTR: c000000000605c60
> REGS: c0000000f02f3480 TRAP: 0300 Not tainted (4.17.0-rc1-at1+)
> MSR: 000000008202b000 <VEC,CE,EE,FP,ME> CR: 24284448 XER: 20000000
> DEAR: 6b6b6b6b6b6b6d63 ESR: 0000000000000000 SOFTE: 0
> GPR00: c000000000211d8c c0000000f02f3700 c000000000dc7d00 c0000000ef365bc0
> GPR04: c0000000f02f3800 0000000000000000 c0000000ef4b9b58 0000000000000006
> GPR08: c000000000605c60 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000016
> GPR12: 0000000042284448 c00000003fffd440 0000000000000004 0000000000000003
> GPR16: 0000000000000008 0000000000000000 00000000ef365bc0 0000000000000000
> GPR20: c0000000f02f3c00 0000000000000000 0000000000000000 c0000000ef365bc0
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR28: c0000000f02f3800 c0000000ef365bc0 c0000000f02c2c68 c0000000efd01d20
> NIP [c000000000605c98] .uio_poll+0x38/0xe0
> LR [c000000000211d8c] .do_select+0x39c/0x7a0
> Call Trace:
> [c0000000f02f3700] [c0000000f02f3790] 0xc0000000f02f3790 (unreliable)
> [c0000000f02f3790] [c000000000211d8c] .do_select+0x39c/0x7a0
> [c0000000f02f3b90] [c000000000212eac] .core_sys_select+0x22c/0x320
> [c0000000f02f3d70] [c000000000213098] .__se_sys_select+0xf8/0x170
> [c0000000f02f3e30] [c000000000000674] system_call+0x58/0x64
> Instruction dump:
> f8010010 fba1ffe8 fbc1fff0 fbe1fff8 f821ff71 7c7d1b78 7c9c2378 60000000
> 60000000 ebdd00c0 ebfe0000 e93f0038 <e92901f8> 2fa90000 40de0030 3c60ffff
> ---[ end trace 8badf75b83f45856 ]---


Very nice work, thanks for doing this. I've now queued it up.

greg k-h