[PATCH] staging: android: ashmem: get_name,set_name not to hold ashmem_mutex
From: Shankar Brahadeeswaran
Date: Mon Feb 18 2013 - 00:35:58 EST
I'm working on Android Linux Kernel (version 3.4) and seeing a
"deadlock" in the ashmem driver, while handling mmap request.
I seek your support to fix the same. The locks that involved in the
dead lock are
The following is the sequence of events that leads to the deadlock.
There are two threads A and B that belong to the same process
(system_server) and hence share the mm struct.
A1) In the A's context an mmap system call is made with an fd of ashmem
A2) The system call sys_mmap_pgoff acquires the mmap_sem of the "mm"
and sleeps before calling the .mmap of ashmem i.e before calling
ashmem_mmap and acquiring the ashmem_mutex from this function.
Now the thread B runs and proceeds to do the following
B1) In the B's context ashmem ioctl with option ASHMEM_GET_NAME is called.
B2) Now the code proceeds to acquire the ashmem_mutex and performs a
B3) copy_to_user raises a valid exception to copy the data from user
space and proceeds to handle it gracefully,
do_DataAbort --> do_page_fault (This condition can happen if the
physical page for the give user address is not mapped yet).
B4) In do_page_fault it finds that the mm->mmap_sem is not available
(Note A & B share the mm) since A has it and sleeps
Now the thread A runs again
A3) It proceeds to call ashmem_mmap and tries to acquired
ashmem_mutex, which is not available (is with B) and sleeps.
Now A has acquired mmap_sem and waits for B to release ashmem_mutex
B has acquired ashmem_mutex and waits for the mmap_sem to be
available, which is held by A.
This creates a dead lock in the system.
Do not hold the ashmem lock while making copy_to_user/copy_from_user calls.
At the same ensure that data integrity is maintained (two threads
calling SET_NAME/GET_NAME should not lead to issues).
I have attached a patch to fix this problem.
How to Reproduce the problem:
In normal circumstances it is very hard to see this. The problem was
seen during regression tests, but was very rare.
My Kernel Version is 3.4.0 and I managed to write a unit test case to
create this issue almost 100% of the time.
Used the same unit test case to ensure that the attached patch fixes
the deadlock in Kernel Version 3.4
Ported the same patch to 3.8 for submission.
Please provide your feedback on the patch.
Description: Binary data