> On Thu, 2009-02-19 at 22:02 +0100, Thomas Hellstrom wrote:
>> Yes, my mistake. set_tsk_need_resched() would be the proper call. If
>> I'm correctly informed, that would kick in the scheduler _after_
>> mmap_sem is released, just before returning to user-space.
>>
>> It looks to me like the driver preferred locking order is
>>
>>   object_mutex (which happens to be the device global struct_mutex)
>>   mmap_sem
>>   offset_mutex
>>
>> So if one could avoid using the struct_mutex for object bookkeeping
>> (a separate lock), then vm_open() and vm_close() would adhere to that
>> locking order as well, simply by not taking the struct_mutex at all.
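To make that concrete, a minimal sketch of what struct_mutex-free
bookkeeping handlers could look like -- gem_vm_open(), gem_vm_close()
and gem_object_free() are illustrative names, not the actual GEM code,
and the release-side locking is elided:

static void gem_object_free(struct kref *kref);	/* hypothetical release */

static void gem_vm_open(struct vm_area_struct *vma)
{
	struct drm_gem_object *obj = vma->vm_private_data;

	kref_get(&obj->refcount);	/* atomic; no struct_mutex taken */
}

static void gem_vm_close(struct vm_area_struct *vma)
{
	struct drm_gem_object *obj = vma->vm_private_data;

	kref_put(&obj->refcount, gem_object_free);	/* ditto */
}

With the refcount handled atomically, neither handler can ever invert
the struct_mutex -> mmap_sem order, whatever context it is called from.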
>>
>> So only fault() remains, in which that locking order is reversed.
>> Personally I think the trylock -> reschedule -> retry method, with
>> proper commenting, is a good solution. It will be the _only_ place
>> where the locking order is reversed, and it is done in a
>> deadlock-safe manner. Note that fault() doesn't really fail, but
>> requests a retry from user-space, with a reschedule to give the
>> process holding the struct_mutex time to release it.
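Spelled out, that fault() path would look roughly like the following
(a sketch only, not the actual patch; the object binding and PFN
insertion are elided):

static int gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct drm_gem_object *obj = vma->vm_private_data;
	struct drm_device *dev = obj->dev;

	/*
	 * mmap_sem is already held on entry, so taking struct_mutex
	 * unconditionally here would reverse the usual
	 * struct_mutex -> mmap_sem order: trylock instead.
	 */
	if (!mutex_trylock(&dev->struct_mutex)) {
		/*
		 * Mark the task so the scheduler runs once mmap_sem is
		 * dropped on the way back to user-space; the faulting
		 * instruction is then simply retried.
		 */
		set_tsk_need_resched(current);
		return VM_FAULT_NOPAGE;
	}

	/* ... bind the object and vm_insert_pfn() the page here ... */

	mutex_unlock(&dev->struct_mutex);
	return VM_FAULT_NOPAGE;
}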
> It doesn't do the reschedule -- need_resched() will check if the
> current task was marked to be scheduled away; furthermore, yield-based
> locking sucks chunks.

Yes, but AFAICT in this situation it is the only way to reverse the
locking order in a deadlock-safe manner. If there is a lot of
contention it will eat CPU. Unfortunately, since the struct_mutex is
such a wide lock, there will probably be contention in some situations.
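For reference, the distinction being drawn above: set_tsk_need_resched()
only sets TIF_NEED_RESCHED on the task; nothing is scheduled at that
point. The switch happens at the next rescheduling point, conceptually
something like this (illustrative, not actual arch code):

static void resched_point(void)	/* e.g. the return-to-user path */
{
	if (need_resched())	/* was TIF_NEED_RESCHED set? */
		schedule();
}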
> What's so very difficult about pulling the copy_*_user() out from
> under the locks?

Given Eric's comment that this is a GEM performance-critical path, even
from a CPU-usage perspective the trylock solution may be preferred.
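For completeness, pulling the copy out would look something like this
(all names illustrative): stage the user copy in a kernel buffer, so
that copy_from_user() -- which can fault and take mmap_sem -- never
runs under the struct_mutex.

static int object_write(struct drm_device *dev, void *dst,
			const void __user *src, size_t size)
{
	void *tmp = kmalloc(size, GFP_KERNEL);

	if (!tmp)
		return -ENOMEM;
	if (copy_from_user(tmp, src, size)) {	/* no locks held here */
		kfree(tmp);
		return -EFAULT;
	}

	mutex_lock(&dev->struct_mutex);	/* nothing inside can fault now */
	memcpy(dst, tmp, size);
	mutex_unlock(&dev->struct_mutex);

	kfree(tmp);
	return 0;
}

The price is an extra allocation and memcpy() on every call, which is
exactly the CPU-usage concern on a hot path.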