Re: [PATCH 03/15] mm/hmm: HMM should have a callback before MM is destroyed v2
From: John Hubbard
Date: Wed Mar 21 2018 - 18:16:12 EST
On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
>> On 03/19/2018 07:00 PM, jglisse@xxxxxxxxxx wrote:
>>> From: Ralph Campbell <rcampbell@xxxxxxxxxx>
<snip>
>> Hi Jerome,
>>
>> This presents a deadlock problem (details below). As for solution ideas,
>> Mark Hairgrove points out that the MMU notifiers had to solve the
>> same sort of problem, and part of the solution involves "avoid
>> holding locks when issuing these callbacks". That's not an entire
>> solution description, of course, but it seems like a good start.
>>
>> Anyway, for the deadlock problem:
>>
>> Each of these ->release callbacks potentially has to wait for the
>> hmm_invalidate_range() callbacks to finish. That is not shown in any
>> code directly, but it's because: when a device driver is processing
>> the above ->release callback, it has to allow any in-progress operations
>> to finish up (as specified clearly in your comment documentation above).
>>
>> Some of those operations will invariably need to do things that result
>> in page invalidations, thus triggering the hmm_invalidate_range() callback.
>> Then, the hmm_invalidate_range() callback tries to acquire the same
>> hmm->mirrors_sem lock, thus leading to deadlock:
>>
>> hmm_invalidate_range():
>>     // ...
>>     down_read(&hmm->mirrors_sem);
>>     list_for_each_entry(mirror, &hmm->mirrors, list)
>>         mirror->ops->sync_cpu_device_pagetables(mirror, action,
>>                                                 start, end);
>>     up_read(&hmm->mirrors_sem);
>
> That is just illegal: the release callback is not allowed to trigger
> invalidation. All it does is kill all the device's threads and stop
> device page faults from happening. So there are no deadlock issues. I
> can reinforce the comment some more (see [1] for an example of what it
> should be).
That rule is fine, and it is true that the .release callback will not
directly trigger any invalidations. However, the problem is in letting
any *existing* outstanding operations finish up. We have to let those
in-flight operations "drain", in order to meet the requirement that
everything is done by the time .release returns.
For example, if a device driver thread is in the middle of working through
its fault buffer, it will call migrate_vma(), which will in turn unmap
pages. That will cause an hmm_invalidate_range() callback, which tries
to take hmm->mirrors_sem, and we deadlock.
There's no way to "kill" such a thread while it's in the middle of
migrate_vma(); you have to let it finish up.
>
> Also it is illegal for the sync callback to trigger any mmu_notifier
> callback. I thought this was obvious. The sync callback should only
> update the device page table and do _nothing else_. There is no way to
> make this re-entrant.
That is obvious, yes. I am not trying to say there is any problem with
that rule. It's the need to drain outstanding operations during
.release, described above, that is the real problem.
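Just to illustrate the rule we agree on: a sync callback should look
roughly like the sketch below, and nothing more. (This is only a
sketch; the example_* names are made-up stand-ins for whatever a real
driver uses to update its page tables.)

    static void example_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
                                                   enum hmm_update action,
                                                   unsigned long start,
                                                   unsigned long end)
    {
            struct example_dev *dev = container_of(mirror, struct example_dev,
                                                   mirror);

            /*
             * Update the device page tables and flush the device TLB,
             * and nothing else: no memory allocations that can enter
             * reclaim, no calls back into the mm, nothing that can
             * trigger another mmu_notifier callback.
             */
            example_dev_unmap(dev, start, end);             /* made up */
            example_dev_flush_tlb(dev, start, end);         /* made up */
    }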
thanks,
--
John Hubbard
NVIDIA
>
> For anonymous private memory migrated to device memory, it is freed
> shortly after the release callback (see exit_mmap()). For shared memory
> you might want to migrate back to regular memory, but that will be fine
> as you will not get any more mmu_notifier callbacks.
>
> So i don't see any deadlock here.
>
> Cheers,
> Jérôme
>
> [1] https://cgit.freedesktop.org/~glisse/linux/commit/?h=nouveau-hmm&id=93adb3e6b4f39d5d146b6a8afb4175d37bdd4890
>