Re: [PATCH kernel] powerpc/mm/iommu: Put pages on process exit
From: Alexey Kardashevskiy
Date: Thu Jul 14 2016 - 21:47:24 EST
On 14/07/16 21:52, Alexey Kardashevskiy wrote:
> On 14/07/16 20:12, Balbir Singh wrote:
>> On Thu, Jul 14, 2016 at 3:16 PM, Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>>> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
>>> the userspace starts using VFIO. When the userspace process finishes,
>>> all the pinned pages need to be put; this is done as a part of
>>> the userspace memory context (MM) destruction which happens on
>>> the very last mmdrop().
>>>
>>> This approach has a problem that a MM of the userspace process
>>> may live longer than the userspace process itself as kernel threads
>>> usually execute on a MM of a userspace process which was runnning
>>> on a CPU where the kernel thread was scheduled to. If this happened,
>>> the MM remains referenced until this exact kernel thread wakes up again
>>> and releases the very last reference to the MM, on an idle system this
>>> can take even hours.
>>>
>>> This fixes the issue by moving mm_iommu_cleanup() (the helper which puts
>>> pages) from destroy_context() (called on the last mmdrop()) to
>>> the arch-specific arch_exit_mmap() hook (called on the last mmput()).
>>> mmdrop() decrements the mm->mm_count which is a total reference number;
>>> mmput() decrements the mm->mm_users which is a number of user spaces and
>>> this is actually the counter we want to watch for here.
>>>
>>> Cc: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
>>> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>>> Cc: Paul Mackerras <paulus@xxxxxxxxx>
>>> Cc: Balbir Singh <bsingharora@xxxxxxxxx>
>>> Cc: Nick Piggin <npiggin@xxxxxxxxx>
>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
>>> ---
>>> arch/powerpc/include/asm/mmu_context.h | 3 +++
>>> arch/powerpc/mm/mmu_context_book3s64.c | 4 ----
>>> 2 files changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>>> index 9d2cd0c..24b590d 100644
>>> --- a/arch/powerpc/include/asm/mmu_context.h
>>> +++ b/arch/powerpc/include/asm/mmu_context.h
>>> @@ -138,6 +138,9 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
>>>
>>> static inline void arch_exit_mmap(struct mm_struct *mm)
>>> {
>>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>>> + mm_iommu_cleanup(&mm->context);
>>> +#endif
>>> }
>>>
>>
>> The assumption is that all necessary tear down has happened. Earlier
>> we were doing the teardown
>> on the last mm ref drop, do we have any dependence on that? I don't
>> think so, but just checking
>>
>> Does qemu dotce_iommu_register_pages on exit()/atexit() or do we rely
>> on the exit path to do the teardown?
>
> QEMU does call unregister when a memory region is deleted. Handling it in
> arch_exit_mmap() is for the case when a QEMU process just dies for whatever
> reason.
After a second thought the patch is not correct, before it would actually
put pages when all references to MM were dropped and many of them are hold
by file descriptors, with this patch, pages are put before files are closed
which means that devices will not be reset and doing DMA to memory which is
not pinned but there are TCEs pointign to it.
I will make another patch, with additional mmget()/mmput().
>> Can we please remove the #ifdef CONFIG_SPAPR_TCE_IOMMU here and have a version
>> of mm_iommu_cleanup that handles it
>
>
> That would be a change unrelated to the fix. And I just do not see the
> point in this - mm_iommu_cleanup() is declared in the very same header.
>
>
>> Basically do
>>
>> #ifdef CONFIG_SPAPR_TCE_IOMMU
>> extern void mm_iommu_cleanup(void *ptr)
>> #else
>> static inline mm_iommu_cleanup(void *ptr) {}
>> #endif
>>
>> Otherwise looks good
>>
>> Acked-by: Balbir Singh <bsingharora@xxxxxxxxx>
>
> Thanks!
>
>
--
Alexey