Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()

From: Oded Gabbay
Date: Thu Mar 17 2016 - 11:39:32 EST


On Thu, Mar 17, 2016 at 4:37 PM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
> On Wed, Mar 16, 2016 at 05:10:34PM +0000, Olu Ogunbowale wrote:
>> From: Olujide Ogunbowale <Olu.Ogunbowale@xxxxxxxxxx>
>>
>> Export the memory management functions, unmapped_area() &
>> unmapped_area_topdown(), as GPL symbols; this allows the kernel to
>> better support process address space mirroring on both CPU and device
>> for out-of-tree drivers by allowing the use of vm_unmapped_area() in a
>> driver's file operation get_unmapped_area().
>>
>> This is required by drivers that want to control or limit the range of a
>> process's VMAs into which shared-virtual-memory (SVM) buffers are mapped
>> during an mmap() call, in order to ensure that such an SVM VMA does not
>> collide with any pre-existing VMAs used by non-buffer regions on the
>> device, because SVM buffers must have identical VMAs on both CPU and device.
>>
>> Exporting these functions is particularly useful for graphics devices, as
>> SVM support is required by the OpenCL & HSA specifications, and also for
>> SVM support on 64-bit CPUs where the usable device SVM address range
>> is, or may be, a subset of the full 64-bit range of the CPU. Exporting also
>> avoids the need to duplicate the VMA search code in such drivers.
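To make the constraint concrete, here is a simplified userspace model of the bottom-up gap search that vm_unmapped_area() dispatches to when no top-down flag is set, with high_limit playing the role of the top of the device-visible SVM range. This is a sketch under stated assumptions: struct range and find_gap_bottomup() are hypothetical, the field names only loosely follow the kernel's struct vm_unmapped_area_info, and the real kernel code walks the mm's VMA tree rather than a sorted array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an existing VMA: the half-open range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Simplified model of a bottom-up free-gap search (NOT the kernel code).
 * vmas must be sorted by start and non-overlapping.
 * Returns the lowest address A, aligned to (align_mask + 1), such that
 * [A, A + length) lies inside [low_limit, high_limit) and overlaps no VMA,
 * or UINT64_MAX if no such gap exists.
 */
static uint64_t find_gap_bottomup(const struct range *vmas, size_t n,
                                  uint64_t length, uint64_t low_limit,
                                  uint64_t high_limit, uint64_t align_mask)
{
    uint64_t candidate = (low_limit + align_mask) & ~align_mask;

    assert(length > 0);
    for (size_t i = 0; i <= n; i++) {
        /* The gap before vmas[i] ends at its start; after the last VMA it ends at high_limit. */
        uint64_t gap_end = (i < n) ? vmas[i].start : high_limit;

        if (gap_end > high_limit)
            gap_end = high_limit;
        if (candidate <= gap_end && gap_end - candidate >= length)
            return candidate;
        /* Skip past this VMA if it covers or follows the current candidate. */
        if (i < n && vmas[i].end > candidate)
            candidate = (vmas[i].end + align_mask) & ~align_mask;
        if (candidate >= high_limit)
            break;
    }
    return UINT64_MAX;
}
```

Lowering high_limit is what keeps the chosen address inside the device's range: with existing VMAs at [0x1000, 0x3000) and [0x5000, 0x6000), a 0x3000-byte request lands at 0x6000 when high_limit is 0x10000, but fails when high_limit is 0x7000.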
>
> What other drivers do for non-buffer regions is have the userspace side
> of the device driver mmap the device driver file and use the VMA range
> they get from that for those non-buffer regions. On CPU access you can
> either choose to fault or to return a dummy page. With that trick there
> is no need to change the kernel.
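Jerome's workaround works because an existing mapping already occupies the range, so later CPU allocations cannot be placed there. A minimal userspace sketch of just that reservation step, assuming an anonymous PROT_NONE mapping as a stand-in for mmap() of the driver's device file (reserve_va_range() is a hypothetical helper, not part of any driver API); with PROT_NONE, any CPU access faults, which corresponds to the "fault" option above.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Reserve len bytes of CPU virtual address space so that no later
 * mmap()/malloc() in this process is placed there. PROT_NONE makes any
 * CPU access fault. In the scheme described above the driver's userspace
 * library would mmap() the device file instead; an anonymous mapping is
 * used here only to keep the example self-contained.
 * Returns NULL on failure.
 */
static void *reserve_va_range(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```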
>
> Note that I do not see how you can solve the issue of your GPU having
> fewer address bits than the CPU. For instance, let's assume that you have
> 46 bits for the GPU while the CPU has 48 bits. Now an application starts
> and does a bunch of allocations that end up above (1 << 46); then the same
> application loads your driver and starts using some API that allows it to
> transparently use previously allocated memory -> fails.
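The failure case above is plain address arithmetic: memory the application allocated before loading the driver can be shared at the same virtual address only if the whole buffer lies below the device's address limit. A minimal sketch, using a hypothetical helper device_can_mirror():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the CPU virtual range
 * [addr, addr + len) lies entirely below 1 << device_va_bits, i.e. the
 * same virtual addresses could also be used by a device that has only
 * device_va_bits of virtual address space.
 */
static bool device_can_mirror(uint64_t addr, uint64_t len,
                              unsigned int device_va_bits)
{
    uint64_t limit;

    assert(device_va_bits >= 1 && device_va_bits <= 64);
    limit = (device_va_bits == 64) ? UINT64_MAX
                                   : (UINT64_C(1) << device_va_bits);

    /* Reject empty ranges and ranges that would wrap around 2^64. */
    if (len == 0 || addr > UINT64_MAX - len)
        return false;
    return addr + len <= limit;
}
```

With 46 device bits on a 48-bit CPU, a buffer allocated at (1 << 46) + 0x1000 fails this check, which is the "-> fails" case; the same buffer would pass if the device had 48 bits.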
>
> Unless you are in a scheme where all allocations must go through some
> special allocator, but I thought this was not the case for HSA. I know
> the lower level of OpenCL allows that.
>
> Cheers,
> Jérôme

In amdkfd (the AMD HSA kernel driver), for APUs, where the CPU and GPU sit
on the same die, we don't need this, as the GPU cores use the AMD IOMMU
(v2) to access system memory; i.e., we don't need to use VRAM (GPU
memory) at all and we don't need to mirror address spaces.

For dGPUs, it's a different story. On GPUs that have only a 40-bit
address space, for example GCN 1.0 and 1.1, I would assume that going
through a special allocator is a must, and that memory addresses below
the 40-bit limit will need to be reserved for HSA. Note that amdkfd
doesn't support dGPUs at this time.
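One possible reading of "a special allocator" plus "addresses below the 40-bit limit reserved for HSA": the runtime reserves a window of CPU virtual addresses under 1 << 40 up front and hands out SVM buffers only from that window. A minimal bump-allocator sketch under that assumption; struct svm_window and its functions are hypothetical, do not correspond to amdkfd code, and a real allocator would also need to free and reuse ranges.

```c
#include <assert.h>
#include <stdint.h>

#define DEVICE_VA_BITS 40 /* assumed device limit, e.g. the 40-bit GPUs above */

/* Hypothetical window [next, end) reserved below 1 << DEVICE_VA_BITS. */
struct svm_window {
    uint64_t next;
    uint64_t end;
};

static void svm_window_init(struct svm_window *w, uint64_t base, uint64_t size)
{
    const uint64_t limit = UINT64_C(1) << DEVICE_VA_BITS;

    /* The whole window must sit below the device limit; base 0 is reserved as the failure value. */
    assert(base > 0 && size <= limit && base <= limit - size);
    w->next = base;
    w->end = base + size;
}

/* Bump-allocate len bytes aligned to align (a power of two); returns 0 when exhausted. */
static uint64_t svm_window_alloc(struct svm_window *w, uint64_t len, uint64_t align)
{
    uint64_t start;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = (w->next + align - 1) & ~(align - 1);
    if (start < w->next || start > w->end || w->end - start < len)
        return 0; /* alignment wrapped, or not enough room left */
    w->next = start + len;
    return start;
}
```

Every address this returns stays below 1 << 40 by construction, which is the property the CPU and GPU mappings would need to share.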

Thanks,
Oded