Re: [PATCH 0/3] HTLB mapping for drivers (take 2)

From: Eric B Munson
Date: Wed Aug 19 2009 - 06:35:57 EST


On Wed, Aug 19, 2009 at 11:05 AM, Mel Gorman<mel@xxxxxxxxx> wrote:
> On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
>> Hi,
>> >
>> > It sounds like this patch set is working towards the same goal as my
>> > MAP_HUGETLB set.  The only difference I see is you allocate a huge page
>> > at a time and (if I am understanding the patch) fault the page in
>> > immediately, where MAP_HUGETLB only faults pages as needed.  Does the
>> > MAP_HUGETLB patch set provide the functionality that you need, and if
>> > not, what can be done to provide what you need?
>> >
>>
>> Thanks a lot for being willing to help. I would much appreciate it if you
>> have an interesting idea of how HTLB mapping for drivers can be done.
>>
>> It is better to describe the use case in order to make it clear what
>> needs to be done.
>> The driver provides a mapping of device DMA buffers to user-level
>> applications.
>
> Ok, so the buffer is in normal memory. When mmap() is called, the buffer
> is already populated by data DMA'd from the device. That scenario rules out
> calling mmap(MAP_ANONYMOUS|MAP_HUGETLB) because userspace has access to the
> buffer before it is populated by data from the device.
>
> However, it does not rule out mmap(MAP_ANONYMOUS|MAP_HUGETLB) when userspace
> is responsible for populating a buffer for sending to a device. i.e. whether it
> is suitable or not depends on when the buffer is populated and who is doing it.
>
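
For the case where userspace populates the buffer itself, a minimal
sketch might look like the following. It assumes the MAP_HUGETLB flag
from the MAP_HUGETLB set is exposed through <sys/mman.h>, and that len
is a multiple of the huge page size. The name map_tx_buffer() is
hypothetical and not taken from either patch set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory for a buffer
 * that userspace fills before handing it to a device.
 * Tries huge pages first and falls back to normal pages if that fails
 * (for example, when the huge page pool is empty).
 * Returns MAP_FAILED if both attempts fail; *used_huge reports which
 * attempt was made last (1 = huge pages, 0 = normal pages).
 */
static void *map_tx_buffer(size_t len, int *used_huge)
{
	void *buf;

	assert(len > 0 && used_huge != NULL);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (buf != MAP_FAILED) {
		*used_huge = 1;
		return buf;
	}

	/* Huge page mapping was refused: retry with normal pages. */
	*used_huge = 0;
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

This only covers the userspace-populated direction. It does not help
when the device has to write into the buffer before mmap() is called.
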
>> User-level applications process the data.
>> The device uses master DMA to send data to the user buffer. The buffer
>> size can be >1GB and performance is very important. (So a huge page
>> mapping really makes sense.)
>>
>
> Ok, so the DMA may be faster because you have to do less scatter/gather
> and can DMA in larger chunks, and reading from userspace may be faster
> because there is less translation overhead. Right?
>
>> In addition we have to mention that:
>> 1. It is hard for the user to tell how many huge pages need to be
>>    reserved by the driver.
>
> I think you have this problem either way. If the buffer is allocated and
> populated before mmap(), then the driver is going to have to guess how many
> pages it needs. If the DMA occurs as a result of mmap(), it's easier because
> you know the number of huge pages to be reserved at that point and you have
> the option of falling back to small pages if necessary.
>
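
The reservation count at mmap() time is simple arithmetic once the
length is known. A small sketch, where huge_pages_needed() is a
hypothetical name and the huge page size is passed in because it
depends on the architecture and configuration:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of huge pages needed to back len bytes, rounding up.
 * hpage_size is the huge page size in bytes.
 */
static size_t huge_pages_needed(size_t len, size_t hpage_size)
{
	assert(hpage_size > 0);

	/* Written this way to avoid overflow in len + hpage_size - 1. */
	return len / hpage_size + (len % hpage_size != 0);
}
```

For a 1GB buffer and an assumed 2MB huge page size this gives 512
pages. One extra byte of length pushes it to 513.
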
>> 2. Devices add constraints on memory regions. For example, a region may
>>    need to be contiguous within the physical address space. It is
>>    necessary to be able to specify special gfp flags.
>
> The contiguity constraints are the same for huge pages. Do you mean there
> are zone restrictions? If so, the hugetlbfs_file_setup() function could be
> extended to specify a GFP mask that is used for the allocation of hugepages
> and associated with the hugetlbfs inode. Right now, there is an
> htlb_alloc_mask that is combined with some additional flags, so
> htlb_alloc_mask would be the default mask unless otherwise specified.
>
>> 3. The HW needs to access physical memory before the user-level
>> software can access it. (Hugetlbfs picks up pages from the pool on page
>> fault.)
>> This means memory allocation needs to be driven by the device driver.
>>
>
> How about:
>
>        o Extend Eric's helper slightly to take a GFP mask that is
>          associated with the inode and used for allocations from
>          outside the hugepage pool
>        o A helper that returns the page at a given offset within
>          a hugetlbfs file for population before the page has been
>          faulted.
>
> I know this is a bit hand-wavy, but it would allow significant sharing
> of the existing code and remove much of the hugetlbfs-awareness from
> your current driver.
>
>> The original idea was: create a hugetlbfs file which has a common mapping
>> with the device file. Allocate memory. Populate the page cache of the
>> hugetlbfs file with the allocated pages.
>> When a fault occurs, the page will be taken from the page cache and then
>> remapped to user space by hugetlbfs.
>>
>> Another possible approach is described here:
>> http://marc.info/?l=linux-mm&m=125065257431410&w=2
>> But I am currently not sure whether it will work.
>>
>>
>> Thanks,
>> Alexey
>>
>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
>

Alexey,

I'd be willing to take a stab at a prototype of Mel's suggestion based
on my patch set if you think it would be useful to you.

Eric