Re: [PATCH] hugetlbfs: add O_TMPFILE support

From: Mike Kravetz
Date: Tue Oct 22 2019 - 22:58:19 EST


On 10/22/19 12:09 AM, Piotr Sarna wrote:
> On 10/21/19 7:17 PM, Mike Kravetz wrote:
>> On 10/15/19 4:37 PM, Mike Kravetz wrote:
>>> On 10/15/19 3:50 AM, Michal Hocko wrote:
>>>> On Tue 15-10-19 11:01:12, Piotr Sarna wrote:
>>>>> With hugetlbfs, a common pattern for mapping anonymous huge pages
>>>>> is to create a temporary file first.
>>>>
>>>> Really? I thought that this is normally done by shmget(SHM_HUGETLB) or
>>>> mmap(MAP_HUGETLB). Or maybe I misunderstood your definition of anonymous
>>>> huge pages.
>>>>
>>>>> Currently libraries like
>>>>> libhugetlbfs and seastar create these with a standard mkstemp+unlink
>>>>> trick,
>>>
>>> I would guess that much of libhugetlbfs was written before MAP_HUGETLB
>>> was implemented. So, that is why it does not make (more) use of that
>>> option.
>>>
>>> The implementation looks to be straightforward. However, I really do
>>> not want to add more functionality to hugetlbfs unless there is a specific
>>> use case that needs it.
>>
>> It was not my intention to shut down discussion on this patch. I was just
>> asking if there was a (new) use case for such a change. I am checking with
>> our DB team as I seem to remember them using the create/unlink approach for
>> hugetlbfs in one of their upcoming models.
>>
>> Is there a new use case you were thinking about?
>>
>
> Oh, I indeed thought it was a shutdown. The use case I was thinking about
> was in Seastar, where the create+unlink trick is used for creating
> temporary files (in a generic way, not only for hugetlbfs). I simply
> intended to migrate it to a newer approach - O_TMPFILE. However, for the
> specific case of hugetlbfs it indeed makes more sense to skip it and use
> mmap's MAP_HUGETLB, so perhaps it's not worth patching a perfectly good
> and stable file system just to provide support for a semi-useful flag.
> My implementation of tmpfile for hugetlbfs is straightforward indeed, but
> the MAP_HUGETLB argument made me realize that it may not be worth the
> trouble - especially since MAP_HUGETLB has been around since 2.6 and
> O_TMPFILE was introduced around v3.11, so the mmap way looks more portable.
>
> tldr: I'd be very happy to get my patch accepted, but the use case I had in
> mind can be easily solved with MAP_HUGETLB, so I don't insist.

If you really are after something like 'anonymous memory' for Seastar,
then MAP_HUGETLB would be the better approach.
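
For reference, a rough userspace sketch of the MAP_HUGETLB route. The helper
name map_anon_prefer_huge() and the 2 MiB size are assumptions made for
illustration (the real huge page size is system dependent, see Hugepagesize
in /proc/meminfo), and the fallback to regular pages is just one possible
policy, not something Seastar or libhugetlbfs is known to do:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages. Check Hugepagesize in /proc/meminfo. */
#define ASSUMED_HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, preferring
 * huge pages. `len` should be a multiple of the huge page size.
 * Sets *used_huge to 1 if the MAP_HUGETLB mapping succeeded, 0 otherwise.
 * Returns NULL if no mapping could be created at all.
 */
static void *map_anon_prefer_huge(size_t len, int *used_huge)
{
    void *p;

    assert(used_huge != NULL);
    *used_huge = 0;

#ifdef MAP_HUGETLB
    /* No file and no hugetlbfs mount needed: fd is -1, offset is 0. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }
#endif

    /* Fall back to regular pages, e.g. when no huge pages are available. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would request a multiple of the huge page size, check used_huge if
it matters, and release the region with munmap() using the same length.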

I'm still checking with the Oracle DB team as they may have a use for O_TMPFILE
in an upcoming release. In their use case, they want an open fd to work with.
If it looks like they will proceed in this direction, we can work to get
your patch moved forward.
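
To make the fd-based variant concrete, here is a minimal sketch of how
userspace might obtain an unnamed file in a given directory. The helper name
open_unnamed_file() is hypothetical. On a hugetlbfs mount the O_TMPFILE
attempt would presumably only succeed with a patch like the one under
discussion applied; otherwise the sketch falls back to the mkstemp+unlink
trick mentioned earlier in the thread. It has not been tested against
hugetlbfs:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes O_TMPFILE in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return an open read/write fd for an unnamed file
 * in directory `dir`, or -1 on failure.
 */
static int open_unnamed_file(const char *dir)
{
    char path[4096];
    int fd, n;

    assert(dir != NULL);

#ifdef O_TMPFILE
    /* Preferred: the file never gets a name in the directory. */
    fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
#endif

    /* Fallback: create a named file, then remove the name right away. */
    n = snprintf(path, sizeof(path), "%s/tmpXXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);   /* the open fd keeps the file alive */
    return fd;
}
```

The returned fd can then be handed to mmap() with MAP_SHARED, which is the
part MAP_HUGETLB alone does not give you.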

Thanks,
--
Mike Kravetz