Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

From: David Herrmann
Date: Sun Nov 05 2017 - 07:07:41 EST


Hi

On Sat, Nov 4, 2017 at 12:31 AM, Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> On 11/03/2017 10:56 AM, Mike Kravetz wrote:
>> On 11/03/2017 10:41 AM, David Herrmann wrote:
>>> Hi
>>>
>>> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>>>> On 11/03/2017 10:03 AM, David Herrmann wrote:
>>>>> Hi
>>>>>
>>>>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-Andrà Lureau
>>>>> <marcandre.lureau@xxxxxxxxxx> wrote:
>>>>>> Implements memfd sealing, similar to shmem:
>>>>>> - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
>>>>>> memfd_add_seals(). write() doesn't exist for hugetlbfs.
>>>>>> - SHRINK: added similar check as shmem_setattr()
>>>>>> - GROW: added similar check as shmem_setattr() & shmem_fallocate()
>>>>>>
>>>>>> Except write() operation that doesn't exist with hugetlbfs, that
>>>>>> should make sealing as close as it can be to shmem support.
>>>>>
>>>>> SEAL, SHRINK, and GROW look fine to me.
>>>>>
>>>>> Regarding WRITE
>>>>
>>>> The commit message may not be clear. However, hugetlbfs does not support
>>>> the write system call (or aio). The only way to modify contents of a
>>>> hugetlbfs file is via mmap or hole punch/truncate. So, we do not really
>>>> need to worry about those special (a)io cases for hugetlbfs.
>>>
>>> This is not about the write(2) syscall. Please consider this scenario
>>> about shmem:
>>>
>>> You create a memfd via memfd_create() and map it writable. You now
>>> call another kernel syscall that takes as input _any mapped page
>>> range_. You pass your mapped memfd-addresses to it. Those syscalls
>>> tend to use get_user_pages() to pin arbitrary user-mapped pages, as
>>> such this also affects shmem. In this case, those pages might stay
>>> mapped even if you munmap() your memfd!
>>>
>>> One example of this is using AIO-read() on any other file that
>>> supports it, passing your mapped memfd as buffer to _read into_. The
>>> operations supported on the memfd are irrelevant here.
>>> The selftests contain a FUSE-based test for this, since FUSE allows
>>> user-space to GUP pages for an arbitrary amount of time.
>>>
>>> The original fix for this is:
>>>
>>> commit 05f65b5c70909ef686f865f0a85406d74d75f70f
>>> Author: David Herrmann <dh.herrmann@xxxxxxxxx>
>>> Date: Fri Aug 8 14:25:36 2014 -0700
>>>
>>> shm: wait for pins to be released when sealing
>>>
>>> Please have a look at this. Your patches use shmem_add_seals() almost
>>> unchanged, and as such you call into shmem_wait_for_pins() on
>>> hugetlbfs. I would really like to see an explicit ACK that this works
>>> on hugetlbfs.
>>
>> Thanks for the explanation. I missed that in your first reply. I'll
>> look into this for hugetlbfs.
>
> I reviewed the routines in the above commit and did not see anything that
> would prevent them from working properly with hugetlbfs. I modified the
> fuse test to use hugetlbfs based mapping. I also instrumented the above
> routines and verified that tags were set/checked/cleared as designed for
> hugetlb pages. So, that is an ACK on working with hugetlbfs.
>
> This does bring up the point that the fuse seals test should also be
> modified to work with hugetlbfs as part of this series.

Perfect! Looks all good to me then!

Thanks
David