Re: [PATCH v3 kvm/queue 01/16] mm/shmem: Introduce F_SEAL_INACCESSIBLE

From: David Hildenbrand
Date: Thu Jan 13 2022 - 10:57:06 EST


On 06.01.22 14:06, Chao Peng wrote:
> On Tue, Jan 04, 2022 at 03:22:07PM +0100, David Hildenbrand wrote:
>> On 23.12.21 13:29, Chao Peng wrote:
>>> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
>>>
>>> Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
>>> the file is inaccessible from userspace in any possible ways like
>>> read(),write() or mmap() etc.
>>>
>>> It provides semantics required for KVM guest private memory support
>>> that a file descriptor with this seal set is going to be used as the
>>> source of guest memory in confidential computing environments such
>>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
>>>
>>> At this time only shmem implements this seal.
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>>> Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
>>> ---
>>> include/uapi/linux/fcntl.h | 1 +
>>> mm/shmem.c | 37 +++++++++++++++++++++++++++++++++++--
>>> 2 files changed, 36 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
>>> index 2f86b2ad6d7e..e2bad051936f 100644
>>> --- a/include/uapi/linux/fcntl.h
>>> +++ b/include/uapi/linux/fcntl.h
>>> @@ -43,6 +43,7 @@
>>> #define F_SEAL_GROW 0x0004 /* prevent file from growing */
>>> #define F_SEAL_WRITE 0x0008 /* prevent writes */
>>> #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
>>> +#define F_SEAL_INACCESSIBLE 0x0020 /* prevent file from accessing */
>>
>> I think this needs more clarification: the file content can still be
>> accessed using in-kernel mechanisms such as MEMFD_OPS for KVM. It
>> effectively disallows traditional access to a file (read/write/mmap)
>> that will result in ordinary MMU access to file content.
>>
>> Not sure how to best clarify that: maybe, prevent ordinary MMU access
>> (e.g., read/write/mmap) to file content?
>
> Or: prevent userspace access (e.g., read/write/mmap) to file content?

The issue with that phrasing is that userspace will be able to access
that content, just via a different mechanism eventually ... e.g., via
the KVM MMU indirectly. If that makes it clearer what I mean :)

>>
>>> /* (1U << 31) is reserved for signed error codes */
>>>
>>> /*
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 18f93c2d68f1..faa7e9b1b9bc 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1098,6 +1098,10 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
>>> (newsize > oldsize && (info->seals & F_SEAL_GROW)))
>>> return -EPERM;
>>>
>>> + if ((info->seals & F_SEAL_INACCESSIBLE) &&
>>> + (newsize & ~PAGE_MASK))
>>> + return -EINVAL;
>>> +
>>
>> What happens when sealing and there are existing mmaps?
>
> I think this is similar to ftruncate, in either case we just allow that.
> The existing mmaps will be unmapped and KVM will be notified to
> invalidate the mapping in the secondary MMU as well. This assume we
> trust the userspace even though it can not access the file content.

Can't we simply check+forbid instead?

--
Thanks,

David / dhildenb