Re: [RFC PATCH v1 00/10] guest_memfd: Track amount of memory allocated on inode

From: Ackerley Tng

Date: Thu Feb 26 2026 - 02:19:36 EST

"David Hildenbrand (Arm)" <david@xxxxxxxxxx> writes:

> On 2/25/26 08:31, Ackerley Tng wrote:
>> Ackerley Tng <ackerleytng@xxxxxxxxxx> writes:
>>
>>> "David Hildenbrand (Arm)" <david@xxxxxxxxxx> writes:
>>>
>>>>
>>>> [...snip...]
>>>>
>>>>
>>>> If that avoids having to implement truncation completely ourselves, that might be one
>>>> option we could discuss, yes.
>>>>
>>>> Something like:
>>>>
>>>> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
>>>> index 7c753148af88..94f8bb81f017 100644
>>>> --- a/Documentation/filesystems/vfs.rst
>>>> +++ b/Documentation/filesystems/vfs.rst
>>>> @@ -764,6 +764,7 @@ cache in your filesystem. The following members are defined:
>>>> sector_t (*bmap)(struct address_space *, sector_t);
>>>> void (*invalidate_folio) (struct folio *, size_t start, size_t len);
>>>> bool (*release_folio)(struct folio *, gfp_t);
>>>> + void (*remove_folio)(struct folio *folio);
>>>> void (*free_folio)(struct folio *);
>>>> ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
>>>> int (*migrate_folio)(struct mapping *, struct folio *dst,
>>>> @@ -922,6 +923,11 @@ cache in your filesystem. The following members are defined:
>>>> its release_folio will need to ensure this. Possibly it can
>>>> clear the uptodate flag if it cannot free private data yet.
>>>>
>>>> +``remove_folio``
>>>> + remove_folio is called just before the folio is removed from the
>>>> + page cache in order to allow the cleanup of properties (e.g.,
>>>> accounting) that need the address_space mapping.
>>>> +
>>>> ``free_folio``
>>>> free_folio is called once the folio is no longer visible in the
>>>> page cache in order to allow the cleanup of any private data.
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index 8b3dd145b25e..f7f6930977a1 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -422,6 +422,7 @@ struct address_space_operations {
>>>> sector_t (*bmap)(struct address_space *, sector_t);
>>>> void (*invalidate_folio) (struct folio *, size_t offset, size_t len);
>>>> bool (*release_folio)(struct folio *, gfp_t);
>>>> + void (*remove_folio)(struct folio *folio);
>>>> void (*free_folio)(struct folio *folio);
>>>> ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
>>>> /*
>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>> index 6cd7974d4ada..5a810eaacab2 100644
>>>> --- a/mm/filemap.c
>>>> +++ b/mm/filemap.c
>>>> @@ -250,8 +250,14 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio)
>>>> void filemap_remove_folio(struct folio *folio)
>>>> {
>>>> struct address_space *mapping = folio->mapping;
>>>> + void (*remove_folio)(struct folio *);
>>>>
>>>> BUG_ON(!folio_test_locked(folio));
>>>> +
>>>> + remove_folio = mapping->a_ops->remove_folio;
>>>> + if (unlikely(remove_folio))
>>>> + remove_folio(folio);
>>>> +
>>>> spin_lock(&mapping->host->i_lock);
>>>> xa_lock_irq(&mapping->i_pages);
>>>> __filemap_remove_folio(folio, NULL);
>>>>
>>>
>>> Thanks for this suggestion, I'll try this out and send another revision.
>>>
>>>>
>>>> Ideally we'd perform it under the lock just after clearing folio->mapping, but I guess that
>>>> might be more controversial.
>>>>
>>
>> I'm not sure which lock you were referring to; hopefully not the
>> inode's i_lock? Why is calling the callback under a lock frowned upon?
>
> I meant the two locks: mapping->host->i_lock and mapping->i_pages.
>
> I'd assume new callbacks that might result in holding these precious
> locks longer might be a problem for some people. Well, maybe, maybe not.
>

The extra time (for guest_memfd; other filesystems only pay for a NULL
check on the hook) is spent on the truncation path, which hopefully
isn't a hot path!
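
To show where that extra time goes, here is a rough userspace model
(not kernel code) of the variant where the hook runs with the lock
held. A pthread mutex stands in for i_lock; gmem_remove_folio() and the
per-inode `allocated` counter are hypothetical names for illustration.
The hook here runs just before the folio leaves the page cache, keeping
the single-argument signature from your diff, rather than after
folio->mapping is cleared:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Userspace model only: simplified stand-ins for the kernel structures,
 * not their real definitions. A pthread mutex stands in for i_lock.
 */
struct folio;

struct address_space_operations {
	/* Optional hook from the proposal; NULL for most filesystems. */
	void (*remove_folio)(struct folio *folio);
};

struct inode {
	pthread_mutex_t i_lock;  /* models inode->i_lock */
	size_t allocated;        /* hypothetical per-inode byte counter */
};

struct address_space {
	struct inode *host;
	const struct address_space_operations *a_ops;
	unsigned long nrpages;
};

struct folio {
	struct address_space *mapping;
	size_t size;
};

/*
 * Hypothetical guest_memfd hook: it still sees folio->mapping, and since
 * it runs under i_lock in this model, the counter needs no extra lock.
 */
static void gmem_remove_folio(struct folio *folio)
{
	assert(folio->mapping != NULL);
	folio->mapping->host->allocated -= folio->size;
}

static const struct address_space_operations gmem_aops = {
	.remove_folio = gmem_remove_folio,
};

/* A filesystem that does not implement the hook. */
static const struct address_space_operations plain_aops = {
	.remove_folio = NULL,
};

/*
 * The "under the lock" variant: the hook runs while i_lock is held,
 * just before the folio leaves the page cache.
 */
static void filemap_remove_folio(struct folio *folio)
{
	struct address_space *mapping = folio->mapping;
	void (*remove_folio)(struct folio *) = mapping->a_ops->remove_folio;

	pthread_mutex_lock(&mapping->host->i_lock);
	if (remove_folio)          /* other filesystems pay only this check */
		remove_folio(folio);
	folio->mapping = NULL;     /* models __filemap_remove_folio() */
	mapping->nrpages--;
	pthread_mutex_unlock(&mapping->host->i_lock);
}
```

In this model the guest_memfd work under the lock is a single
subtraction, and a mapping without the hook only sees the NULL check.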

> I guess .free_folio() is called outside the lock because it's assumed to
> possibly do more expensive operations.
>

I thought .free_folio() was called outside of the lock because once the
folio has been removed from the filemap, there should be no more
inode/filemap-related contention, so any remaining cleanup can safely be
done outside the inode/filemap locks.
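
Roughly, the ordering I have in mind, again as a userspace sketch with
simplified stand-in structures (one mutex models both i_lock and the
i_pages lock; demo_free_folio() is a hypothetical callback that just
records what it can see): by the time .free_folio() runs, the lock has
been dropped and folio->mapping is already NULL, since the callback
only receives the folio.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Userspace model only: simplified stand-ins for the kernel structures.
 * One pthread mutex stands in for both i_lock and the i_pages xa_lock.
 */
struct folio;

struct address_space_operations {
	void (*free_folio)(struct folio *folio);
};

struct address_space {
	pthread_mutex_t lock;
	const struct address_space_operations *a_ops;
	unsigned long nrpages;
};

struct folio {
	struct address_space *mapping;
};

static bool in_critical_section;  /* true only while the model's lock is held */
static bool free_saw_mapping;     /* did .free_folio() see folio->mapping? */
static bool free_ran_under_lock;  /* was .free_folio() called with the lock held? */

/* Hypothetical callback: records what .free_folio() can observe. */
static void demo_free_folio(struct folio *folio)
{
	free_saw_mapping = folio->mapping != NULL;
	free_ran_under_lock = in_critical_section;
}

static const struct address_space_operations demo_aops = {
	.free_folio = demo_free_folio,
};

static void filemap_remove_folio(struct folio *folio)
{
	struct address_space *mapping = folio->mapping;

	assert(mapping != NULL);
	pthread_mutex_lock(&mapping->lock);
	in_critical_section = true;
	folio->mapping = NULL;         /* the folio leaves the page cache */
	mapping->nrpages--;
	in_critical_section = false;
	pthread_mutex_unlock(&mapping->lock);

	/*
	 * Like filemap_free_folio(mapping, folio): the caller still has the
	 * mapping, but the callback only receives the folio.
	 */
	if (mapping->a_ops->free_folio)
		mapping->a_ops->free_folio(folio);
}
```

That last point is also why accounting that needs the address_space
mapping can't simply be moved into .free_folio().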

> --
> Cheers,
>
> David