Re: [RFC][PATCH v2 1/5] mm: Avoid unmapping pinned pages
From: David Hildenbrand
Date: Fri Jan 21 2022 - 03:22:14 EST

On 21.01.22 08:51, Peter Zijlstra wrote:
> On Thu, Jan 20, 2022 at 07:25:08PM +0100, David Hildenbrand wrote:
>> On 20.01.22 16:55, Peter Zijlstra wrote:
>>> Add a guarantee for Anon pages that pin_user_page*() ensures the
>>> user-mapping of these pages stay preserved. In order to ensure this
>>> all rmap users have been audited:
>>>
>>> vmscan: already fails eviction due to page_maybe_dma_pinned()
>>>
>>> migrate: migration will fail on pinned pages due to
>>> expected_page_refs() not matching, however that is
>>> *after* try_to_migrate() has already destroyed the
>>> user mapping of these pages. Add an early exit for
>>> this case.
>>>
>>> numa-balance: as per the above, pinned pages cannot be migrated,
>>> however numa balancing scanning will happily PROT_NONE
>>> them to get usage information on these pages. Avoid
>>> this for pinned pages.
>>
>> page_maybe_dma_pinned() can race with GUP-fast without
>> mm->write_protect_seq. This is a real problem for vmscan() with
>> concurrent GUP-fast as it can result in R/O mappings of pinned pages and
>> GUP will lose synchronicity to the page table on write faults due to
>> wrong COW.
>
> Urgh, so yeah, that might be a problem. Follow up code uses it like
> this:
>
> +/*
> + * Pinning a page inhibits rmap based unmap for Anon pages. Doing a load
> + * through the user mapping ensures the user mapping exists.
> + */
> +#define umcg_pin_and_load(_self, _pagep, _member) \
> +({ \
> + __label__ __out; \
> + int __ret = -EFAULT; \
> + \
> + if (pin_user_pages_fast((unsigned long)(_self), 1, 0, &(_pagep)) != 1) \
> + goto __out; \
> + \
> + if (!PageAnon(_pagep) || \
> + get_user(_member, &(_self)->_member)) { \
> + unpin_user_page(_pagep); \
> + goto __out; \
> + } \
> + __ret = 0; \
> +__out: __ret; \
> +})
>
> And after that hard assumes (on the penalty of SIGKILL) that direct user
> access works. Specifically it does RmW ops on it. So I suppose I'd
> better upgrade that load to a RmW at the very least.
>
> But is that sufficient? Let me go find that race you mention...
>
It's described in [1] under point 3.

After we put the page into the swapcache, it's still mapped into the
page tables, where GUP can still find it. Only after that do we try to
unmap the page (placing swap entries). So it's racy.
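
To make the ordering concrete, here is a toy user-space model of that
window. It is only a sketch, not kernel code: struct toy_page and every
toy_*() helper are made up for illustration, and it assumes the pinned
check runs before the swapcache step while the unmap happens afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
	int  pincount;      /* models FOLL_PIN references */
	bool mapped;        /* models a present PTE in the page table */
	bool in_swapcache;  /* models swapcache membership */
};

/* Hypothetical analogue of page_maybe_dma_pinned(). */
static bool toy_maybe_pinned(const struct toy_page *p)
{
	return p->pincount > 0;
}

/* Hypothetical analogue of GUP-fast: works only while the PTE is present. */
static bool toy_gup_fast_pin(struct toy_page *p)
{
	if (!p->mapped)
		return false;
	p->pincount++;
	return true;
}

/* Reclaim step 1: pinned check, then swapcache; the page stays mapped. */
static bool toy_reclaim_prepare(struct toy_page *p)
{
	if (toy_maybe_pinned(p))
		return false;
	p->in_swapcache = true;
	return true;
}

/* Reclaim step 2: replace the PTE with a swap entry. */
static void toy_reclaim_unmap(struct toy_page *p)
{
	p->mapped = false;
}

/*
 * Replays one fixed interleaving and reports whether the page ends up
 * pinned but no longer mapped.  With pin_in_window set, the pin lands
 * between the two reclaim steps; otherwise it lands before reclaim looks.
 */
bool toy_race(bool pin_in_window)
{
	struct toy_page p = { .pincount = 0, .mapped = true, .in_swapcache = false };

	if (!pin_in_window)
		toy_gup_fast_pin(&p);

	if (toy_reclaim_prepare(&p)) {
		if (pin_in_window)
			toy_gup_fast_pin(&p);
		toy_reclaim_unmap(&p);
	}
	return toy_maybe_pinned(&p) && !p.mapped;
}
```

In this model toy_race(true) reports a pinned-but-unmapped page, while
toy_race(false) does not, because the early check sees the pin and
reclaim backs off.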

Note also point 2 in [1], which concerns O_DIRECT: it currently still
uses FOLL_GET rather than FOLL_PIN.

[1] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@xxxxxxxxxx

--
Thanks,
David / dhildenb