Re: [PATCH RFC 00/30] userfaultfd-wp: Support shmem and hugetlbfs

From: Mike Kravetz
Date: Tue Feb 09 2021 - 20:21:58 EST


On 2/5/21 6:36 PM, Peter Xu wrote:
> On Fri, Feb 05, 2021 at 01:53:34PM -0800, Mike Kravetz wrote:
>> On 1/29/21 2:49 PM, Peter Xu wrote:
>>> On Fri, Jan 15, 2021 at 12:08:37PM -0500, Peter Xu wrote:
>>>> This is a RFC series to support userfaultfd upon shmem and hugetlbfs.
>> ...
>>> Huge & Mike,
>>>
>>> Would any of you have comment/concerns on the high-level design of this series?
>>>
>>> It would be great to know it, especially major objection, before move on to an
>>> non-rfc version.
>>
>> My apologies for not looking at this sooner. Even now, I have only taken
>> a very brief look at the hugetlbfs patches.
>>
>> Coincidentally, I am working on the 'BUG' that soft dirty does not work for
>> hugetlbfs. As you can imagine, there is some overlap in handling of wp ptes
>> set for soft dirty. In addition, pmd sharing must be disabled for soft dirty
>> as here and in Axel's uffd minor fault code.
>
> Interesting to know that we'll reach and need something common from different
> directions, especially when they all mostly happen at the same time. :)
>
> Is there a real "BUG" that you mentioned? I'd be glad to read about it if
> there is a link or something.
>

Sorry, I was referring to a bugzilla bug not a BUG(). Bottom line is that
hugetlb was mostly overlooked when soft dirty support was added. A thread
mostly from me is at:
lore.kernel.org/r/999775bf-4204-2bec-7c3d-72d81b4fce30@xxxxxxxxxx
I am close to sending out a RFC, but keep getting distracted.

>> No objections to the overall approach based on my quick look.
>
> Thanks for having a look.
>
> So for hugetlb one major thing is indeed about the pmd sharing part, which
> seems that we've got very good consensus on.

Yes

> The other thing that I'd love to get some comment would be a shared topic with
> shmem in that: for a file-backed memory type, uffd-wp needs a consolidated way
> to record wr-protect information even if the pgtable entries were flushed.
> That comes from a fundamental difference between anonymous and file-backed
> memory in that anonymous pages keep all info in the pgtable entry, but
> file-backed memory is not, e.g., pgtable entries can be dropped at any time as
> long as page cache is there.

Sorry, but I can not recall this difference for hugetlb pages. What operations
lead to flushing of pagetable entries? It would need to be something other
than unmap as it seems we want to lose the information in unmap IIUC.

> I goes to look at soft-dirty then regarding this issue, and there's actually a
> paragraph about it:
>
> While in most cases tracking memory changes by #PF-s is more than enough
> there is still a scenario when we can lose soft dirty bits -- a task
> unmaps a previously mapped memory region and then maps a new one at
> exactly the same place. When unmap is called, the kernel internally
> clears PTE values including soft dirty bits. To notify user space
> application about such memory region renewal the kernel always marks
> new memory regions (and expanded regions) as soft dirty.
>
> I feel like it just means soft-dirty currently allows false positives: we could
> have set the soft dirty bit even if the page is clean. And that's what this
> series wanted to avoid: it used the new concept called "swap special pte" to
> persistent that information even for file-backed memory. That all goes for
> avoiding those false positives.

Yes, I have seen this with soft dirty. It really does not seem right. When
you first create a mapping, even before faulting in anything the vma is marked
VM_SOFTDIRTY and from the user's perspective all addresses/pages appear dirty.

To be honest, I am not sure you want to try and carry per-process/per-mapping
wp information in the file. In the comment about soft dirty above, it seems
reasonable that unmapping would clear all soft dirty information. Also,
unmapping would clear any uffd state/information.
--
Mike Kravetz

>>
>> I'll try to take a closer look at the areas where efforts overlap.
>
> I dumped above just to hope maybe it could help a little bit more for the
> reviews, but if it's not, I totally agree we can focus on the overlapped part.
> And, I'd be more than glad to read your work too if I can understand more on
> what you're working on with the hugetlb soft dirty bug, since I do feel uffd-wp
> is servicing similar goals just like what soft-dirty does, so we could share a
> lot of common knowledge there. :)
>
> Thanks again!
>