Re: [PATCH v10 3/6] fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs

From: Muhammad Usama Anjum
Date: Fri Feb 17 2023 - 05:40:24 EST


On 2/16/23 2:12 AM, Peter Xu wrote:
> On Wed, Feb 15, 2023 at 03:03:09PM +0500, Muhammad Usama Anjum wrote:
>> On 2/15/23 1:59 AM, Peter Xu wrote:
>> [..]
>>>>>> static inline bool is_pte_written(pte_t pte)
>>>>>> {
>>>>>> if ((pte_present(pte) && pte_uffd_wp(pte)) ||
>>>>>> (pte_swp_uffd_wp_any(pte)))
>>>>>> return false;
>>>>>> return (pte_present(pte) || is_swap_pte(pte));
>>>>>> }
>>>>>
>>>>> Could you explain why you don't want to return dirty for !present? A page
>>>>> can be written then swapped out. Don't you want to know that happened
>>>>> (from dirty tracking POV)?
>>>>>
>>>>> The code looks weird to me too.. We only have three types of ptes: (1)
>>>>> present, (2) swap, (3) none.
>>>>>
>>>>> Then, "(pte_present() || is_swap_pte())" is the same as !pte_none(). Is
>>>>> that what you're really looking for?
>>>> Yes, this is what I've been trying to do. I'll use !pte_none() to make it
>>>> simpler.
>>>
>>> Ah I think I see what you wanted to do now.. But I'm afraid it won't work
>>> for all cases.
>>>
>>> So IIUC the problem is anon pte can be empty, but since uffd-wp bit doesn't
>>> persist on anon (but none) ptes, then we got it lost and we cannot identify
>>> it from pages being written. Your solution will solve problem for
>>> anonymous, but I think it'll break file memories.
>>>
>>> Example:
>>>
>>> Consider one shmem page that got mapped, write protected (using UFFDIO_WP
>>> ioctl), written again (removing uffd-wp bit automatically), then zapped.
>>> The pte will be pte_none() but it's actually written, afaiu.
>>>
>>> Maybe it's time we should introduce UFFD_FEATURE_WP_ZEROPAGE, so we'll need
>>> to install pte markers for anonymous too (then it will work similarly like
>>> shmem/hugetlbfs, that we'll report writting to zero pages), then you'll
>>> need to have the new UFFD_FEATURE_WP_ASYNC depend on it. With that I think
>>> you can keep using the old check and it should start to work.
>>>
>>> Please let me know if my understanding is correct above.
>> Thank you for identifying it. Your understanding seems on point. I'll have
>> research things up about PTE Markers. I'm looking at your patches about it
>> [1]. Can you refer me to "mm alignment sessions" discussion in form of
>> presentation or if any transcript is available?
>
> No worry now, after a second thought I think zero page is better than pte
> markers, and I've got a patch that works for it here by injecting zero
> pages for anonymous:
>
> https://lore.kernel.org/all/20230215210257.224243-1-peterx@xxxxxxxxxx/
>
> I think we'd also better to enforce your new WP_ASYNC feature bit to depend
> on this one, so fail the UFFDIO_API if WP_ASYNC && !WP_ZEROPAGE.
>
> Could you please try by rebasing your work upon this one? Hope it'll work
> for you already. Note again that you'll need to go back to the old
> is_pte|pmd_written() to make things work always, I think.
Thank you so much for sending the ZEROPAGE patch. I've rebased my patches
on top of it and my all tests for anon memory are passing. Now we don't
need to touch the page before engaging wp. This is what we wanted to
achieve. So !wp flag can be easily translated to soft-dirty flag
(is_pte_soft_dirty = is_pte_wp).

I've only a few file mem and shmem tests. I'll write more tests.

>
> [...]
>
>> I truly understand how you feel about export_prev_to_out(). It is really
>> difficult to understand. Even I had to made a hard try to come up with the
>> current code to avoid consuming a lot of kernel's memory while giving user
>> the compact output. I can surely map both of these with a dirty looking
>> macro. But I'm unable to find a decent macro to replace these. I think I'll
>> put a comment some where to explain whats going-on.
>
> So maybe I still missed something? I'll read the new version when it comes.
Lets reconvene in next patches if you feel like they can be improved.

>
> Thanks,
>

--
BR,
Muhammad Usama Anjum