Re: [PATCH RESEND v15 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs

From: Muhammad Usama Anjum
Date: Wed May 24 2023 - 10:16:35 EST


On 5/24/23 6:55 PM, Peter Xu wrote:
...
>>> What are the steps of the test? Is it as simple as "writeprotect",
>>> "unprotect", then write all pages in a single thread?
>>>
>>> Is UFFDIO_WRITEPROTECT sent in one range covering all pages?
>>>
>>> Maybe you can attach the test program here too.
>>
>> I hadn't attached the test earlier as I thought that you wouldn't be
>> interested in running it. I've attached it now. The test has multiple
>
> Thanks. No plan to run it, just to make sure I understand why such a
> difference.
>
>> threads where one thread tries to get status of flags and reset them, while
>> other threads write to that memory. In main(), we call the pagemap_scan
>> ioctl to get status of flags and reset the memory area as well. While in N
>> threads, the memory is written.
>>
>> I usually run the test as follows, where the memory area is 100000 pages:
>> ./win2_linux 8 100000 1 1 0
>>
>> I'm running tests on real hardware. The results are pretty consistent. I'm
>> also testing only on x86_64. PM_SCAN_OP_WP wins every time as compared to
>> UFFDIO_WRITEPROTECT.
>
> If it's a multi-threaded test, especially one where the ioctl runs together
> with the writers, then I'd assume it's caused by the writers frequently
> needing to flush the tlb (when they write during UFFDIO_WRITEPROTECT); the
> flush target could potentially also include the core running the main
> thread, which is also trying to reprotect, because they run on the same mm.
>
> This makes me think that your current test case probably is the worst case
> of Nadav's patch 6ce64428d6 because (1) the UFFDIO_WRITEPROTECT covers a
> super large range, and (2) there're a _lot_ of concurrent writers during
> the ioctl, so all of them will need to trigger a tlb flush, and that tlb
> flush will further slow down the ioctl sender.
>
> That said, I think having minimal tlb flushing in
> ioctl(UFFDIO_WRITEPROTECT) is sometimes the optimal case, so it may make
> sense elsewhere, where there are not that many concurrent writers. I'll
> need to rethink all of this a bit to find out whether we can have a good
> way for both..
>
> For now, if your workload is mostly exactly like your test case, maybe you
> can have your pagemap version of WP-only op there, making sure tlb flush is
> within the pgtable lock critical section (so you should be safe even
> without Nadav's patch). If so, I'd appreciate it if you could add a comment
> somewhere about the difference between using pagemap WP-only and
> ioctl(UFFDIO_WRITEPROTECT), though. In short, functionality-wise they should
> be the same, with minor performance differences whose details are TBD (maybe
> one day we can have a good approach for all and make them aligned again, but
> maybe that also doesn't need to block your work).

Thank you for understanding what I've been trying to convey. We are going
to translate a Windows syscall to this new ioctl, so it is very difficult to
pin down the exact use cases, as applications must be using this syscall in
several different ways. One thing is for sure: we want the best performance
possible, which we get by adding WP-only. I'll add it and send v16. I think
that we are almost there.
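
For readers following the comparison, here is a minimal sketch of what
"UFFDIO_WRITEPROTECT sent in one range covering all pages" can look like from
userspace. It is not the attached test program; make_wp_request() and
wp_range() are hypothetical helper names, and the sketch assumes a userfaultfd
that has already been set up with UFFD_FEATURE_PAGEFAULT_FLAG_WP and
registered over the area with UFFDIO_REGISTER_MODE_WP.

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: describe one request covering [addr, addr + len).
 * protect != 0 write-protects the range; protect == 0 removes the
 * protection again. addr and len are expected to be page aligned.
 */
static struct uffdio_writeprotect make_wp_request(void *addr, size_t len,
                                                  int protect)
{
    struct uffdio_writeprotect wp;

    assert(len != 0);
    memset(&wp, 0, sizeof(wp));
    wp.range.start = (uintptr_t)addr;
    wp.range.len = len;
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
    return wp;
}

/*
 * Hypothetical helper: issue the request with a single ioctl on a
 * userfaultfd that was registered over the range in WP mode (assumed to be
 * done by the caller). Returns 0 on success, -1 with errno set otherwise.
 */
static int wp_range(int uffd, void *addr, size_t len, int protect)
{
    struct uffdio_writeprotect wp = make_wp_request(addr, len, protect);

    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

In a test shaped like the one described above, the main thread would call
wp_range(uffd, area, npages * page_size, 1) once for the whole area while the
writer threads keep touching the pages, which is the pattern Peter describes
as the worst case for tlb flushes.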

--
BR,
Muhammad Usama Anjum