Re: [PATCH v8 9/9] sparc64: Add support for ADI (Application Data Integrity)

From: Anthony Yznaga
Date: Fri Oct 13 2017 - 13:21:21 EST



> On Oct 13, 2017, at 9:18 AM, Khalid Aziz <khalid.aziz@xxxxxxxxxx> wrote:
>
> On 10/13/2017 08:14 AM, Khalid Aziz wrote:
>> On 10/12/2017 02:27 PM, Anthony Yznaga wrote:
>>>
>>>> On Oct 12, 2017, at 7:44 AM, Khalid Aziz <khalid.aziz@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> On 10/06/2017 04:12 PM, Anthony Yznaga wrote:
>>>>>> On Sep 25, 2017, at 9:49 AM, Khalid Aziz <khalid.aziz@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable
>>>>>> MCD (Memory Corruption Detection) on selected memory ranges, enable
>>>>>> TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI
>>>>>> version tags on page swap out/in or migration. ADI is not enabled by
>>>>> I still don't believe migration is properly supported. Your
>>>>> implementation is relying on a fault happening on a page while its
>>>>> migration is in progress so that do_swap_page() will be called, but
>>>>> I don't see how do_swap_page() will be called if a fault does not
>>>>> happen until after the migration has completed.
>>>>
>>>> User pages are on LRU list and for the mapped pages on LRU list, migrate_pages() ultimately calls try_to_unmap_one and makes a migration swap entry for the page being migrated. This forces a page fault upon access on the destination node and the page is swapped back in from swap cache. The fault is forced by the migration swap entry, rather than fault being an accidental event. If page fault happens on the destination node while migration is in progress, do_swap_page() waits until migration is done. Please take a look at the code in __unmap_and_move().
>>>
>>> I looked at the code again, and I now believe ADI tags are never restored for migrated pages. Here's why:
>>>
>> I will take a look at it again. I have run extensive tests migrating pages of a process across multiple NUMA nodes over and over again and ADI tags were never lost, so this does work. I won't rule out the possibility of having missed a code path where tags are not restored and I will look for it.
>
> Anthony,
>
> I just ran my migration test again which:
>
> - malloc's 16 GB of memory
> - Assigns a rotating ADI tag every 64 bytes to the malloc'd buffer
> - Writes a pattern to the entire buffer
> - Verifies the pattern it wrote using ADI tagged addresses.

The verification will appear to succeed if the tags have been cleared.

To be complete, the test should also manually verify that the in-memory tag values remain non-zero after migration. migrate_page_copy() calls copy_huge_page() or copy_highpage(), which clears the tags at the destination because the stores are done to kernel physical mapping VAs using block initializing stores.
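
A rough sketch of the extra check I have in mind, in case it helps. Everything here is an assumption made for illustration: read_tag_fn is a hypothetical callback that returns the tag actually stored in memory for one 64-byte block (on real hardware it would have to read the version from memory rather than rely on a tagged load succeeding), the 1..ntags rotation is a guess at how the test assigns tags, and sim_read_tag() is only a stand-in so the logic can be compiled and exercised on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_BLOCK_SIZE 64	/* tag granularity used by the test described above */

/* Hypothetical callback: returns the tag stored in memory for one block. */
typedef uint8_t (*read_tag_fn)(const unsigned char *block, void *ctx);

/* Assumed rotation scheme: tags cycle 1..ntags, so zero is never expected. */
static uint8_t expected_tag(size_t block_index, uint8_t ntags)
{
	return (uint8_t)(1 + block_index % ntags);
}

/*
 * Count blocks whose stored tag is zero (cleared) or differs from the
 * tag the test assigned. A result of 0 means every tag survived.
 */
static size_t count_bad_tags(const unsigned char *buf, size_t len,
			     uint8_t ntags, read_tag_fn read_tag, void *ctx)
{
	size_t bad = 0;

	assert(ntags > 0);
	for (size_t off = 0; off + TAG_BLOCK_SIZE <= len; off += TAG_BLOCK_SIZE) {
		uint8_t tag = read_tag(buf + off, ctx);

		/* The zero test is explicit so a cleared tag is never missed. */
		if (tag == 0 || tag != expected_tag(off / TAG_BLOCK_SIZE, ntags))
			bad++;
	}
	return bad;
}

/* Simulated tag store (not real ADI) so the sketch runs anywhere. */
struct sim_tags {
	const unsigned char *base;	/* start of the tagged buffer */
	const uint8_t *tags;		/* one simulated tag per block */
};

static uint8_t sim_read_tag(const unsigned char *block, void *ctx)
{
	const struct sim_tags *sim = ctx;

	return sim->tags[(size_t)(block - sim->base) / TAG_BLOCK_SIZE];
}
```

The test would call count_bad_tags() over the whole buffer after each migration pass and fail on any non-zero result, in addition to the existing pattern verification.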

Anthony

>
> While this test was running, I had a script migrate test program pages across two NUMA nodes every 30 seconds using migratepages command. I did not see an ADI tag mismatch over multiple runs of this test. This test shows migration is working.
>
> Can you give me a test that shows the failure you think we should see and I will debug it.
>
> Thanks,
> Khalid