Re: [HMM 12/15] mm/migrate: new memory migration helper for use with device memory v4

From: Evgeny Baskakov
Date: Fri Jul 14 2017 - 15:43:57 EST


On 7/13/17 1:16 PM, Jerome Glisse wrote:

...


Hi Jerome,

I have hit another kind of hang. Briefly, if a not-yet-allocated page faults on the CPU during migration to device memory, any subsequent migration of that page will fail. This can be triggered when a CPU page fault happens immediately after migrate_vma() starts unmapping the pages to migrate.

Please find attached a reproducer based on the sample driver. In the hmm_test() function, an HMM_DMIRROR_MIGRATE request is triggered from a separate thread for not-yet-allocated pages (coming from malloc). At the same time, an HMM_DMIRROR_READ request is made for the same pages. This results in a sporadic app-side hang, because a random number of pages never migrate to device memory.

Note that if the pages are touched (initialized with data) before the requests are issued, everything works as expected: all HMM_DMIRROR_READ and HMM_DMIRROR_MIGRATE requests eventually succeed. See the comments in the hmm_test() function.
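
For readers without the attachment, the shape of the race can be sketched roughly as follows. This is not the attached reproducer, only an illustration of its structure: stub_migrate() and stub_read() are hypothetical placeholders for the HMM_DMIRROR_MIGRATE and HMM_DMIRROR_READ requests (the sample driver's actual interface is not reproduced here), and run_race(), struct race_args and the touch_first flag are names made up for this sketch. With the stubs in place it only demonstrates the threading and the touched/untouched distinction; it does not hang.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Shared state for the two racing requests (hypothetical layout). */
struct race_args {
    unsigned char *buf;   /* buffer both requests operate on */
    size_t len;           /* buffer length in bytes */
    int migrate_rc;       /* result of the migrate request */
    int read_rc;          /* result of the read request */
};

/* Placeholder for HMM_DMIRROR_MIGRATE; a real test would call the driver. */
static int stub_migrate(unsigned char *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;
}

/* Placeholder for HMM_DMIRROR_READ; a real test would call the driver. */
static int stub_read(unsigned char *buf, size_t len)
{
    (void)buf; (void)len;
    return 0;
}

static void *migrate_thread(void *p)
{
    struct race_args *a = p;
    a->migrate_rc = stub_migrate(a->buf, a->len);
    return NULL;
}

static void *read_thread(void *p)
{
    struct race_args *a = p;
    a->read_rc = stub_read(a->buf, a->len);
    return NULL;
}

/*
 * Issue the migrate and read requests concurrently on the same buffer.
 * touch_first == 0: pages come straight from malloc and are never written,
 *                   which is the case reported as problematic.
 * touch_first != 0: pages are initialized first, the case reported to work.
 * Returns 0 if both requests completed successfully, -1 otherwise.
 */
int run_race(size_t npages, int touch_first)
{
    long page = sysconf(_SC_PAGESIZE);
    struct race_args a = {0};
    pthread_t m, r;

    assert(page > 0);
    a.len = npages * (size_t)page;
    a.buf = malloc(a.len);
    if (!a.buf)
        return -1;

    if (touch_first)
        memset(a.buf, 0, a.len);   /* forces the pages to be allocated */

    if (pthread_create(&m, NULL, migrate_thread, &a) != 0) {
        free(a.buf);
        return -1;
    }
    if (pthread_create(&r, NULL, read_thread, &a) != 0) {
        pthread_join(m, NULL);
        free(a.buf);
        return -1;
    }

    pthread_join(m, NULL);
    pthread_join(r, NULL);
    free(a.buf);

    return (a.migrate_rc || a.read_rc) ? -1 : 0;
}
```

Calling run_race(n, 0) corresponds to the failing case described above, and run_race(n, 1) to the case where the pages are touched beforehand.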

Thanks!

--
Evgeny Baskakov
NVIDIA

Attachment: sanity_rmem004_repeated_faults_threaded_notallocated.tgz
Description: GNU Zip compressed data