Re: [PATCH 06/36] HMM: add HMM page table v2.

From: Jerome Glisse
Date: Wed Jul 01 2015 - 11:07:26 EST


On Tue, Jun 30, 2015 at 07:51:12PM -0700, Mark Hairgrove wrote:
> On Mon, 29 Jun 2015, Jerome Glisse wrote:
> > [...]
> >
> > Iterator is what protect against concurrent freeing of the directory so it
> > has to return to caller on directory boundary (for 64bits arch with 64bits
> > pte it has return every 512 entries). Otherwise pt_iter_fini() would have
> > to walk over the whole directory range again just to drop reference and this
> > doesn't sound like a good idea.
>
> I don't understand why it would have to return to the caller to unprotect
> the directory. The iterator would simply drop the reference to the
> previous directory, take a reference on the next one, and keep searching
> for a valid entry.
>
> Why would pt_iter_fini have to walk over the entire range? The iterator
> would keep at most one directory per level referenced. _fini would walk
> the per-level ptd array and unprotect each level, the same way it does
> now.

I think here we are just misunderstanding each other. I am saying that
iterator have to return on directory boundary (ie when switching from one
directory to the next). The return part is not only for protection it is
also by design because iterator function should not test the page table
entry as different code path have different synchronization requirement.


> > So really with what you are asking it whould be:
> >
> > hmm_pt_iter_init(&iter, start, end);
> > for(next=pt_iter_next(&iter,&ptep); next<end; next=pt_iter_next(&iter,&ptep))
> > {
> > // Here ptep is valid until next address. Above you have to call
> > // pt_iter_next() to switch to next directory.
> > addr = max(start, next - (~HMM_PMD_MASK + 1));
> > for (; addr < next; addr += PAGE_SIZE, ptep++) {
> > // access ptep
> > }
> > }
> >
> > My point is that internally pt_iter_next() will do the exact same test it is
> > doing now btw cur and addr. Just that the addr is no longer explicit but iter
> > infer it.
>
> But this way, the iteration across directories is more efficient because
> the iterator can simply walk the directory array. Take a directory that
> has one valid entry at the very end. The existing iteration will do this:
>
> hmm_pt_iter_next(dir_addr[0], end)
> Walk up the ptd array
> Compute level start and end and compare them to dir_addr[0]
> Compute dir_addr[1] using addr and pt->mask
> Return dir_addr[1]
> hmm_pt_iter_update(dir_addr[1])
> Walk up the ptd array, compute level start and end
> Compute level index of dir_addr[1]
> Read entry for dir_addr[1]
> Return NULL
> hmm_pt_iter_next(dir_addr[1], end)
> ...
> And so on 511 times until the last entry is read.
>
> This is really more suited to a for loop iteration, which it could be if
> this were fully contained within the _next call.

No, existing code does not necessarily do that. Current use pattern is :

for (addr = start; addr < end;) {
ptep = hmm_pt_iter_update(iter, addr);
if (!ptep) {
addr = hmm_pt_iter_next(iter, addr, end);
continue;
}
next = hmm_pt_level_next(pt, addr, end);
for (; addr < next; addr += PAGE_SIZE, ptep++) {
// Process addr using ptep.
}
}

The inner loop is on directory boundary ie 512 entries on 64bits arch.
It is that way because on some case you do not want the iterator to
control the address, the outer loop might be accessing several different
mirror page table each might have different gap. So you really want to
have explicit address provided to the iterator function.

Also iterator can not really test for valid entry as locking requirement
and synchronization with other thread is different depending on which
code path is walking the page table. So testing inside the iterator
function is kind of pointless has the performed test might be no longer
revealent by the time it returns pointer and address to the caller.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/