Re: [PATCH 3/5] lib: lockless generic and arch independent page table (gpt) v2.

From: Linus Torvalds
Date: Thu Nov 13 2014 - 20:51:06 EST


On Thu, Nov 13, 2014 at 5:18 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> More importantly, nobody should ever care. Because the whole *point*
> of the tree walker is that the user never sees any of this. This is
> purely an implementation detail of the tree itself. Somebody who just
> *walks* the tree only sees the final end result.
>
> And *that* is the "walk()" callback. Which gets the virtual address
> and the length, exactly so that for a super-page you don't even really
> see the difference between walking different levels (well, you do see
> it, since the length will differ).
>
> Now, I didn't actually try to make that whole thing very transparent.

Side note: I'm not entirely sure it *can* be made entirely transparent.

Just as an example: if what you want to do is actually "access" the
data for some copying operation, then for a real CPU page table what
you want to do is to actually map the entry. And you definitely do not
want to map the entry one single page at a time - if you have a
top-level page directory entry, you'd want to map the whole page
directory entry, not the sub-pages of it. So mapping the thing is very
much level-dependent.

Fine, "just add 'map()'/'unmap()' functions to the tree description,
the same way we have lookup/walk". Yes, that would be fairly easy, but
it only works for CPU page tables. If you want to copy from device
data, what you want is more of a physical address thing that you do
DMA on, not a "map/unmap" model.
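
To make the map/unmap versus DMA split concrete, here is a rough sketch
of a per-tree description that offers one or the other. Everything in
it is hypothetical and invented for illustration (the toy_* names, the
ops layout, the fake backing buffer); none of it comes from the patch
or from any existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: how the memory behind an entry can be reached. */
enum toy_access_kind { TOY_ACCESS_MAP, TOY_ACCESS_DMA };

/* Hypothetical per-tree description; a tree fills in one half only. */
struct toy_tree_ops {
	enum toy_access_kind kind;
	/* CPU-style trees: map and unmap 'len' bytes behind an entry. */
	void *(*map)(uint64_t entry, size_t len);
	void (*unmap)(void *vaddr);
	/* Device-style trees: turn an entry into an address to DMA on. */
	uint64_t (*dma_addr)(uint64_t entry);
};

/* Copy 'len' bytes out of an entry. Only mappable trees can do this. */
static int toy_copy_from_entry(const struct toy_tree_ops *ops,
			       uint64_t entry, void *dst, size_t len)
{
	void *src;

	assert(ops != NULL);
	if (ops->kind != TOY_ACCESS_MAP || !ops->map || !ops->unmap)
		return -1;	/* caller has to set up DMA instead */

	src = ops->map(entry, len);
	if (!src)
		return -1;
	memcpy(dst, src, len);
	ops->unmap(src);
	return 0;
}

/* Toy "CPU" tree: an entry is just an offset into a static buffer. */
static unsigned char toy_backing[64];

static void *toy_cpu_map(uint64_t entry, size_t len)
{
	if (entry > sizeof(toy_backing) || len > sizeof(toy_backing) - entry)
		return NULL;
	return toy_backing + entry;
}

static void toy_cpu_unmap(void *vaddr)
{
	(void)vaddr;	/* nothing to undo in the toy model */
}

/* Toy "device" tree: an entry is a page frame number, 4 KiB pages. */
static uint64_t toy_dev_dma_addr(uint64_t entry)
{
	return entry << 12;
}
```

A tree that sets TOY_ACCESS_MAP can be handed to toy_copy_from_entry()
directly. A tree that sets TOY_ACCESS_DMA gets -1 back, and its caller
has to take the dma_addr() result and program a transfer itself, which
is the per-tree knowledge that does not go away.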

So I suspect *some* amount of per-tree knowledge is required. Or just
knowledge of what people actually want to do when walking the tree.

So don't get me wrong - I'm making excuses for not really having a
fleshed-out interface, but I'm making them because I think the
interface will either have to be tree-specific, or because we need
higher-level interfaces for what we actually want to do while walking.
That then decides where these kinds of tree differences will be
handled: will they be handled by the caller knowing that certain trees
are used in certain ways, or will they be handled by the tree walking
abstraction being explicitly extended to do certain operations? Or
will it be a bit of both?

See what I'm trying to say? There is no way to make the tree-walking
"truly generic" in the sense that you can do anything you want with
the results, because the *meaning* of the results will inevitably
depend a bit on what the trees are actually describing. Are they
describing local memory or remote memory?

Jerome had a "convert 'struct tree_entry *' to 'struct page *'"
function, but that doesn't necessarily work in the generic case
either, and is questionable with super-pages anyway (although
generally it works fairly well by just saying that they get described
by the first page in the superpage). But for actual CPU page tables,
some of the pages in those page tables may not *have* a "struct page"
associated with them at all, because they are mappings of
memory-mapped devices in high memory. So again, in a _generic_ model
that you might want to start replacing some of the actual VM code
with, you simply cannot use 'struct page' as some kind of generic
entry. At some level, the only thing you have is the actual page table
entry pointer, and the value behind it.

And it may well be ok to just say "the walker isn't generic in _that_
sense". A walker that can walk arbitrary page-table-tree-like
structures can still be useful just for the walking part, even if the
users might then always have to be aware of the final tree details. At
least they don't need to re-implement the basic iterator, they'll just
have to implement the "what do I do with the end result" for their
particular tree layout. So a walker can be generic at _just_
walking/iterating, but not necessarily at actually using the end
result.
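
As a minimal sketch of "generic at just walking", assume a toy
two-level tree with four entries per level and 4 KiB leaf pages. The
layout, the flag bits and all toy_* names are made up for illustration
and are not the interface from the patch. The iterator only hands the
callback an address, a length and a pointer to the raw entry; what the
entry means is left to the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_ENTRIES	4
#define TOY_PRESENT	0x1u	/* entry is valid */
#define TOY_HUGE	0x2u	/* top-level entry maps a whole super-page */

typedef uint64_t toy_entry_t;

struct toy_tree {
	toy_entry_t top[TOY_ENTRIES];
	toy_entry_t *leaf[TOY_ENTRIES];	/* lower-level table, or NULL */
};

/* Tree-specific part: called once per present mapping, at any level. */
typedef int (*toy_walk_fn)(uint64_t addr, uint64_t len,
			   toy_entry_t *entry, void *data);

/* Generic part: iterate, and stop early if the callback returns non-zero. */
static int toy_walk(struct toy_tree *tree, toy_walk_fn fn, void *data)
{
	const uint64_t leaf_len = 1ull << TOY_PAGE_SHIFT;
	const uint64_t top_len = leaf_len * TOY_ENTRIES;
	size_t i, j;
	int ret;

	for (i = 0; i < TOY_ENTRIES; i++) {
		toy_entry_t *top = &tree->top[i];
		uint64_t base = i * top_len;

		if (!(*top & TOY_PRESENT))
			continue;
		if (*top & TOY_HUGE) {
			/* Super-page: one call, with the larger length. */
			ret = fn(base, top_len, top, data);
			if (ret)
				return ret;
			continue;
		}
		assert(tree->leaf[i] != NULL);
		for (j = 0; j < TOY_ENTRIES; j++) {
			toy_entry_t *pte = &tree->leaf[i][j];

			if (!(*pte & TOY_PRESENT))
				continue;
			ret = fn(base + j * leaf_len, leaf_len, pte, data);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* One possible "what do I do with the end result": count mapped bytes. */
struct toy_stats {
	uint64_t bytes;
	unsigned int calls;
};

static int toy_count(uint64_t addr, uint64_t len, toy_entry_t *entry,
		     void *data)
{
	struct toy_stats *stats = data;

	(void)addr;
	(void)entry;
	stats->bytes += len;
	stats->calls++;
	return 0;
}
```

toy_count() never looks at the entry, so it works for any tree with
this shape. A callback that wants to read the data behind the entry
would have to know whether that entry describes local memory, device
memory, or something without a 'struct page' at all.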

I hope I'm explaining that logic well enough..

Linus