Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
From: Linus Torvalds
Date: Wed May 06 2015 - 20:19:59 EST
On Wed, May 6, 2015 at 4:47 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> Conceptually better, but certainly more difficult to audit if the fake
> struct page is initialized in a subtle way that breaks when/if it
> leaks to some unwitting context.
Maybe. It could go either way, though. In particular, with the
"dynamically allocated struct page" approach, if somebody uses it past
the supposed lifetime of the use, things like poisoning the temporary
"struct page" could be fairly effective. You can't really poison the
pfn - it's just a number, and if somebody uses it later than you think
(and you have re-used that physical memory for something else), you'll
never ever know.
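A minimal userspace sketch of the poisoning argument. It assumes a hypothetical `struct temp_page` stand-in (not the kernel's real `struct page` layout) and made-up magic and poison values. A live descriptor carries a magic value, and retiring it overwrites the whole thing with a poison pattern, so a late user trips a check. A bare pfn has no such state to overwrite.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical values chosen for illustration only. */
#define TEMP_PAGE_MAGIC  0x50414745u  /* marks a live descriptor */
#define TEMP_PAGE_POISON 0x6b         /* byte pattern written at end of life */

/* Hypothetical dynamically allocated stand-in for a temporary "struct page". */
struct temp_page {
	unsigned long pfn;   /* physical frame this descriptor describes */
	uint32_t magic;      /* TEMP_PAGE_MAGIC while the descriptor is live */
};

struct temp_page *temp_page_alloc(unsigned long pfn)
{
	struct temp_page *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->pfn = pfn;
	p->magic = TEMP_PAGE_MAGIC;
	return p;
}

/* End of the intended lifetime: poison the descriptor in place.
 * The caller still owns the memory and frees it later. */
void temp_page_retire(struct temp_page *p)
{
	memset(p, TEMP_PAGE_POISON, sizeof(*p));
}

int temp_page_is_live(const struct temp_page *p)
{
	return p->magic == TEMP_PAGE_MAGIC;
}

/* Accessor that traps when the descriptor is used past its lifetime.
 * A plain "unsigned long pfn" cannot offer this check: a stale copy of
 * the number looks exactly like a fresh one. */
unsigned long temp_page_pfn(const struct temp_page *p)
{
	assert(temp_page_is_live(p) && "temporary struct page used past its lifetime");
	return p->pfn;
}
```

In this sketch, calling `temp_page_pfn()` after `temp_page_retire()` aborts, while the same mistake made with a raw pfn would go silently unnoticed.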
I'd *assume* that most users of the dynamic "struct page" allocation
have very clear lifetime rules. Those things would presumably normally
get looked up by some extended version of "get_user_pages()", and
there's a clear use of the result, with no lifetime beyond that. Also, you
do need to have some higher-level locking when you do this, to make
sure that the persistent pages don't magically get re-assigned. We're
presumably talking about having a filesystem in that persistent
memory, so we cannot be doing IO to the pages (from some other source
- whether RDMA or some special zero-copy model) while the underlying
filesystem is reassigning the storage because somebody deleted the
file.
IOW, there had better be other external rules about when - and how
long - you can use a particular persistent page. No? So the whole
"when/how to allocate the temporary 'struct page'" is just another
detail in that whole thing.
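One way to picture "the allocation is just another detail of the lifetime rules" is the sketch below. It is a single-threaded userspace illustration built on hypothetical names (`pmem_extent`, `pmem_io_start()`, `pmem_io_finish()`, `pmem_extent_try_release()`), none of which is a kernel API. The lookup step, loosely analogous to `get_user_pages()`, pins the extent and allocates the temporary descriptors. Completion frees them and drops the pin. The filesystem may only reassign the storage once no pins remain. A real implementation would protect `pins` with the higher-level lock mentioned above, which is elided here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical extent of persistent memory backing a file. */
struct pmem_extent {
	unsigned long first_pfn;
	unsigned long nr_pages;
	unsigned int pins;    /* outstanding I/O users */
	bool owned_by_file;   /* false once the file is deleted */
};

/* Hypothetical per-I/O handle: the temporary descriptors (plain pfns here,
 * standing in for temporary struct pages) live exactly as long as the pin. */
struct pmem_io {
	struct pmem_extent *ext;
	unsigned long *pfns;
};

/* Lookup step: refuse released storage, otherwise pin and allocate. */
struct pmem_io *pmem_io_start(struct pmem_extent *ext)
{
	struct pmem_io *io;

	if (!ext->owned_by_file)
		return NULL;
	io = malloc(sizeof(*io));
	if (!io)
		return NULL;
	io->pfns = malloc(ext->nr_pages * sizeof(*io->pfns));
	if (!io->pfns) {
		free(io);
		return NULL;
	}
	for (unsigned long i = 0; i < ext->nr_pages; i++)
		io->pfns[i] = ext->first_pfn + i;
	io->ext = ext;
	ext->pins++;
	return io;
}

/* I/O completion: free the temporary descriptors, then drop the pin. */
void pmem_io_finish(struct pmem_io *io)
{
	assert(io->ext->pins > 0);
	io->ext->pins--;
	free(io->pfns);
	free(io);
}

/* Filesystem side (e.g. file deletion): only give the storage away when
 * nobody has it pinned. Returns false if the caller has to wait. */
bool pmem_extent_try_release(struct pmem_extent *ext)
{
	if (ext->pins > 0)
		return false;
	ext->owned_by_file = false;
	return true;
}
```

The design choice being illustrated is that descriptor allocation and freeing ride on the same start/finish pair that already enforces the "don't reassign while in use" rule, so no separate lifetime tracking is needed.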
And yes, some uses may not ever actually see that. If the whole of
persistent memory is just assigned to a database or something, and the
DB just wants to do a "flush this range of persistent memory to
long-term disk storage", then there may not be much of a "lifetime"
issue for the persistent memory. But even then you're going to have IO
completion callbacks etc. to let the DB know that the data has hit the disk,
so...
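Even in the database case, the completion callback defines a lifetime, as this sketch illustrates. The request type, `submit_flush()`, and `db_flush_done()` are hypothetical names, not a kernel interface. The "disk" is just a memory buffer that completes synchronously. With a real, asynchronous block device, the request and the memory it points at must stay valid until `done()` runs. That window is the lifetime any temporary page descriptor would have to cover.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical completion callback: err is 0 on success, -errno on failure. */
typedef void (*flush_done_fn)(void *ctx, int err);

/* Hypothetical flush request for a range of persistent memory. */
struct flush_req {
	const void *src;     /* persistent-memory range to write out */
	size_t len;
	flush_done_fn done;  /* called once the data is on the disk */
	void *ctx;
};

/* Stand-in backend: copies into a buffer and completes immediately.
 * A real device completes later, so req and req->src must remain valid
 * until req->done() has been called. */
void submit_flush(struct flush_req *req, void *disk, size_t disk_len)
{
	int err = 0;

	if (req->len > disk_len)
		err = -EIO;
	else
		memcpy(disk, req->src, req->len);
	req->done(req->ctx, err);
}

/* Example database-side state and callback. */
struct db_flush_state {
	int completed;
	int err;
};

void db_flush_done(void *ctx, int err)
{
	struct db_flush_state *st = ctx;

	st->completed = 1;
	st->err = err;
}
```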
What is the primary thing that is driving this need? Do we have a very
concrete example?
Linus