Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

From: Linus Torvalds
Date: Wed Nov 01 2017 - 14:27:39 EST


On Wed, Nov 1, 2017 at 10:31 AM, Dave Hansen
<dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> I assume that you're really worried about having to go two places to do
> one thing, like clearing a dirty bit, or unmapping a PTE, especially
> when we have to do that for userspace. Thankfully, the sharing of the
> page tables (under the PGD) for userspace gets rid of most of this
> nastiness.

Right. That's the primary thing, and just clarifying that this is for
kernel addresses only will help at least some.

But even for the kernel case, it worries me a bit. We have much fewer
coherency issues for the kernel, but we do end up having some cases
that modify kernel mappings too. Most notably there are the
cacheability things where we've had machine check exceptions when the
same page is mapped non-cachable in user space and cacheable in kernel
space, which ends up causing all that pain we have in
arch/x86/mm/pageattr.c.

I very much think you limit the pages that get mapped in the shadow
page tables to the point where this shouldn't be an issue, but at the
same time, I very much do want people to be aware of it and this be
commented very clearly in the code.

Honestly, the code looks like it is designed to, and can, map
arbitrary physical pages at arbitrary virtual addresses. And that is
NOT RIGHT.

So I'd like to see not just the comments about this, but I'd like to
see the code itself actually making that very clear. Have *code* that
verifies that nobody ever tries to use this on a user address (because
that would *completely* screw up all coherency), but also I don't see
why the code possibly looks up the old physical address in ther page
table. Is there _any_ possible reason why you'd want to look up a page
from an old page table? As far as I can tell, we should always know
the physical page we are mapping a priori - we've never re-mapping
random virtual addresses or a highmem page or anything like that.
We're mapping the 1:1 kernel mapping only.

So the code really looks much too generic to me. It seems to be
designed to be used for cases where it simply could not *possibly* be
valid to use.

There's a disease in computer science that thinks that "generic code"
is somehow better code. That's not the case. We aren't mapping generic
pages, and must not map them or let make people make that mistake. I'd
*much* rather the code make it very clear that it's not generic code
in any way shape or form.

Linus