Re: [PATCH v7 34/36] x86/mm: Add support to encrypt the kernel in-place

From: Tom Lendacky
Date: Fri Jun 23 2017 - 13:45:19 EST

On 6/23/2017 5:00 AM, Borislav Petkov wrote:
On Fri, Jun 16, 2017 at 01:56:19PM -0500, Tom Lendacky wrote:
Add the support to encrypt the kernel in-place. This is done by creating
new page mappings for the kernel - a decrypted write-protected mapping
and an encrypted mapping. The kernel is encrypted by copying it through
a temporary buffer.

Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
arch/x86/include/asm/mem_encrypt.h | 6 +
arch/x86/mm/Makefile | 2
arch/x86/mm/mem_encrypt.c | 314 ++++++++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_boot.S | 150 +++++++++++++++++
4 files changed, 472 insertions(+)
create mode 100644 arch/x86/mm/mem_encrypt_boot.S

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index af835cf..7da6de3 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,12 @@
extern unsigned long sme_me_mask;
+void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
+ unsigned long decrypted_kernel_vaddr,
+ unsigned long kernel_len,
+ unsigned long encryption_wa,
+ unsigned long encryption_pgd);
void __init sme_early_encrypt(resource_size_t paddr,
unsigned long size);
void __init sme_early_decrypt(resource_size_t paddr,
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 9e13841..0633142 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 842c8a6..6e87662 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -24,6 +24,8 @@
#include <asm/setup.h>
#include <asm/bootparam.h>
#include <asm/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/sections.h>
* Since SME related variables are set early in the boot process they must
@@ -209,8 +211,320 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
+static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
+ unsigned long end)
+ unsigned long pgd_start, pgd_end, pgd_size;
+ pgd_t *pgd_p;
+ pgd_start = start & PGDIR_MASK;
+ pgd_end = end & PGDIR_MASK;
+ pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1);
+ pgd_size *= sizeof(pgd_t);
+ pgd_p = pgd_base + pgd_index(start);
+ memset(pgd_p, 0, pgd_size);
+#ifndef CONFIG_X86_5LEVEL
+#define native_make_p4d(_x) (p4d_t) { .pgd = native_make_pgd(_x) }

Huh, why isn't this in arch/x86/include/asm/pgtable_types.h in the #else
branch of #if CONFIG_PGTABLE_LEVELS > 4 ?

Normally the __p4d() macro would be used and that would be ok whether
CONFIG_X86_5LEVEL is defined or not. But since __p4d() is part of the
paravirt ops path I have to use native_make_p4d(). I'd be the only user
of the function and thought it would be best to localize it this way.


ERROR: Macros with complex values should be enclosed in parentheses
#105: FILE: arch/x86/mm/mem_encrypt.c:232:
+#define native_make_p4d(_x) (p4d_t) { .pgd = native_make_pgd(_x) }

so why isn't it a function?

I can define it as an inline function.

+static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
+ unsigned long vaddr, pmdval_t pmd_val)
+ pgd_t *pgd_p;
+ p4d_t *p4d_p;
+ pud_t *pud_p;
+ pmd_t *pmd_p;
+ pgd_p = pgd_base + pgd_index(vaddr);
+ if (native_pgd_val(*pgd_p)) {

Err, I don't understand: so this is a Kconfig symbol and when it is
enabled at build time, you do a 5level pagetable.

But you can't stick a 5level pagetable to a hardware which doesn't know
about it.

True, 5-level will only be turned on for specific hardware which is why
I originally had this as only 4-level pagetables. But in a comment from
you back on the v5 version you said it needed to support 5-level. I
guess we should have discussed this more, but I also thought that should
our hardware ever support 5-level paging in the future then this would
be good to go.

Or do you mean that p4d layer folding at runtime to happen? (I admit, I
haven't looked at that in detail.) But then I'd hope that the generic
macros/functions would give you the ability to not care whether we have
a p4d or not and not add a whole bunch of ifdeffery to this code.

The macros work great if you are not running identity mapped. You could
use p*d_offset() to move easily through the tables, but those functions
use __va() to generate table virtual addresses. I've seen where
boot/compressed/pagetable.c #defines __va() to work with identity mapped
pages but that would only work if I create a separate file just for this

Given when this occurs it's very similar to what __startup_64() does in
regards to the IS_ENABLED(CONFIG_X86_5LEVEL) checks.