Re: [PATCH] KVM: arm64: Use BTI for pKVM

From: Marc Zyngier
Date: Tue May 16 2023 - 11:49:21 EST


On Tue, 16 May 2023 15:18:46 +0100,
Mostafa Saleh <smostafa@xxxxxxxxxx> wrote:
>
> CONFIG_ARM64_BTI_KERNEL compiles the kernel to support ARMv8.5-BTI.
> However, the nvhe code doesn't make use of it as it doesn't map any
> pages with Guarded Page(GP) bit.
>
> This patch maps pKVM .text section with GP bit which matches the
> kernel handling for BTI.

Why pKVM only? Surely we can benefit from it all over the nvhe code,
right?

>
> A new flag is added to enum kvm_pgtable_prot: KVM_PGTABLE_PROT_GP_S1,
> which represents BTI guarded page in hypervisor stage-1 page table.
>
> Signed-off-by: Mostafa Saleh <smostafa@xxxxxxxxxx>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 3 +++
> arch/arm64/kvm/hyp/nvhe/setup.c | 8 ++++++--
> arch/arm64/kvm/hyp/pgtable.c | 6 ++++--
> 3 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 4cd6762bda80..5bcd06d664d3 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -151,6 +151,7 @@ enum kvm_pgtable_stage2_flags {
> * @KVM_PGTABLE_PROT_W: Write permission.
> * @KVM_PGTABLE_PROT_R: Read permission.
> * @KVM_PGTABLE_PROT_DEVICE: Device attributes.
> + * @KVM_PGTABLE_PROT_GP_S1: GP(guarded page) used for BTI in stage-1 only
> * @KVM_PGTABLE_PROT_SW0: Software bit 0.
> * @KVM_PGTABLE_PROT_SW1: Software bit 1.
> * @KVM_PGTABLE_PROT_SW2: Software bit 2.
> @@ -163,6 +164,8 @@ enum kvm_pgtable_prot {
>
> KVM_PGTABLE_PROT_DEVICE = BIT(3),
>
> + KVM_PGTABLE_PROT_GP_S1 = BIT(50),
> +
> KVM_PGTABLE_PROT_SW0 = BIT(55),
> KVM_PGTABLE_PROT_SW1 = BIT(56),
> KVM_PGTABLE_PROT_SW2 = BIT(57),
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 110f04627785..95f80e2b2946 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -66,7 +66,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> {
> void *start, *end, *virt = hyp_phys_to_virt(phys);
> unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
> - enum kvm_pgtable_prot prot;
> + enum kvm_pgtable_prot prot = PAGE_HYP_EXEC;
> int ret, i;
>
> /* Recreate the hyp page-table using the early page allocator */
> @@ -88,7 +88,11 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> if (ret)
> return ret;
>
> - ret = pkvm_create_mappings(__hyp_text_start, __hyp_text_end, PAGE_HYP_EXEC);
> + /* Hypervisor text is mapped as guarded pages(GP). */
> + if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && cpus_have_const_cap(ARM64_BTI))
> + prot |= KVM_PGTABLE_PROT_GP_S1;

Is there any reason why this isn't a final cap? I also dislike the
IS_ENABLED(), but I can see that we don't have separate caps for
in-kernel BTI and userspace visible BTI...

> +
> + ret = pkvm_create_mappings(__hyp_text_start, __hyp_text_end, prot);
> if (ret)
> return ret;
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 3d61bd3e591d..028e198acd48 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -145,7 +145,8 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> KVM_PTE_TYPE_BLOCK;
>
> - pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> + pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI |
> + KVM_PGTABLE_PROT_GP_S1);
> pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> pte |= KVM_PTE_VALID;
>
> @@ -378,7 +379,8 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
> - attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
> + attr |= prot & (KVM_PTE_LEAF_ATTR_HI_SW | KVM_PGTABLE_PROT_GP_S1);
> +

You should probably check that the page is executable before blindly
accepting to set the GP bit (don't accept it for non-exec pages).

Another thing to check would be the state of SCTLR_EL2.BT, which I
think we clear by construction, but it be worth having a look.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.