Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2

From: Will Deacon
Date: Tue Dec 01 2020 - 13:16:09 EST


On Wed, Dec 02, 2020 at 01:19:35AM +0800, wangyanan (Y) wrote:
> On 2020/12/1 22:16, Will Deacon wrote:
> > On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
> > > On 2020/11/30 21:21, Will Deacon wrote:
> > > > On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> > > > > @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > > > > /* There's an existing valid leaf entry, so perform break-before-make */
> > > > > kvm_set_invalid_pte(ptep);
> > > > > kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > > > > + put_page(virt_to_page(ptep));
> > > > > kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> > > > > out:
> > > > > data->phys += granule;
> > > > Isn't this hunk alone sufficient to solve the problem?
> > > >
> > > Not sufficient enough. When the old ptep is valid and old pte equlas new
> > > pte, in this case, "True" is also returned by kvm_set_valid_leaf_pte()
> > >
> > > and get_page() will still be called.
> > I had a go at fixing this without introducing refcounting to the hyp stage-1
> > case, and ended up with the diff below. What do you think?
>
> Functionally this diff looks fine to me. A small comment inline, please see
> below.
>
> I had made an alternative fix (after sending v1) and it looks much more
> concise.
>
> If you're ok with it, I can send it as v2 (together with patch#2 and #3)
> after some tests.

Thanks.

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..b232bdd142a6 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64
> end, u32 level,
>         if (!kvm_block_mapping_supported(addr, end, phys, level))
>                 return false;
>
> +       if (kvm_pte_valid(*ptep))
> +               put_page(virt_to_page(ptep));
> +
>         if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>                 goto out;

This is certainly smaller, but see below.

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 0271b4a3b9fe..78e2c0dc47ae 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> > smp_store_release(ptep, pte);
> > }
> > -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> > - u32 level)
> > +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> > {
> > - kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> > + kvm_pte_t pte = kvm_phys_to_pte(pa);
> > u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> > KVM_PTE_TYPE_BLOCK;
> > pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> > pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> > pte |= KVM_PTE_VALID;
> > -
> > - /* Tolerate KVM recreating the exact same mapping. */
> > - if (kvm_pte_valid(old))
> > - return old == pte;
> > -
> > - smp_store_release(ptep, pte);
> > - return true;
> > + return pte;
> > }
> > static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> > @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
> > static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > kvm_pte_t *ptep, struct hyp_map_data *data)
> > {
> > + kvm_pte_t new, old = *ptep;
> > u64 granule = kvm_granule_size(level), phys = data->phys;
> > if (!kvm_block_mapping_supported(addr, end, phys, level))
> > return false;
> > - WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> > + /* Tolerate KVM recreating the exact same mapping. */
> > + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > + if (old != new && !WARN_ON(kvm_pte_valid(old)))
> > + smp_store_release(ptep, new);
> > +
> > data->phys += granule;
> > return true;
> > }
> > @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > kvm_pte_t *ptep,
> > struct stage2_map_data *data)
> > {
> > + kvm_pte_t new, old = *ptep;
> > u64 granule = kvm_granule_size(level), phys = data->phys;
> > if (!kvm_block_mapping_supported(addr, end, phys, level))
> > return false;
> > - if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> > - goto out;
> > + new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > + if (kvm_pte_valid(old)) {
> > + /*
> > + * There's an existing valid leaf entry, so perform
> > + * break-before-make.
> > + */
> > + kvm_set_invalid_pte(ptep);
> > + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > + put_page(virt_to_page(ptep));
>
> When old PTE is valid and equals to the new one, we will also perform
> break-before-make and the new PTE installation. But they're actually not
> necessary in this case.

Agreed, but I don't think that case should happen for stage-2 mappings.
That's why I prefer my diff here, as it makes that 'old == new' logic
specific to the hyp mappings.

But I'll leave it all up to you (these diffs are only intended to be
helpful).

Will