RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()

From: Michael Kelley
Date: Sat Oct 03 2020 - 13:40:20 EST


From: Sasha Levin <sashal@xxxxxxxxxx> Sent: Thursday, October 1, 2020 6:04 AM
>
> On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote:
> >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote:
> >> Sasha Levin <sashal@xxxxxxxxxx> writes:
> >>
> >> > cpumask can change underneath us, which is generally safe except when we
> >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass
> >> > num_possible_cpus() into hv_cpu_number_to_vp_number(), causing it to read
> >> > garbage. As reported by KASAN:
> >> >
> >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112)
> >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106
> >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1
> >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
> >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0)
> >> > [ 84.196669] Call Trace:
> >> > [ 84.196669] dump_stack (lib/dump_stack.c:120)
> >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375)
> >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507)
> >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635)
> >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112)
> >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798)
> >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88)
> >> >
> >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible")
> >> > Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> >> > Cc: stable@xxxxxxxxxx
> >> > Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> >> > ---
> >> > arch/x86/hyperv/mmu.c | 4 +++-
> >> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644
> >> > --- a/arch/x86/hyperv/mmu.c
> >> > +++ b/arch/x86/hyperv/mmu.c
> >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> >> > * must. We will also check all VP numbers when walking the
> >> > * supplied CPU set to remain correct in all cases.
> >> > */
> >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64)
> >> > + int last = cpumask_last(cpus);
> >> > +
> >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64)
> >> > goto do_ex_hypercall;
> >>
> >> In case 'cpus' can end up being empty (I'm genuinely surprised it can)
>
> I was just as surprised as you and spent the good part of a day
> debugging this. However, a:
>
> WARN_ON(cpumask_empty(cpus));
>
> triggers at that line of code even though we check for cpumask_empty()
> at the entry of the function.

What does the call stack look like when this triggers? I'm curious about
the path on which 'cpus' could be changing while the flush call is in
progress.

I wonder if CPUs could ever be added to the mask? Removing CPUs can
be handled with some care because an unnecessary flush doesn't hurt
anything. But adding CPUs would be a serious correctness problem,
because a newly added CPU might never get flushed.

>
> >> the check is mandatory indeed. I would, however, just return directly in
> >> this case:
>
> Makes sense.

But we need to do a local_irq_restore() before returning.

>
> >> if (last < num_possible_cpus())
> >> return;
> >
> >I think you want
> >
> > last >= num_possible_cpus()
> >
> >here?

Yes, but also the && must become ||
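
To make the early-return idea concrete, here is a rough userspace model,
not the actual patch. Everything in it is a hypothetical stand-in:
flush_model(), mask_last(), model_irq_save()/model_irq_restore(),
NR_CPUS_MODEL and vp_table[] are invented for illustration, and the race is
imitated by passing two snapshots of the mask (one seen by the entry check,
one seen at the later check). It assumes, as discussed above, that
interrupts are already disabled by the time the second check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8u	/* hypothetical number of possible CPUs */

enum flush_path { FLUSH_NONE, FLUSH_SIMPLE, FLUSH_EX };

static bool irqs_off;			/* stands in for saved IRQ state */
static unsigned int table_reads;	/* counts VP table lookups */

/* Hypothetical CPU -> VP mapping; CPUs 4..7 have VP numbers >= 64. */
static const uint32_t vp_table[NR_CPUS_MODEL] = { 0, 1, 2, 3, 64, 65, 66, 67 };

static void model_irq_save(void)
{
	irqs_off = true;
}

static void model_irq_restore(void)
{
	assert(irqs_off);	/* restore must pair with an earlier save */
	irqs_off = false;
}

/* Highest set bit, or NR_CPUS_MODEL when the mask is empty. */
static unsigned int mask_last(uint32_t mask)
{
	for (unsigned int bit = NR_CPUS_MODEL; bit-- > 0; )
		if (mask & (1u << bit))
			return bit;
	return NR_CPUS_MODEL;
}

/*
 * at_entry: mask as seen by the emptiness check at function entry.
 * at_check: mask as seen later, after it may have changed.
 */
static enum flush_path flush_model(uint32_t at_entry, uint32_t at_check)
{
	enum flush_path path;
	unsigned int last;

	if (at_entry == 0)
		return FLUSH_NONE;	/* IRQs not yet saved on this path */

	model_irq_save();

	last = mask_last(at_check);
	if (last >= NR_CPUS_MODEL) {
		/* Mask became empty: undo the IRQ save, skip the lookup. */
		model_irq_restore();
		return FLUSH_NONE;
	}

	table_reads++;
	path = vp_table[last] >= 64 ? FLUSH_EX : FLUSH_SIMPLE;

	model_irq_restore();
	return path;
}
```

Calling flush_model(0x01, 0x00) imitates the mask emptying after the entry
check: the model returns without indexing vp_table[] and with the IRQ flag
cleared again.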

> >
> >A more important question is, if the mask can change willy-nilly, what
> >is stopping it from changing between these checks? I.e. is there still a
> >window that hv_cpu_number_to_vp_number(last) can return garbage?
>
> It's not that hv_cpu_number_to_vp_number() returns garbage, the issue is
> that we feed it garbage.
>
> hv_cpu_number_to_vp_number() expects that the input would be in the
> range of 0 <= X < num_possible_cpus(), and here if 'cpus' was empty we
> would pass in X==num_possible_cpus() making it read out of bound.
>
> Maybe it's worthwhile to add a WARN_ON() into
> hv_cpu_number_to_vp_number() to assert as well.

If the input cpumask can be changing, the other risk is the for_each_cpu()
loop, which also has a call to hv_cpu_number_to_vp_number(). But looking at
the implementation of for_each_cpu(), it will always return an in-bounds value,
so everything should be OK.
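
A small userspace model of that difference, in case it helps. This is only
a sketch: mask_last(), mask_next(), for_each_set_cpu(), vp_lookup(),
walk_mask() and NR_CPUS_MODEL are hypothetical stand-ins rather than the
kernel helpers, and a 32-bit word stands in for the cpumask. The assert()
in vp_lookup() plays the role of the WARN_ON() suggested above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8u	/* hypothetical number of possible CPUs */

/* Hypothetical CPU -> VP mapping, identity for simplicity. */
static const uint32_t vp_table[NR_CPUS_MODEL] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* Highest set bit, or NR_CPUS_MODEL when the mask is empty. */
static unsigned int mask_last(uint32_t mask)
{
	for (unsigned int bit = NR_CPUS_MODEL; bit-- > 0; )
		if (mask & (1u << bit))
			return bit;
	return NR_CPUS_MODEL;
}

/* First set bit at or after 'start', or NR_CPUS_MODEL when there is none. */
static unsigned int mask_next(uint32_t mask, unsigned int start)
{
	for (unsigned int bit = start; bit < NR_CPUS_MODEL; bit++)
		if (mask & (1u << bit))
			return bit;
	return NR_CPUS_MODEL;
}

/* The loop condition rejects the "none found" value before the body runs. */
#define for_each_set_cpu(cpu, mask)				\
	for ((cpu) = mask_next((mask), 0);			\
	     (cpu) < NR_CPUS_MODEL;				\
	     (cpu) = mask_next((mask), (cpu) + 1))

/* Table lookup that asserts its input is in range. */
static uint32_t vp_lookup(unsigned int cpu)
{
	assert(cpu < NR_CPUS_MODEL);
	return vp_table[cpu];
}

/* Walk the mask and return how many lookups were performed. */
static unsigned int walk_mask(uint32_t mask)
{
	unsigned int cpu, lookups = 0;

	for_each_set_cpu(cpu, mask) {
		(void)vp_lookup(cpu);
		lookups++;
	}
	return lookups;
}
```

In this model mask_last(0) hands back NR_CPUS_MODEL, which the caller has
to check before indexing, while walk_mask(0) simply performs zero lookups
because the loop bound filters that value out.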

>
> --
> Thanks,
> Sasha