Re: [PATCH v5 8/8] KVM: VMX: Resize PID-pointer table on demand for IPI virtualization

From: Sean Christopherson
Date: Fri Jan 14 2022 - 11:18:35 EST


On Fri, Jan 14, 2022, Zeng Guang wrote:
> On 1/14/2022 6:09 AM, Sean Christopherson wrote:
> > On Fri, Dec 31, 2021, Zeng Guang wrote:
> > > +static int vmx_expand_pid_table(struct kvm_vmx *kvm_vmx, int entry_idx)
> > > +{
> > > + u64 *last_pid_table;
> > > + int last_table_size, new_order;
> > > +
> > > + if (entry_idx <= kvm_vmx->pid_last_index)
> > > + return 0;
> > > +
> > > + last_pid_table = kvm_vmx->pid_table;
> > > + last_table_size = table_index_to_size(kvm_vmx->pid_last_index + 1);
> > > + new_order = get_order(table_index_to_size(entry_idx + 1));
> > > +
> > > + if (vmx_alloc_pid_table(kvm_vmx, new_order))
> > > + return -ENOMEM;
> > > +
> > > + memcpy(kvm_vmx->pid_table, last_pid_table, last_table_size);
> > > + kvm_make_all_cpus_request(&kvm_vmx->kvm, KVM_REQ_PID_TABLE_UPDATE);
> > > +
> > > + /* Now old PID table can be freed safely as no vCPU is using it. */
> > > + free_pages((unsigned long)last_pid_table, get_order(last_table_size));
> > This is terrifying. I think it's safe? But it's still terrifying.
>
> Freeing the old PID table here is safe because KVM makes the request
> KVM_REQ_PID_TABLE_UPDATE with the KVM_REQUEST_WAIT flag, which forces all
> vCPUs to take a VM-exit and update the VMCS field to point at the newly
> allocated PID table. At that point, the old PID table is guaranteed not to
> be referenced by any vCPU.
> Do you mean it still has a potential problem?

No, I do think it's safe, but it is still terrifying :-)
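
To spell out the ordering that makes it safe, here is a minimal userspace
sketch of the grow, copy, publish, sync, free sequence. It is only a model,
not the patch's code: struct vm_model, model_sync_all_vcpus() and
model_expand_pid_table() are hypothetical names, the per-vCPU pointer stands
in for the VMCS field, and the sync helper stands in for a request made with
KVM_REQUEST_WAIT.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_VCPUS 4	/* hypothetical, fixed only to keep the model small */

struct vm_model {
	uint64_t *pid_table;		/* table currently published by the VM */
	size_t nr_entries;		/* number of 8-byte entries in pid_table */
	uint64_t *vcpu_table[NR_VCPUS];	/* stand-in for each vCPU's VMCS field */
};

/*
 * Stand-in for a request with KVM_REQUEST_WAIT: when this returns, every
 * vCPU has reloaded its pointer from the VM-wide table.
 */
static void model_sync_all_vcpus(struct vm_model *vm)
{
	for (int i = 0; i < NR_VCPUS; i++)
		vm->vcpu_table[i] = vm->pid_table;
}

/* Grow the table to new_entries; returns 0 on success, -1 on failure. */
static int model_expand_pid_table(struct vm_model *vm, size_t new_entries)
{
	uint64_t *old, *bigger;

	assert(vm != NULL);
	if (new_entries <= vm->nr_entries)
		return 0;

	old = vm->pid_table;
	bigger = calloc(new_entries, sizeof(*bigger));
	if (!bigger)
		return -1;

	/* Carry the existing entries over before anyone can see the new table. */
	if (old)
		memcpy(bigger, old, vm->nr_entries * sizeof(*bigger));

	/* Publish the new table, then wait for every vCPU to switch to it. */
	vm->pid_table = bigger;
	vm->nr_entries = new_entries;
	model_sync_all_vcpus(vm);

	/* Only now is the old table unreferenced and safe to free. */
	free(old);
	return 0;
}
```

The free is only correct if the sync step really does not return until every
vCPU has dropped the old pointer, and that is the part that makes me nervous.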

> > Rather than dynamically reacting as vCPUs are created, what if we make max_vcpus
> > common[*], extend KVM_CAP_MAX_VCPUS to allow userspace to override max_vcpus,
> > and then have the IPIv support allocate the PID table on first vCPU creation
> > instead of in vmx_vm_init()?
> >
> > That will give userspace an opportunity to lower max_vcpus to reduce memory
> > consumption without needing to dynamically muck with the table in KVM. Then
> > this entire patch goes away.
> IIUC, it's risky to rely on userspace.

That's why we have cgroups, rlimits, etc...

> This way userspace can still assign a large max_vcpus and then not use them
> at all. That does not achieve the goal of saving as much memory as possible;
> it is much the same as using KVM_MAX_VCPU_IDS to size the PID table.

Userspace can simply do KVM_CREATE_VCPU until it hits KVM_MAX_VCPU_IDS...
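
For a sense of scale, here is a rough userspace sketch of how the table size
follows from the vCPU ID limit. The assumptions are mine: 4 KiB pages, 8 bytes
per entry (going by the u64 * in the quoted patch), and a power-of-two page
allocation as with get_order(). model_get_order() and model_pid_table_bytes()
are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Userspace approximation of the kernel's get_order(): the smallest order
 * such that (page size << order) is at least size bytes.
 */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while (((size_t)1 << (MODEL_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/*
 * Bytes actually allocated for a table covering max_vcpu_ids vCPU IDs,
 * assuming one 8-byte entry per ID.
 */
static size_t model_pid_table_bytes(size_t max_vcpu_ids)
{
	size_t size = max_vcpu_ids * sizeof(uint64_t);

	return (size_t)1 << (MODEL_PAGE_SHIFT + model_get_order(size));
}
```

Under those assumptions, 512 IDs fit in a single page and 513 IDs round up to
two, so the cost is driven by whatever limit userspace picks, not by how many
vCPUs it actually creates.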