RE: [RFC]Add new mdev interface for QoS

From: Tian, Kevin
Date: Tue Aug 01 2017 - 22:51:07 EST


> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Wednesday, August 2, 2017 6:26 AM
>
> On Tue, 1 Aug 2017 13:54:27 +0800
> "Gao, Ping A" <ping.a.gao@xxxxxxxxx> wrote:
>
> > On 2017/7/28 0:00, Gao, Ping A wrote:
> > > On 2017/7/27 0:43, Alex Williamson wrote:
> > >> [cc +libvir-list]
> > >>
> > >> On Wed, 26 Jul 2017 21:16:59 +0800
> > >> "Gao, Ping A" <ping.a.gao@xxxxxxxxx> wrote:
> > >>
> > >>> vfio-mdev lets different guests share the same physical device
> > >>> through mediated sharing. As a result, we need a way to control how
> > >>> the device is shared: a QoS-related interface for mdev to manage
> > >>> virtual device resources.
> > >>>
> > >>> E.g. in practical use, vGPUs assigned to different guests almost always
> > >>> have different performance requirements. Some guests may need higher
> > >>> priority for real-time usage, while others may need a larger portion of
> > >>> the GPU resource to get higher 3D performance. Correspondingly, we can
> > >>> define interfaces such as weight/cap for overall budget control and
> > >>> priority for single submission control.
> > >>>
> > >>> So I suggest adding some common, vendor-agnostic attributes to the
> > >>> mdev core sysfs for QoS purposes.
> > >> I think what you're asking for is just some standardization of a QoS
> > >> attribute_group which a vendor can optionally include within the
> > >> existing mdev_parent_ops.mdev_attr_groups. The mdev core will
> > >> transparently enable this, but it really only provides the standard,
> > >> all of the support code is left for the vendor. I'm fine with that,
> > >> but of course the trouble with any sort of standardization is arriving
> > >> at an agreed upon standard. Are there QoS knobs that are generic
> > >> across any mdev device type? Are there others that are more specific
> > >> to vGPU? Are there existing examples of this whose specification we
> > >> can steal?
> > > Yes, you are right, standardized QoS knobs are exactly what I want.
> > > Only when they become part of the mdev framework and libvirt can such a
> > > critical feature as QoS be leveraged in cloud usage. HW vendors then only
> > > need to focus on implementing the corresponding QoS algorithm
> > > in their back-end driver.
> > >
> > > The vfio-mdev framework makes it possible to share a device that lacks
> > > HW virtualization support among guests. Whatever the device type,
> > > mediated sharing is essentially time-sharing multiplexing. From this
> > > point of view, QoS can be treated as a generic way to control how much
> > > time each virtual mdev device occupies the HW. As a result, we can
> > > define QoS knobs that are generic across any device type. Even if the HW
> > > has some kind of built-in QoS support, I think it's not a problem
> > > for the back-end driver to convert the mdev standard QoS definition to
> > > its own specification to reach the same performance expectation. There
> > > seem to be no examples for us to follow, so we need to define it from
> > > scratch.
> > >
> > > I propose universal QoS control interfaces like below:
> > >
> > > Cap: The cap limits the maximum percentage of time an mdev device can
> > > own the physical device. E.g. cap=60 means the mdev device cannot take
> > > more than 60% of the total physical resource.
> > >
> > > Weight: The weight defines proportional control of the mdev device
> > > resource between guests. It is orthogonal to Cap and targets load
> > > balancing. E.g. if guest 1 should get twice the mdev device resource
> > > of guest 2, set the weight ratio to 2:1.
> > >
> > > Priority: The guest with higher priority gets execution first. This
> > > targets real-time usage and faster interactive response.
> > >
> > > The above QoS interfaces cover both overall budget control and single
> > > submission control. I will send out a detailed design later once we
> > > are aligned.
> >
> > Hi Alex,
> > Any comments about the interface mentioned above?
>
> Not really.
>
> Kirti, are there any QoS knobs that would be interesting
> for NVIDIA devices?
>
> Implementing libvirt support at the same time might be an interesting
> exercise if we don't have a second user in the kernel to validate
> against. We could at least have two communities reviewing the feature
> then. Thanks,
>

We planned to introduce new vdev types to indirectly validate
some features (e.g. weight and cap) in our device model. However,
that will not exercise the to-be-proposed sysfs interface.
Yes, we can check/extend libvirt simultaneously to draw a
whole picture of all required changes in the stack...
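
For illustration only, here is a minimal sketch of how the cap and
weight knobs quoted above could combine into a per-device time share.
It is not the planned implementation. The function name
allocate_shares, the input layout, and the rule of redistributing
capped-off time to the remaining devices by weight are all assumptions
made for this example. Priority (per-submission ordering) is not
modeled.

```python
def allocate_shares(devices):
    """Split 100% of physical device time between mdev devices.

    devices maps a (hypothetical) device name to (weight, cap_percent).
    Weights are assumed to be positive. Returns name -> percent of time.
    """
    shares = {name: 0.0 for name in devices}
    active = set(devices)   # devices that have not hit their cap yet
    remaining = 100.0       # percent of time still unassigned

    while active:
        total_weight = sum(devices[name][0] for name in active)
        # Devices whose weight-proportional share would exceed their cap.
        capped = [
            name for name in active
            if remaining * devices[name][0] / total_weight > devices[name][1]
        ]
        if not capped:
            # Nobody exceeds a cap: hand out the rest by weight and stop.
            for name in active:
                shares[name] = remaining * devices[name][0] / total_weight
            break
        # Pin capped devices at their cap; the rest is redistributed
        # among the still-active devices in the next round.
        for name in capped:
            shares[name] = float(devices[name][1])
            remaining -= devices[name][1]
            active.discard(name)
    return shares


# Guest 1 has twice the weight of guest 2 but is capped at 60%.
print(allocate_shares({"vgpu1": (2, 60), "vgpu2": (1, 100)}))
# prints {'vgpu1': 60.0, 'vgpu2': 40.0}
```

Without the cap, the 2:1 weight ratio would give vgpu1 about 66.7%;
the cap trims that to 60% and the freed time goes to vgpu2. If every
device is capped, part of the physical device time simply stays
unassigned in this sketch.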

Thanks
Kevin