Re: [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend

From: Juergen Gross
Date: Wed May 17 2017 - 01:21:27 EST


On 16/05/17 21:58, Stefano Stabellini wrote:
> On Tue, 16 May 2017, Juergen Gross wrote:
>> On 15/05/17 22:35, Stefano Stabellini wrote:
>>> The pvcalls backend has one ioworker per cpu: the ioworkers are
>>> implemented as a cpu bound workqueue, and will deal with the actual
>>> socket and data ring reads/writes.
>>>
>>> ioworkers are global: we only have one set for all the frontends. They
>>> process requests on their wqs list in order, once they are done with a
>>> request, they'll remove it from the list. A spinlock is used for
>>> protecting the list. Each ioworker is bound to a different cpu to
>>> maximize throughput.
>>>
>>> Signed-off-by: Stefano Stabellini <stefano@xxxxxxxxxxx>
>>> CC: boris.ostrovsky@xxxxxxxxxx
>>> CC: jgross@xxxxxxxx
>>> ---
>>> drivers/xen/pvcalls-back.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 64 insertions(+)
>>>
>>> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
>>> index 2dbf7d8..46a889a 100644
>>> --- a/drivers/xen/pvcalls-back.c
>>> +++ b/drivers/xen/pvcalls-back.c
>>> @@ -25,6 +25,26 @@
>>> #include <xen/xenbus.h>
>>> #include <xen/interface/io/pvcalls.h>
>>>
>>> +struct pvcalls_ioworker {
>>> + struct work_struct register_work;
>>> + atomic_t io;
>>> + struct list_head wqs;
>>> + spinlock_t lock;
>>> + int num;
>>> +};
>>> +
>>> +struct pvcalls_back_global {
>>> + struct pvcalls_ioworker *ioworkers;
>>> + int nr_ioworkers;
>>> + struct workqueue_struct *wq;
>>> + struct list_head privs;
>>> + struct rw_semaphore privs_lock;
>>> +} pvcalls_back_global;
>>> +
>>> +static void pvcalls_back_ioworker(struct work_struct *work)
>>> +{
>>> +}
>>> +
>>> static int pvcalls_back_probe(struct xenbus_device *dev,
>>> const struct xenbus_device_id *id)
>>> {
>>> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
>>> .uevent = pvcalls_back_uevent,
>>> .otherend_changed = pvcalls_back_changed,
>>> };
>>> +
>>> +static int __init pvcalls_back_init(void)
>>> +{
>>> + int ret, i, cpu;
>>> +
>>> + if (!xen_domain())
>>> + return -ENODEV;
>>> +
>>> + ret = xenbus_register_backend(&pvcalls_back_driver);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + init_rwsem(&pvcalls_back_global.privs_lock);
>>> + INIT_LIST_HEAD(&pvcalls_back_global.privs);
>>> + pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
>>> + if (!pvcalls_back_global.wq)
>>> + goto error;
>>> + pvcalls_back_global.nr_ioworkers = num_online_cpus();
>>
>> Really? Recently I cam across a system with 640 dom0 cpus. I don't think
>> we want 640 workers initialized when loading the backend module. I'd
>> prefer one or a few workers per connected frontend.
>
> I think we want to keep the ioworker allocation to be based on the
> number of vcpus: we do not want more ioworkers than vcpus because it is
> a waste of resources and leads to worse performance. Also, given that
> they do memcpy's, I also think it is a good idea to bind them to vcpus
> (and pin vcpus to pcpus) to get best performance.

This will cause a lot of pain for the cpu offline case. Please don't try
to work against the hypervisor scheduler by designing a backend based on
a vcpu pin policy. This might result in best performance for your
special workload, but generally it is a bad idea!

> However, you have a point there: we need to handle systems with an
> extremely large number of Dom0 vcpus. I suggest we introduce an
> upper limit for the number of ioworkers. Something like:
>
> #define MAX_IOWORKERS 64
> nr_ioworkers = min(MAX_IOWORKERS, num_online_cpus())
>
> MAX_IOWORKERS could be configurable via a command line option.

Later you are assigning each active socket to exactly one ioworker.
Wouldn't it make more sense to allocate the ioworker when doing
the connect? This would avoid the problem of having only a statistical
distribution, possibly with all sockets on the same ioworker.

Basically you are re-inventing the wheel by using an own workqueue
implementation in each ioworker looping through all assigned sockets.


Juergen