Re: [RFC] per-CPU cryptd thread implementation based on workqueue

From: Huang Ying
Date: Sun Feb 01 2009 - 04:37:39 EST


Sorry for my late reply. Last week was the Chinese New Year holiday.

On Sat, 2009-01-24 at 15:07 +0800, Andrew Morton wrote:
> On Thu, 22 Jan 2009 10:32:17 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:
>
> > Use dedicate workqueue for crypto
> >
> > - A dedicated workqueue named kcrypto_wq is created.
> >
> > - chainiv uses kcrypto_wq instead of keventd_wq.
> >
> > - For cryptd, struct cryptd_queue is defined to be a per-CPU queue,
> > which holds one struct cryptd_cpu_queue for each CPU. In struct
> > cryptd_cpu_queue, a struct crypto_queue holds all requests for the
> > CPU, a struct work_struct is used to run all requests for the CPU.
> >
>
> Please always prefer to include performance measurements when proposing
> a performance-enhancing patch. Otherwise we have no basis upon which
> to evaluate the patch's worth.

I will set up a test to measure the performance gain.
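
For reference, the per-CPU layout described in the changelog above could look roughly like this (a sketch with field names inferred from the quoted code; the actual patch may differ in detail):

```c
struct cryptd_cpu_queue {
	spinlock_t lock;		/* protects queue */
	struct crypto_queue queue;	/* pending requests for this CPU */
	struct work_struct work;	/* runs the requests for this CPU */
	unsigned long state;		/* CRYPTD_STATE_INUSE bit */
};

struct cryptd_queue {
	/* per-CPU storage, e.g. from alloc_percpu() */
	struct cryptd_cpu_queue *cpu_queue;
};
```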

> > +++ b/crypto/crypto_wq.c
> > @@ -0,0 +1,39 @@
> > +/*
> > + * Workqueue for crypto subsystem
> > + *
> > + * Copyright (c) 2009 Intel Corp.
> > + * Author: Huang Ying <ying.huang@xxxxxxxxx>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License as published by the Free
> > + * Software Foundation; either version 2 of the License, or (at your option)
> > + * any later version.
> > + *
> > + */
> > +
> > +#include <linux/workqueue.h>
> > +#include <crypto/algapi.h>
> > +#include <crypto/crypto_wq.h>
> > +
> > +struct workqueue_struct *kcrypto_wq;
> > +EXPORT_SYMBOL_GPL(kcrypto_wq);
> > +
> > +static int __init crypto_wq_init(void)
> > +{
> > + kcrypto_wq = create_workqueue("crypto");
> > + if (unlikely(!kcrypto_wq))
> > + return -ENOMEM;
> > + return 0;
> > +}
> > +
> > +static void __exit crypto_wq_exit(void)
> > +{
> > + if (likely(kcrypto_wq))
> > + destroy_workqueue(kcrypto_wq);
>
> I don't believe that it is possible to get here with kcrypto_wq==NULL.

Yes. I will fix this.

> >
> > ...
> >
> > +int cryptd_enqueue_request(struct cryptd_queue *queue,
> > + struct crypto_async_request *request)
> > +{
> > + int cpu, err, queued;
> > + struct cryptd_cpu_queue *cpu_queue;
> > +
> > + cpu = get_cpu();
> > + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> > + spin_lock_bh(&cpu_queue->lock);
> > + err = crypto_enqueue_request(&cpu_queue->queue, request);
> > + spin_unlock_bh(&cpu_queue->lock);
> > + /* INUSE should be set after queue->qlen is assigned, but
> > + * spin_unlock_bh implies a memory barrier already */
> > + if (!test_and_set_bit_lock(CRYPTD_STATE_INUSE, &cpu_queue->state)) {
> > + queued = queue_work(kcrypto_wq, &cpu_queue->work);
> > + BUG_ON(!queued);
> > + }
>
> Do we actually need to use CRYPTD_STATE_INUSE here? The
> WORK_STRUCT_PENDING handling in the workqueue does basically the same
> thing?

Yes, it is not necessary. I will fix this.

> > + put_cpu();
> > +
> > + return err;
> > +}
> > +
> > +int cryptd_tfm_in_queue(struct cryptd_queue *queue, struct crypto_tfm *tfm)
>
> Did this need to have global scope?

It should be static. I will fix this.

> > +{
> > + int cpu, in_queue;
> > + struct cryptd_cpu_queue *cpu_queue;
> > +
> > + for_each_possible_cpu(cpu) {
> > + cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
> > + spin_lock_bh(&cpu_queue->lock);
> > + in_queue = crypto_tfm_in_queue(&cpu_queue->queue, tfm);
> > + spin_unlock_bh(&cpu_queue->lock);
> > + if (in_queue)
> > + return 1;
> > + }
> > + return 0;
> > +}
>
> Did you consider using for_each_online_cpu() and implementing CPU
> hotplug? There might be situations where the number of possible CPUs
> is much greater than the number of online CPUs.

For cryptd_queue_init() and cryptd_queue_fini(), I think
for_each_possible_cpu() is sufficient and simpler, because they only
need to be executed once and are in the slow path. For
cryptd_tfm_in_queue(), for_each_online_cpu() can be used.
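
If for_each_online_cpu() is used there, the walk should probably also be protected against CPU hotplug, e.g. with get_online_cpus()/put_online_cpus(). A sketch (untested, and assuming the function is made static as discussed above):

```c
static int cryptd_tfm_in_queue(struct cryptd_queue *queue,
			       struct crypto_tfm *tfm)
{
	int cpu;
	struct cryptd_cpu_queue *cpu_queue;

	get_online_cpus();	/* keep the online map stable */
	for_each_online_cpu(cpu) {
		cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
		spin_lock_bh(&cpu_queue->lock);
		if (crypto_tfm_in_queue(&cpu_queue->queue, tfm)) {
			spin_unlock_bh(&cpu_queue->lock);
			put_online_cpus();
			return 1;
		}
		spin_unlock_bh(&cpu_queue->lock);
	}
	put_online_cpus();
	return 0;
}
```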

> > +static void cryptd_queue_work_done(struct cryptd_cpu_queue *cpu_queue)
> > +{
> > + int queued;
> > +
> > + if (!cpu_queue->queue.qlen) {
> > + clear_bit_unlock(CRYPTD_STATE_INUSE, &cpu_queue->state);
> > + /* queue.qlen must be checked after INUSE bit cleared */
> > + smp_mb();
> > + if (!cpu_queue->queue.qlen ||
> > + test_and_set_bit_lock(CRYPTD_STATE_INUSE,
> > + &cpu_queue->state))
> > + return;
> > + }
> > +
> > + queued = queue_work(kcrypto_wq, &cpu_queue->work);
> > + BUG_ON(!queued);
> > +}
>
> It is unclear (to me) why this code is using the special "locked"
> bitops. This should be explained in a code comment.
>
> It isn't immediately clear (to me) what this function does, what its
> role is in the overall scheme. It wouldn't hurt at all to put a nice
> comment over non-trivial functions explaining such things.

This function exists only for the CRYPTD_STATE_INUSE logic; since
CRYPTD_STATE_INUSE is not needed, this function is not needed either.
Just calling queue_work() at the end of cryptd_queue_work() is
sufficient.

> > +static void cryptd_queue_work(struct work_struct *work)
> > +{
> > + struct cryptd_cpu_queue *cpu_queue =
> > + container_of(work, struct cryptd_cpu_queue, work);
>
> You could just do
>
> struct cryptd_cpu_queue *cpu_queue;
>
> cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
>
> rather than uglifying the code to fit in 80-cols.

OK. I will fix this.

Best Regards,
Huang Ying
