Re: [RFC PATCH] Crypto-engine support for parallel requests
From: Iuliana Prodan
Date:  Tue Jan 21 2020 - 09:20:33 EST
On 1/21/2020 12:00 PM, Corentin Labbe wrote:
> On Tue, Jan 21, 2020 at 01:32:29AM +0200, Iuliana Prodan wrote:
>> Added support for executing multiple requests, in parallel,
>> for crypto engine.
>> A no_reqs is initialized and set in the new
>> crypto_engine_alloc_init_and_set function.
>> Here, is also set the maximum size for crypto-engine software
>> queue (not hardcoded anymore).
>> On crypto_pump_requests the no_reqs is increased, until the
>> max_no_reqs is reached, and decreased on crypto_finalize_request,
>> or on error path (in case a prepare_request or do_one_request
>> operation was unsuccessful).
>>
>> Signed-off-by: Iuliana Prodan <iuliana.prodan@xxxxxxx>
>> ---
>>   crypto/crypto_engine.c  | 112 +++++++++++++++++++++++++++++++++---------------
>>   include/crypto/engine.h |  11 +++--
>>   2 files changed, 84 insertions(+), 39 deletions(-)
>>
>> diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
>> index eb029ff..5219141 100644
>> --- a/crypto/crypto_engine.c
>> +++ b/crypto/crypto_engine.c
>> @@ -14,6 +14,7 @@
>>   #include "internal.h"
>>   
>>   #define CRYPTO_ENGINE_MAX_QLEN 10
>> +#define CRYPTO_ENGINE_MAX_CONCURRENT_REQS 1
>>   
>>   /**
>>    * crypto_finalize_request - finalize one request if the request is done
>> @@ -22,32 +23,27 @@
>>    * @err: error number
>>    */
>>   static void crypto_finalize_request(struct crypto_engine *engine,
>> -			     struct crypto_async_request *req, int err)
>> +				    struct crypto_async_request *req, int err)
>>   {
>>   	unsigned long flags;
>> -	bool finalize_cur_req = false;
>> +	bool finalize_req = false;
>>   	int ret;
>>   	struct crypto_engine_ctx *enginectx;
>>   
>>   	spin_lock_irqsave(&engine->queue_lock, flags);
>> -	if (engine->cur_req == req)
>> -		finalize_cur_req = true;
>> +	if (engine->no_reqs > 0) {
>> +		finalize_req = true;
>> +		engine->no_reqs--;
>> +	}
>>   	spin_unlock_irqrestore(&engine->queue_lock, flags);
>>   
>> -	if (finalize_cur_req) {
>> -		enginectx = crypto_tfm_ctx(req->tfm);
>> -		if (engine->cur_req_prepared &&
>> -		    enginectx->op.unprepare_request) {
>> -			ret = enginectx->op.unprepare_request(engine, req);
>> -			if (ret)
>> -				dev_err(engine->dev, "failed to unprepare request\n");
>> -		}
>> -		spin_lock_irqsave(&engine->queue_lock, flags);
>> -		engine->cur_req = NULL;
>> -		engine->cur_req_prepared = false;
>> -		spin_unlock_irqrestore(&engine->queue_lock, flags);
>> +	enginectx = crypto_tfm_ctx(req->tfm);
>> +	if (finalize_req && enginectx->op.prepare_request &&
>> +	    enginectx->op.unprepare_request) {
>> +		ret = enginectx->op.unprepare_request(engine, req);
>> +		if (ret)
>> +			dev_err(engine->dev, "failed to unprepare request\n");
>>   	}
>> -
>>   	req->complete(req, err);
>>   
>>   	kthread_queue_work(engine->kworker, &engine->pump_requests);
>> @@ -73,8 +69,8 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>>   
>>   	spin_lock_irqsave(&engine->queue_lock, flags);
>>   
>> -	/* Make sure we are not already running a request */
>> -	if (engine->cur_req)
>> +	/* Make sure we have space, for more requests to run */
>> +	if (engine->no_reqs >= engine->max_no_reqs)
>>   		goto out;
>>   
>>   	/* If another context is idling then defer */
>> @@ -108,13 +104,16 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>>   		goto out;
>>   	}
>>   
>> +retry:
>>   	/* Get the fist request from the engine queue to handle */
>>   	backlog = crypto_get_backlog(&engine->queue);
>>   	async_req = crypto_dequeue_request(&engine->queue);
>>   	if (!async_req)
>>   		goto out;
>>   
>> -	engine->cur_req = async_req;
>> +	/* Increase the number of concurrent requests that are in execution */
>> +	engine->no_reqs++;
>> +
>>   	if (backlog)
>>   		backlog->complete(backlog, -EINPROGRESS);
>>   
>> @@ -130,7 +129,7 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>>   		ret = engine->prepare_crypt_hardware(engine);
>>   		if (ret) {
>>   			dev_err(engine->dev, "failed to prepare crypt hardware\n");
>> -			goto req_err;
>> +			goto req_err_2;
>>   		}
>>   	}
>>   
>> @@ -141,26 +140,45 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>>   		if (ret) {
>>   			dev_err(engine->dev, "failed to prepare request: %d\n",
>>   				ret);
>> -			goto req_err;
>> +			goto req_err_2;
>>   		}
>> -		engine->cur_req_prepared = true;
>>   	}
>>   	if (!enginectx->op.do_one_request) {
>>   		dev_err(engine->dev, "failed to do request\n");
>>   		ret = -EINVAL;
>> -		goto req_err;
>> +		goto req_err_1;
>>   	}
>> +
>>   	ret = enginectx->op.do_one_request(engine, async_req);
>>   	if (ret) {
>>   		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
>> -		goto req_err;
>> +		goto req_err_1;
>>   	}
>> -	return;
>> -
>> -req_err:
>> -	crypto_finalize_request(engine, async_req, ret);
>> -	return;
>>   
>> +	/*
>> +	 * If there is still space for concurrent requests,
>> +	 * try and send a new one
>> +	 */
>> +	spin_lock_irqsave(&engine->queue_lock, flags);
>> +	if (engine->no_reqs < engine->max_no_reqs)
>> +		goto retry;
> 
> You should check if engine->queue.qlen > 0 before retry.
> 
>> +	goto out;
>> +
>> +req_err_1:
>> +	if (enginectx->op.unprepare_request) {
>> +		ret = enginectx->op.unprepare_request(engine, async_req);
>> +		if (ret)
>> +			dev_err(engine->dev, "failed to unprepare request\n");
>> +	}
>> +req_err_2:
>> +	async_req->complete(async_req, ret);
>> +	spin_lock_irqsave(&engine->queue_lock, flags);
>> +	/*
>> +	 * If unable to prepare or execute the request,
>> +	 * decrease the number of concurrent requests
>> +	 */
>> +	engine->no_reqs--;
>> +	goto retry;
> 
> You should check if engine->queue.qlen > 0 before retry.
> 
Here (and above) is not needed to check for qlen > 0, since on retry, 
first thing is tryin to dequeue an async_req from crypto-engine queue. 
In crypto_dequeue_request function is a check for qlen, that means than 
in pump_request will goto out.
>>   out:
>>   	spin_unlock_irqrestore(&engine->queue_lock, flags);
>>   }
>> @@ -386,15 +404,21 @@ int crypto_engine_stop(struct crypto_engine *engine)
>>   EXPORT_SYMBOL_GPL(crypto_engine_stop);
>>   
>>   /**
>> - * crypto_engine_alloc_init - allocate crypto hardware engine structure and
>> - * initialize it.
>> + * crypto_engine_alloc_init_and_set - allocate crypto hardware engine structure
>> + * and initialize it by setting the maximum number of entries in the software
>> + * crypto-engine queue and the maximum number of concurrent requests that can
>> + * be executed at once.
>>    * @dev: the device attached with one hardware engine
>>    * @rt: whether this queue is set to run as a realtime task
>> + * @max_no_reqs: maximum number of request that can be executed in parallel
>> + * @qlen: maximum size of the crypto-engine queue
>>    *
>>    * This must be called from context that can sleep.
>>    * Return: the crypto engine structure on success, else NULL.
>>    */
>> -struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>> +struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
>> +						       bool rt, int max_no_reqs,
>> +						       int qlen)
>>   {
>>   	struct sched_param param = { .sched_priority = MAX_RT_PRIO / 2 };
>>   	struct crypto_engine *engine;
>> @@ -411,12 +435,13 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>>   	engine->running = false;
>>   	engine->busy = false;
>>   	engine->idling = false;
>> -	engine->cur_req_prepared = false;
>>   	engine->priv_data = dev;
>>   	snprintf(engine->name, sizeof(engine->name),
>>   		 "%s-engine", dev_name(dev));
>> +	engine->max_no_reqs = max_no_reqs;
>> +	engine->no_reqs = 0;
>>   
>> -	crypto_init_queue(&engine->queue, CRYPTO_ENGINE_MAX_QLEN);
>> +	crypto_init_queue(&engine->queue, qlen);
>>   	spin_lock_init(&engine->queue_lock);
>>   
>>   	engine->kworker = kthread_create_worker(0, "%s", engine->name);
>> @@ -433,6 +458,23 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>>   
>>   	return engine;
>>   }
>> +EXPORT_SYMBOL_GPL(crypto_engine_alloc_init_and_set);
>> +
>> +/**
>> + * crypto_engine_alloc_init - allocate crypto hardware engine structure and
>> + * initialize it.
>> + * @dev: the device attached with one hardware engine
>> + * @rt: whether this queue is set to run as a realtime task
>> + *
>> + * This must be called from context that can sleep.
>> + * Return: the crypto engine structure on success, else NULL.
>> + */
>> +struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>> +{
>> +	return crypto_engine_alloc_init_and_set(dev, rt,
>> +						CRYPTO_ENGINE_MAX_CONCURRENT_REQS,
>> +						CRYPTO_ENGINE_MAX_QLEN);
>> +}
>>   EXPORT_SYMBOL_GPL(crypto_engine_alloc_init);
>>   
>>   /**
>> diff --git a/include/crypto/engine.h b/include/crypto/engine.h
>> index e29cd67..5f9a6df 100644
>> --- a/include/crypto/engine.h
>> +++ b/include/crypto/engine.h
>> @@ -24,7 +24,6 @@
>>    * @idling: the engine is entering idle state
>>    * @busy: request pump is busy
>>    * @running: the engine is on working
>> - * @cur_req_prepared: current request is prepared
>>    * @list: link with the global crypto engine list
>>    * @queue_lock: spinlock to syncronise access to request queue
>>    * @queue: the crypto queue of the engine
>> @@ -38,14 +37,14 @@
>>    * @kworker: kthread worker struct for request pump
>>    * @pump_requests: work struct for scheduling work to the request pump
>>    * @priv_data: the engine private data
>> - * @cur_req: the current request which is on processing
>> + * @max_no_reqs: maximum number of request which can be processed in parallel
>> + * @no_reqs: current number of request which are processed in parallel
>>    */
>>   struct crypto_engine {
>>   	char			name[ENGINE_NAME_LEN];
>>   	bool			idling;
>>   	bool			busy;
>>   	bool			running;
>> -	bool			cur_req_prepared;
>>   
>>   	struct list_head	list;
>>   	spinlock_t		queue_lock;
>> @@ -61,7 +60,8 @@ struct crypto_engine {
>>   	struct kthread_work             pump_requests;
>>   
>>   	void				*priv_data;
>> -	struct crypto_async_request	*cur_req;
>> +	int			max_no_reqs;
>> +	int			no_reqs;
>>   };
>>   
>>   /*
>> @@ -102,6 +102,9 @@ void crypto_finalize_skcipher_request(struct crypto_engine *engine,
>>   int crypto_engine_start(struct crypto_engine *engine);
>>   int crypto_engine_stop(struct crypto_engine *engine);
>>   struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt);
>> +struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
>> +						       bool rt, int max_no_reqs,
>> +						       int qlen);
>>   int crypto_engine_exit(struct crypto_engine *engine);
>>   
>>   #endif /* _CRYPTO_ENGINE_H */
>> -- 
>> 2.1.0
>>
> 
> Hello
> 
> In your model, who is running finalize_request() ?
finalize_request() in CAAM, and in other drivers, is called on the _done 
callback (stm32, virtio and omap).
> In caam it seems that you have a taskqueue dedicated for that but you cannot assume that all drivers will have this.
> I think the crypto_engine should be sufficient by itself and does not need external thread/taskqueue.
> 
> But in your case, it seems that you dont have the choice, since do_one_request does not "do" but simply enqueue the request in the "jobring".
> 
But, do_one_request it shouldn't, necessary,  execute the request. Is ok 
to enqueue it, since we have asynchronous requests. do_one_request is 
not blocking.
> What about adding along prepare/do_one_request/unprepare a new enqueue()/can_do_more() function ?
> 
> The stream will be:
> retry:
> optionnal prepare
> optionnal enqueue
> optionnal can_do_more() (goto retry)
> optionnal do_one_request
> 
> then
> finalize()
> optionnal unprepare
> 
I'm planning to improve crypto-engine incrementally, so I'm taking one 
step at a time :)
But I'm not sure if adding an enqueue operation is a good idea, since, 
my understanding, is that do_one_request is a non-blocking operation and 
it shouldn't execute the request.
IMO, the crypto-engine flow should be kept simple:
1. a request comes to hw -> this is doing transfer_request_to_engine;
2. CE enqueue the requests
3. on pump_requests:
	3. a) optional prepare operation
	3. b) sends the reqs to hw, by do_one_request operation. To wait for 
completion here it contradicts the asynchronous crypto API. 
do_one_request operation has a crypto_async_request type as argument.
Note: Step 3. b) can be done several times, depending on size of hw queue.
4. in driver, when req is done:
	4. a) optional unprepare operation
	4. b) crypto_finalize_request is called
	
Thanks,
Iulia
> The can_do_more simply will tell if we can enqueue more (this will handle your case(ringjob), and my case(batching)
> Instead of storing the limit in the crypto_engine, you keep control on the driver side.
> 
> For your case the do_one_request will be unset, for mine I will use to ran the batch.
> But for other drivers, no change will be necessary (appart adding some enqueue=NULL,can_do_more=NULL).
> 
> We can also imagine an easier solution like enqueue returning a positive value saying to queue more.
> 
> Regards
>