Re: [PATCH v2 2/4] crypto: exynos - Improve performance of PRNG

From: Krzysztof Kozlowski
Date: Mon Dec 11 2017 - 09:54:47 EST


On Mon, Dec 11, 2017 at 3:06 PM, Åukasz Stelmach <l.stelmach@xxxxxxxxxxx> wrote:
> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>, Bartlomiej Zolnierkiewicz <b.zolnierkie@xxxxxxxxxxx>

This should not appear here.

>
> Use memcpy_fromio() instead of custom exynos_rng_copy_random() function
> to retrieve generated numbers from the registers of PRNG.
>
> Rearrange the loop around cpu_relax(). In a loop with while() at the
> beginning and the cpu_relax() removed the retry variable is decremented
> twice (down to 98).

I had troubles with understanding this sentence... and then I figured
out that you are referring to some case without cpu_relax(). I do not
see how it is relevant to this case. Compare the new code with old,
not with some imaginary case without barriers (thus maybe reordered?).

Your solution is strictly performance oriented so it would be nice to
see here the exact difference in numbers justifying the change. But
only the change for while() -> do-while(), not mixed with
memcpy_fromio.

> This means, the status register is read three
> times before the hardware is ready (which is very quick). Apparently,
> cpu_relax() requires noticeable amount of time to execute, so it seems
> better to call it for the first time before checking the status of
> PRNG. The do {} while () loop decrements the retry variable only once,
> which means, the time required for cpu_relax() is long enough to for
> the PRNG to provide results.

So basically you want to say that you removed one call to exynos_rng_read()?

> On the other hand, performance in this
> arrangement isn't much worse than in a loop without cpu_relax().

I think it is not relevant as cpu_relax() removal is not part of
current nor the new code.

Best regards,
Krzysztof

>
> Signed-off-by: Åukasz Stelmach <l.stelmach@xxxxxxxxxxx>
> ---
> drivers/crypto/exynos-rng.c | 36 +++++-------------------------------
> 1 file changed, 5 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/crypto/exynos-rng.c b/drivers/crypto/exynos-rng.c
> index 2f554b82f49f..7d8f658480d3 100644
> --- a/drivers/crypto/exynos-rng.c
> +++ b/drivers/crypto/exynos-rng.c
> @@ -131,34 +131,6 @@ static int exynos_rng_set_seed(struct exynos_rng_dev *rng,
> }
>
> /*
> - * Read from output registers and put the data under 'dst' array,
> - * up to dlen bytes.
> - *
> - * Returns number of bytes actually stored in 'dst' (dlen
> - * or EXYNOS_RNG_SEED_SIZE).
> - */
> -static unsigned int exynos_rng_copy_random(struct exynos_rng_dev *rng,
> - u8 *dst, unsigned int dlen)
> -{
> - unsigned int cnt = 0;
> - int i, j;
> - u32 val;
> -
> - for (j = 0; j < EXYNOS_RNG_SEED_REGS; j++) {
> - val = exynos_rng_readl(rng, EXYNOS_RNG_OUT(j));
> -
> - for (i = 0; i < 4; i++) {
> - dst[cnt] = val & 0xff;
> - val >>= 8;
> - if (++cnt >= dlen)
> - return cnt;
> - }
> - }
> -
> - return cnt;
> -}
> -
> -/*
> * Start the engine and poll for finish. Then read from output registers
> * filling the 'dst' buffer up to 'dlen' bytes or up to size of generated
> * random data (EXYNOS_RNG_SEED_SIZE).
> @@ -180,9 +152,10 @@ static int exynos_rng_get_random(struct exynos_rng_dev *rng,
> EXYNOS_RNG_SEED_CONF);
> }
>
> - while (!(exynos_rng_readl(rng,
> - EXYNOS_RNG_STATUS) & EXYNOS_RNG_STATUS_RNG_DONE) && --retry)
> + do {
> cpu_relax();
> + } while (!(exynos_rng_readl(rng, EXYNOS_RNG_STATUS) &
> + EXYNOS_RNG_STATUS_RNG_DONE) && --retry);
>
> if (!retry)
> return -ETIMEDOUT;
> @@ -190,7 +163,8 @@ static int exynos_rng_get_random(struct exynos_rng_dev *rng,
> /* Clear status bit */
> exynos_rng_writel(rng, EXYNOS_RNG_STATUS_RNG_DONE,
> EXYNOS_RNG_STATUS);
> - *read = exynos_rng_copy_random(rng, dst, dlen);
> + *read = min_t(size_t, dlen, EXYNOS_RNG_SEED_SIZE);
> + memcpy_fromio(dst, rng->mem + EXYNOS_RNG_OUT_BASE, *read);
>
> return 0;
> }
> --
> 2.11.0
>