RE: [PATCH 1/1] riscv: prevent pipeline stall in __asm_to/copy_from_user

From: Palmer Dabbelt
Date: Sat Jun 12 2021 - 00:08:08 EST

Next message: Ira Weiny: "Re: [PATCH 09/16] ps3disk: use memcpy_{from,to}_bvec"
Previous message: Palmer Dabbelt: "Re: [PATCH -next] riscv: add VMAP_STACK overflow detection"
In reply to: David Laight: "RE: [PATCH 1/1] riscv: prevent pipeline stall in __asm_to/copy_from_user"
Next in thread: David Laight: "RE: [PATCH 1/1] riscv: prevent pipeline stall in __asm_to/copy_from_user"

On Tue, 08 Jun 2021 04:31:40 PDT (-0700), David.Laight@xxxxxxxxxx wrote:

> From: Akira Tsukamoto
> Sent: 04 June 2021 10:57
>>
>> Reducing pipeline stall of read after write (RAW).
>>
>> These are the results from the combination of the speedup with
>> Gary's misalign fix. Speeds up from 680Mbps to 900Mbps.
>>
>> Before applying these two patches.
>
> I think the changes should be in separate patches.
> Otherwise it is difficult to see what is relevant.
> It also looks as if there is a register rename.
> Maybe that should be a precursor patch?

Yes, and I'd also prefer the original patches. This also doesn't apply.

...

> I think this is the old main copy loop:
>
> 1:
> - fixup REG_L, t2, (a1), 10f
> - fixup REG_S, t2, (a0), 10f
> - addi a1, a1, SZREG
> - addi a0, a0, SZREG
> - bltu a1, t1, 1b
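For readers who don't read RISC-V assembly, the old loop corresponds roughly to the C below. This is a sketch under stated assumptions: the function name `copy_words_simple` is hypothetical, both pointers are assumed word aligned, `nwords` is assumed non-zero (the assembly only tests its bound at the bottom of the loop), and reading `t1` as the source end pointer is an interpretation of the diff. Each store uses the value loaded by the instruction right before it, which is the read-after-write dependency the patch is trying to avoid.

```c
#include <stddef.h>

/*
 * Hypothetical C model of the old word-copy loop: one load and one
 * store per iteration. The store consumes the value loaded immediately
 * before it, so an in-order core may stall for the full load-use latency
 * on every word.
 * Assumptions: word-aligned pointers, nwords >= 1 (do/while like bltu).
 */
void copy_words_simple(unsigned long *dst, const unsigned long *src,
                       size_t nwords)
{
    const unsigned long *end = src + nwords;   /* assumed role of t1 */

    do {
        unsigned long t2 = *src;   /* REG_L t2, (a1) */
        *dst = t2;                 /* REG_S t2, (a0): waits on the load */
        src++;                     /* addi a1, a1, SZREG */
        dst++;                     /* addi a0, a0, SZREG */
    } while (src < end);           /* bltu a1, t1, 1b */
}
```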

> and this is the new one:
>
> 3:
> + fixup REG_L a4, 0(a1), 10f
> + fixup REG_L a5, SZREG(a1), 10f
> + fixup REG_L a6, 2*SZREG(a1), 10f
> + fixup REG_L a7, 3*SZREG(a1), 10f
> + fixup REG_L t0, 4*SZREG(a1), 10f
> + fixup REG_L t1, 5*SZREG(a1), 10f
> + fixup REG_L t2, 6*SZREG(a1), 10f
> + fixup REG_L t3, 7*SZREG(a1), 10f
> + fixup REG_S a4, 0(t5), 10f
> + fixup REG_S a5, SZREG(t5), 10f
> + fixup REG_S a6, 2*SZREG(t5), 10f
> + fixup REG_S a7, 3*SZREG(t5), 10f
> + fixup REG_S t0, 4*SZREG(t5), 10f
> + fixup REG_S t1, 5*SZREG(t5), 10f
> + fixup REG_S t2, 6*SZREG(t5), 10f
> + fixup REG_S t3, 7*SZREG(t5), 10f
> + addi a1, a1, 8*SZREG
> + addi t5, t5, 8*SZREG
> + bltu a1, a3, 3b
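The unrolled loop can be modelled in C the same way. Again this is illustrative only: `copy_words_unrolled8` is a hypothetical name, `nwords` is assumed to be a non-zero multiple of 8 (any remainder would need separate handling, which is omitted here), and `t5` is read as the destination pointer and `a3` as the source end bound. All eight loads issue before the first store, so no store depends on the load immediately ahead of it.

```c
#include <stddef.h>

/*
 * Hypothetical C model of the new loop: eight loads into independent
 * temporaries, then eight stores, so each load has several instructions
 * to complete before its value is stored.
 * Assumptions: word-aligned pointers, nwords a non-zero multiple of 8.
 */
void copy_words_unrolled8(unsigned long *dst, const unsigned long *src,
                          size_t nwords)
{
    const unsigned long *end = src + nwords;   /* assumed role of a3 */

    do {
        /* Loads into a4..a7, t0..t3 in the assembly. */
        unsigned long w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
        unsigned long w4 = src[4], w5 = src[5], w6 = src[6], w7 = src[7];

        /* Stores through t5: none waits on the load just before it. */
        dst[0] = w0; dst[1] = w1; dst[2] = w2; dst[3] = w3;
        dst[4] = w4; dst[5] = w5; dst[6] = w6; dst[7] = w7;

        src += 8;                  /* addi a1, a1, 8*SZREG */
        dst += 8;                  /* addi t5, t5, 8*SZREG */
    } while (src < end);           /* bltu a1, a3, 3b */
}
```

A C compiler is free to reschedule these loads and stores, so the C is only a readable model; it is the assembly that actually fixes the instruction order.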

> I don't know the architecture, but unless there is a stunning
> pipeline delay for memory reads, a simple interleaved copy
> may be fast enough.
> So something like:
> a = src[0];
> do {
>     b = src[1];
>     src += 2;
>     dst[0] = a;
>     dst += 2;
>     a = src[0];
>     dst[-1] = b;
> } while (src != src_end);
> dst[0] = a;
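The interleaved sketch can be made runnable as below. The sketch leaves `src_end` unspecified, so this version assumes it points at the last source word; under that reading the loop copies an odd number of words, at least three (with a single word it would never terminate). The wrapper name `copy_words_interleaved` and the `nwords` parameter are hypothetical. Only two temporaries are live, yet each store uses a value loaded several instructions earlier rather than by the instruction just before it.

```c
#include <stddef.h>

/*
 * Runnable version of the interleaved copy, under the assumption that
 * src_end points at the LAST source word. That makes nwords odd and
 * >= 3: the loop copies pairs, and the epilogue stores the final word.
 */
void copy_words_interleaved(unsigned long *dst, const unsigned long *src,
                            size_t nwords)
{
    const unsigned long *src_end = src + nwords - 1;  /* assumed meaning */
    unsigned long a, b;

    a = src[0];                /* prologue: first load is issued early */
    do {
        b = src[1];
        src += 2;
        dst[0] = a;            /* a was loaded in the previous iteration */
        dst += 2;
        a = src[0];            /* load the next even word ahead of use */
        dst[-1] = b;           /* b was loaded four operations earlier */
    } while (src != src_end);
    dst[0] = a;                /* epilogue: store the last word */
}
```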

> It is probably worth doing benchmarks of the copy loop
> in userspace.
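A minimal userspace harness along the lines suggested could look like this. It assumes a POSIX system providing `clock_gettime(CLOCK_MONOTONIC)`; the names `bench_copy`, `copy_fn` and `ref_copy_words` are hypothetical, and any word-copy loop with the same signature could be passed in place of the baseline. The result is raw copy bandwidth in MB/s, measured in isolation from the rest of the kernel path.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by the copy loops under test. */
typedef void (*copy_fn)(unsigned long *dst, const unsigned long *src,
                        size_t nwords);

/* Baseline: one load and one store per word. */
void ref_copy_words(unsigned long *dst, const unsigned long *src,
                    size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}

static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Run fn over an nwords buffer `reps` times and return throughput in
 * MB/s, or -1.0 if allocation failed or the copied data is wrong.
 */
double bench_copy(copy_fn fn, size_t nwords, int reps)
{
    size_t bytes = nwords * sizeof(unsigned long);
    unsigned long *src = malloc(bytes);
    unsigned long *dst = malloc(bytes);
    double start, elapsed, mbps = -1.0;

    if (!src || !dst)
        goto out;
    for (size_t i = 0; i < nwords; i++)
        src[i] = i * 2654435761UL;   /* arbitrary non-constant pattern */

    start = now_seconds();
    for (int r = 0; r < reps; r++)
        fn(dst, src, nwords);
    elapsed = now_seconds() - start;

    /* Verify the copy so a broken loop cannot report a good number. */
    if (memcmp(dst, src, bytes) == 0 && elapsed > 0.0)
        mbps = (double)bytes * reps / elapsed / 1e6;
out:
    free(src);
    free(dst);
    return mbps;
}
```

For example, `bench_copy(ref_copy_words, 1 << 16, 100)` times 100 copies of a 512 KiB buffer on a 64-bit target; buffer sizes that fit in, or overflow, the data cache will exercise different parts of the memory pipeline.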

I also don't know this microarchitecture, but this seems like a pretty wacky load-use delay.

Can we split out the misaligned handling fix to get that in sooner? That's likely the more urgent issue.

> David
