Re: [PATCH] RISC-V: Don't check text_mutex during stop_machine

From: Palmer Dabbelt
Date: Sat May 22 2021 - 15:32:50 EST


On Thu, 06 May 2021 06:25:50 PDT (-0700), rostedt@xxxxxxxxxxx wrote:
> On Thu, 6 May 2021 00:10:41 -0700
> Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
>
>> ---
>> In theory we should be able to avoid using stop_machine() with some
>> clever code sequences, but that's too big of a change to be considered a
>> fix. I also can't find the text I thought was in the ISA manual about
>> the allowed behaviors for concurrent modification of the instruction
>> stream, so I might have just mis-remembered that.
>> ---
>
> I wonder if you could at least use break points, as some other archs do,
> and what x86 does.
>
> If you have this make believe machine code:
>
> 00 00 00 00 nop
>
> And you want to turn it into a call.
>
> aa 12 34 56 bl ftrace_caller
>
> And if your architecture has a way to inject a break point on live code.
> Let's call this FF for the break point code.
>
> You can inject that first:
>
> FF 00 00 00

That would rely on this "concurrent writes to the instruction stream
don't tear" property that I thought we'd put in the ISA but I can't
actually find in the text of the document. That said, I hadn't actually
thought about replacing single bytes in the instruction stream. In
theory we don't have anything that guartees those won't tear, but I'm
not sure how any reasonable implementation would go about manifesting
those so it might be an easy retrofit to the ISA.

I ran through our breakpoint encodings and don't think those help, but
we do have 0x00000013 as a NOP (addi x0, x0, 0) which is one byte away
from 0x00000000 (which is defined to be an invalid instruction on all
RISC-V systems). There's really no difference between how a breakpoint
traps and how an illegal instruction traps on RISC-V, so we should be
able to use that to construct some sort of sequence.

So we can start with

nop
nop
nop
nop
nop

then do a single-byte write to turn it into

unimp
nop
nop
nop
nop

we can then IPI to all the harts in order to get them on the same page
about that trap, which we can then skip over. We'll need some way to
differentiate this from accidental executions of unimp, but we can just
build up a table somewhere (it wasn't immediately clear how x86 does
this). Then we have no ordering restrictions on converting the rest of
the stub into what's necessary to trace a function, which should look
the same as what we have now

unimp
save ra, ...(sp)
auipc ra, ...
jalr ra, ...
load ra, ...(sp)

At which point we can IPI everywhere to sync the instructions stream
again, before we turn off the trap

nop
save ra, ...(sp)
auipc ra, ...
jalr ra, ...
load ra, ...(sp)

> sync all CPUs where now all callers will hit this and jump to the break
> point handler, which simply does:
>
> ip = ip + 4;
> return;
>
> and returns back to the instruction after this nop/call.

There'll be some slight headaches there because the length of the all-0
unimplemented instruction depends on which extensions are supported, but
we can detect ILEN at runtime so that shouldn't be a showstopper.

> Change the rest of the instruction.
>
> FF 12 34 56
>
> sync all CPUs so that they all see this new instruction, but are still
> triggering the break point.
>
> Then finally remove the break point.
>
> aa 12 34 56
>
> And you just switched from the nop to the call without using stop machine.

OK, ya I think that all should work. We're still depending on this
"single byte writes to the instruction stream don't tear" guarantee, but
I think that's a bit pedantic. I'll open a question on the ISA spec
just to be sure one I can link here.

I'm probably not going to have the time to do this for a bit because I'm
buried in this non-coherent stuff, but if someone wants to take a shot
at it I'd be happy to take a look.

Thanks for pointing this out, this should get is away from
stop_machine() way sooner than waiting for a multi-byte modification
sequence would!