Re: [PATCH 01/16] riscv: Introduce instruction table generation
From: Charlie Jenkins
Date: Wed Jun 10 2026 - 21:06:18 EST
On Wed, Jun 10, 2026 at 05:56:25PM +0200, Nam Cao wrote:
> Charlie Jenkins via B4 Relay
> <devnull+thecharlesjenkins.gmail.com@xxxxxxxxxx> writes:
> > From: Charlie Jenkins <thecharlesjenkins@xxxxxxxxx>
> >
> > Eliminate the need to hand-write riscv instructions by using a shell
> > script to autogenerate a header from an instruction table. This is modeled
> > after the syscall table infrastructure.
> >
> > The table is generated externally by riscv-unified-db [1], but it
> > uses a simple format so that other tools can produce it or it can be
> > edited by hand.
> >
> > [1] https://github.com/riscv-software-src/riscv-unified-db
> >
> > Signed-off-by: Charlie Jenkins <thecharlesjenkins@xxxxxxxxx>
>
> Thanks for the work, I really like the idea. This will make the
> instruction definitions much easier to maintain.
>
> > +c.ld common,32 011<13|00<0 imm<3=6-5|12-10 xd!1!3!5!7=4-2 xs1=9-7
> > +c.ld common,64 011<13|00<0 imm<3=6-5|12-10 xd=4-2 xs1=9-7
>
> Not sure if I'm confusing something, but the spec says "C.LD is an
> RV64C-only instruction". Why do we have 32 here?
This is a weird one. The Zclsd extension (specified together with
Zilsd) adds c.ld to RV32 [1]. All of the data is generated from
riscv-unified-db, and because c.ld is part of Zclsd, its description
there lists it for RV32 as well [2].
[1] https://docs.riscv.org/reference/isa/extensions/zilsd/_attachments/riscv-zilsd.pdf
[2] https://github.com/riscv/riscv-unified-db/blob/main/spec/std/isa/inst/C/c.ld.yaml
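A minimal sketch of how the two c.ld rows differ, assuming "imm<3=6-5|12-10" means the offset is insn[6:5] concatenated with insn[12:10] and shifted left by 3, and "xd!1!3!5!7=4-2" means xd comes from insn[4:2] with encodings 1, 3, 5 and 7 excluded. That exclusion would fit Zclsd loading into an even/odd register pair on RV32, which the RV64 row does not need. The helper names are hypothetical and not part of the generated header:

```shell
#!/bin/sh
# Hypothetical helpers decoding a C.LD word, assuming this reading of the table:
#   imm<3=6-5|12-10  -> offset = (insn[6:5] : insn[12:10]) << 3
#   xd!1!3!5!7=4-2   -> rd' = insn[4:2], encodings 1, 3, 5, 7 not allowed

# Byte offset: offset[7:6] = insn[6:5], offset[5:3] = insn[12:10].
c_ld_offset() {
	echo $(( (((($1 >> 5) & 0x3) << 3) | (($1 >> 10) & 0x7)) << 3 ))
}

# Destination register number: rd' encodes x8..x15 as 0..7.
c_ld_xd() {
	echo $(( (($1 >> 2) & 0x7) + 8 ))
}

# Assumed RV32 (Zclsd) rule: the destination is a register pair, so rd' must be even.
c_ld_xd_valid_rv32() {
	[ $(( ($1 >> 2) & 0x1 )) -eq 0 ]
}

insn=0x6588	# c.ld x10, 8(x11)
echo "offset=$(c_ld_offset "$insn") xd=x$(c_ld_xd "$insn")"	# prints offset=8 xd=x10
```

With rd' = 1 (x9, e.g. 0x6584) the RV32 check fails, while the RV64 row would accept the same word.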
>
> > +echo "#define COMMA ," >> $outfile
> > +echo "#define SEMICOLON ;" >> $outfile
> > +echo "#define SINGLE_ARG(...) __VA_ARGS__" >> $outfile
>
> Aren't these macros unused?
Yes, thanks. I used them in an earlier version and never removed them.
>
> > +echo >> $outfile
> > +
> > +grep -E "^[a-z\.0-9]+[[:space:]]+" "$infile" | {
> > + while read name base fixed variables; do
> > + echo "/* $name */"
> > +
> > + compressed_name=${name##c.*}
> ^^^^^^^^^^^^^^^
> this name is misleading
That's fair. Would something like "compressed_inst" be better?
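If the variable only exists to pick the width, a case statement could avoid naming it at all. A sketch, still assuming (as the patch's comment says) that every compressed mnemonic starts with "c."; set_insn_size is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: pick the encoding width from the mnemonic with a case statement.
# Like the patch, this assumes every compressed mnemonic starts with "c.".
# case is handled by the shell itself, so nothing is forked.
set_insn_size() {
	case $1 in
	c.*) size=16 ;;
	*) size=32 ;;
	esac
}

set_insn_size c.ld
echo "c.ld: $size"	# prints c.ld: 16
set_insn_size fence.tso
echo "fence.tso: $size"	# prints fence.tso: 32
```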
>
> > + invalid_inst_functions=""
> > + variable_params=""
> > + constraints=""
> > + match=""
> > + mask=""
> > + make=""
> > +
> > + # All compressed instructions start with "c."
> > + size=${compressed_name:+32};
> > + size=${size:-16};
> > +
> > + # Replace all . with _
> > + formatted_inst_name=$name
> > + while [ ! ${formatted_inst_name##*.*} ]; do
> > + prefix=${formatted_inst_name%.*}
> > + suffix=${formatted_inst_name##*.}
> > + contains_dot=${formatted_inst_name##*.*}
> > + formatted_inst_name=${contains_dot:-${prefix}_${suffix}}
> > + done
>
> Does the simpler
> formatted_inst_name=$(echo $name | tr '.' '_')
> work?
That does work, but it is much slower. I was trying to avoid external
programs because this runs on every compilation and there are a lot of
instructions to parse. On my system, echo/tr is about 10x slower: each
iteration goes from about 150us to 1.5ms, and the total time goes from
around 0.8s to around 3.5s.
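A possibly simpler fork-free version, assuming the script has to stay POSIX sh (with bash, `${name//./_}` would do this in one expansion). It walks the name front to back with `%%`/`#` expansions inside a case, one pass per dot like the original loop; format_inst_name is a hypothetical helper that sets the same formatted_inst_name variable the patch uses:

```shell
#!/bin/sh
# Sketch: replace every "." in $1 with "_" using only parameter expansion
# and case, so no subshell or external program (tr, sed) is spawned.
# The result goes into formatted_inst_name instead of stdout, so callers
# do not need $(...) command substitution either.
format_inst_name() {
	rest=$1
	formatted_inst_name=
	while :; do
		case $rest in
		*.*)
			# Append everything before the first ".", then "_", and drop it.
			formatted_inst_name=${formatted_inst_name}${rest%%.*}_
			rest=${rest#*.}
			;;
		*)
			formatted_inst_name=${formatted_inst_name}${rest}
			break
			;;
		esac
	done
}

format_inst_name c.addi16sp
echo "$formatted_inst_name"	# prints c_addi16sp
```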
>
> > + echo "static __always_inline ${type}${size} riscv_insn_${formatted_inst_name}_extract_${variable_name}(u${size} ${insn})"
> > + echo "{"
> > + echo "\treturn ${extract};"
> > + echo "}"
> > + echo "static __always_inline void riscv_insn_${formatted_inst_name}_insert_${variable_name}(u${size} *${insn}, ${type}32 ${var})"
> > + echo "{"
> > + echo "\t*_insn &= ${insert_mask# & };"
>
> Why is this required? Isn't this part always zero at this point?
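Whether the clear is needed depends on how the insert helpers get called. A hypothetical sketch, assuming they might be used on an instruction word whose field already holds bits (for example to retarget an existing instruction), showing what the OR alone would do; insert_field and insert_field_or_only are illustrative names, not generated functions:

```shell
#!/bin/sh
# Hypothetical helpers: write VALUE into the WIDTH-bit field at SHIFT of an
# existing instruction word INSN.
# Usage: insert_field INSN VALUE SHIFT WIDTH

# Clear the field first, then OR in the new value.
insert_field() {
	mask=$(( ((1 << $4) - 1) << $3 ))
	echo $(( ($1 & ~mask) | (($2 << $3) & mask) ))
}

# OR only: correct only if the field is already zero.
insert_field_or_only() {
	mask=$(( ((1 << $4) - 1) << $3 ))
	echo $(( $1 | (($2 << $3) & mask) ))
}

insn=0x6588	# c.ld x10, 8(x11); rd' (bits 4-2) holds 2
printf '0x%x\n' "$(insert_field "$insn" 4 2 3)"	# prints 0x6590, rd' = 4 (x12)
printf '0x%x\n' "$(insert_field_or_only "$insn" 4 2 3)"	# prints 0x6598, rd' = 6 (x14)
```

If the field is guaranteed to be zero, as in the make path, both helpers give the same result.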
>
> > + echo "\t*_insn |= ${insert# | };"
> > + echo "}"
> > +
> > + if [ "${only_base}" ]; then
> > + invalid_inst_functions="${invalid_inst_functions}static __always_inline ${type}${size} riscv_insn_${formatted_inst_name}_extract_${variable_name}(u${size} ${insn}) {\n\tpanic(\"${name} is not supported on non ${only_base}-bit systems.\");\n}\n"
>
> Instead of panic(), can we use BUILD_BUG()?
That's a better solution :)
- Charlie
>
> Nam