Re: Concerns about SFrame viability for userspace stack walking

From: Fangrui Song

Date: Thu Oct 30 2025 - 22:50:52 EST

On Thu, Oct 30, 2025 at 1:06 AM Jakub Jelinek <jakub@xxxxxxxxxx> wrote:
>
> On Thu, Oct 30, 2025 at 12:50:42AM -0700, Fangrui Song wrote:
> > An effective compact unwinding scheme needs to leverage ISA-specific properties.
>
> Having 40-50 completely different unwinding schemes, one for each
> architecture or even ISA subset, would be a complete nightmare. Plus the
> important property of DWARF is that it is easily extensible. So, I think it
> would be better to invent new DWARF DW_CFA_* arch specific opcodes which
> would be a shorthand for the most common sequences of unwind info, or allow
> the CIEs to define a library of DW_CFA_* sets perhaps with parameters which
> would then be usable in the FDEs. There are already some arch specific
> opcodes, DW_CFA_GNU_window_save for SPARC and
> DW_CFA_AARCH64_negate_ra_state_with_pc/DW_CFA_AARCH64_negate_ra_state for
> AArch64, but if somebody took time to look through .eh_frame of many
> binaries/libraries on several different distributions for particular arch
> (so that there is no bias in what exact options those distros use etc.) and
> found something that keeps repeating there commonly that could be shortened,
> perhaps the assembler or linker could rewrite sequences of specific .cfi_*
> directives into something equivalent but shorter once the extension opcodes
> are added. Though, there are only very few opcodes left, so taking them
> should be done with great care and at least one should be left as a
> multiplexer (single byte opcode followed by uleb128 code for further
> operation + arguments).
>
> Jakub

That's a good point about being careful with new unwind formats.
The LLVM compact unwind format, used by Mach-O, uses an
architecture-agnostic page table structure but has
architecture-specific encodings (i386, x86-64, and aarch64).
That is, it does not introduce an entirely different format for each arch.
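
To illustrate that split, here is a minimal Python sketch. It is not the
real on-disk layout: the flat (function_start, encoding) table is a
hypothetical stand-in for the two-level page table, and the mode
constants are as I recall them from LLVM's compact_unwind_encoding.h, so
please verify them against the header. The lookup is shared; only the
interpretation of the 32-bit encoding depends on the architecture.

```python
from bisect import bisect_right

# Mode field occupies bits 24-27 in both the x86-64 and arm64 encodings.
# Values recalled from compact_unwind_encoding.h (assumption: verify them).
MODE_MASK = 0x0F000000

X86_64_MODES = {
    0x01000000: "RBP_FRAME",
    0x02000000: "STACK_IMMD",
    0x03000000: "STACK_IND",
    0x04000000: "DWARF",
}
ARM64_MODES = {
    0x02000000: "FRAMELESS",
    0x03000000: "DWARF",
    0x04000000: "FRAME",
}
MODES_BY_ARCH = {"x86_64": X86_64_MODES, "arm64": ARM64_MODES}


def lookup_encoding(table, pc):
    """Arch-agnostic step: binary-search a sorted (function_start, encoding) list."""
    starts = [start for start, _ in table]
    i = bisect_right(starts, pc) - 1
    return None if i < 0 else table[i][1]


def decode_mode(arch, encoding):
    """Arch-specific step: the same bits mean different things per arch."""
    return MODES_BY_ARCH[arch].get(encoding & MODE_MASK, "UNKNOWN")


# Hypothetical table with two functions.
TABLE = [(0x1000, 0x01000000), (0x1040, 0x02020000)]
enc = lookup_encoding(TABLE, 0x1044)
print(decode_mode("x86_64", enc), decode_mode("arm64", enc))
```

The same encoding word decodes as STACK_IMMD on x86-64 and FRAMELESS on
arm64, while lookup_encoding() never needs to know the architecture.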

I believe the size issue with .eh_frame is primarily driven by the
CIE/FDE overhead, not the CFI instructions.
A single FDE is inherently large (around 20 bytes; see
https://discourse.llvm.org/t/rfc-improving-compact-x86-64-compact-unwind-descriptors/47471/10?u=maskray
), and that is a significant contributor to the overall size.
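
A back-of-the-envelope sketch of where a figure of that order can come
from. The assumptions are mine, not taken from the linked post: 32-bit
DWARF format, a "zR" CIE with 4-byte pc-relative pointers
(DW_EH_PE_pcrel | DW_EH_PE_sdata4), and the record padded to `align`
bytes. The alignment differs between toolchains, so it is a parameter.

```python
def uleb128_len(value):
    """Number of bytes needed to encode value as ULEB128."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


def fde_size(cfi_bytes, aug_data_len=0, align=4):
    """Total size of one FDE record under the assumptions stated above."""
    fixed = 4 + 4 + 4 + 4                            # length, CIE pointer, pc_begin, pc_range
    fixed += uleb128_len(aug_data_len) + aug_data_len  # augmentation data (e.g. LSDA pointer)
    total = fixed + cfi_bytes
    return (total + align - 1) // align * align      # pad up to the alignment


print(fde_size(0))                  # no CFI instructions at all
print(fde_size(0, align=8))         # same, with 8-byte alignment
print(fde_size(0, aug_data_len=4))  # with a 4-byte LSDA pointer
```

Under these assumptions the fixed fields alone take 17 bytes, so an FDE
occupies 20 or 24 bytes before it carries a single CFI instruction.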

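The performance issue of .eh_frame seems largely related to the
bytecode nature of the CFI instructions.
Encoding each location that has a different CFI state explicitly as its
own frame entry makes unwinding faster, since the unwinder can look up
the entry for a PC instead of interpreting instructions up to it.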
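
A toy Python comparison of the two approaches. This is neither the real
.eh_frame byte encoding nor the SFrame layout: the (op, operand) tuples,
the function names, and the example offsets are all made up for
illustration. The bytecode path replays the program from the function
start for every query; the row path precomputes one entry per state
change and binary-searches it.

```python
from bisect import bisect_right

# Hypothetical CFI program for one function, as (op, operand) pairs.
PROGRAM = [
    ("def_cfa_offset", 8),   # on entry: CFA = sp + 8
    ("advance_loc", 1),      # after the frame-pointer push
    ("def_cfa_offset", 16),
    ("advance_loc", 13),     # after the stack adjustment
    ("def_cfa_offset", 48),
]


def cfa_offset_bytecode(program, pc_offset):
    """Interpret ops from the start until the location would pass pc_offset."""
    loc, offset = 0, None
    for op, operand in program:
        if op == "advance_loc":
            if loc + operand > pc_offset:
                break
            loc += operand
        elif op == "def_cfa_offset":
            offset = operand
    return offset


def build_rows(program):
    """Run the program once and record (start_offset, cfa_offset) per state."""
    rows, loc = [], 0
    for op, operand in program:
        if op == "advance_loc":
            loc += operand
        elif op == "def_cfa_offset":
            if rows and rows[-1][0] == loc:
                rows[-1] = (loc, operand)   # same location: overwrite the state
            else:
                rows.append((loc, operand))
    return rows


ROWS = build_rows(PROGRAM)            # done once, ahead of time
STARTS = [start for start, _ in ROWS]


def cfa_offset_rows(pc_offset):
    """Binary search the precomputed rows; no interpretation at unwind time."""
    return ROWS[bisect_right(STARTS, pc_offset) - 1][1]


for pc in (0, 1, 13, 14, 20):
    assert cfa_offset_bytecode(PROGRAM, pc) == cfa_offset_rows(pc)
```

Both functions return the same answer for every PC offset; the
difference is that the row lookup is O(log n) in the number of states
and does not depend on how many instructions precede the PC.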