Re: Concerns about SFrame viability for userspace stack walking

From: Indu

Date: Tue Nov 04 2025 - 04:27:34 EST

On 2025-10-29 11:53 p.m., Fangrui Song wrote:

I've been following the SFrame discussion and wanted to share some concerns about its viability for userspace adoption, based on concrete measurements and comparison with existing compact unwind implementations in LLVM.

**Size overhead concerns**

Measurements on an x86-64 clang binary show that .sframe (8.87 MiB) is approximately 10% larger than the combined size of .eh_frame and .eh_frame_hdr (8.06 MiB total).
This is problematic because .eh_frame cannot be eliminated - it contains essential information for restoring callee-saved registers, LSDA, and personality information needed for debugging (e.g. reading local variables in a coredump) and C++ exception handling.

This means adopting SFrame would result in carrying both formats, with a large net size increase.
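As a quick check of the arithmetic behind the figures above, here is a minimal sketch. It uses only the two sizes quoted for the clang binary; the function name `size_overhead` is made up for illustration.

```python
def size_overhead(sframe_mib, eh_mib):
    """Return (.sframe size relative to the EH data, growth factor when carrying both)."""
    ratio = sframe_mib / eh_mib
    combined = (sframe_mib + eh_mib) / eh_mib
    return ratio, combined

# Sizes quoted above for the x86-64 clang binary, in MiB.
ratio, combined = size_overhead(8.87, 8.06)
print(f".sframe / (.eh_frame + .eh_frame_hdr) = {ratio:.2f}")
print(f"both formats / EH data alone = {combined:.2f}")
```

With these inputs the first ratio comes out at about 1.10, and carrying both formats roughly doubles the unwind data for this one binary (about 2.10 times the EH data alone).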

**Learning from existing compact unwind implementations**

It's worth noting that LLVM has had a battle-tested compact unwind format in production use since 2009 with OS X 10.6, which transitioned to using CFI directives in 2013 [1]. The efficiency gains are dramatic:

__text section: 0x4a55470 bytes
__unwind_info section: 0x79060 bytes (0.6% of __text)
__eh_frame section: 0x58 bytes

I believe this is only synchronous? If yes, do you think this is a fair measurement to compare against ?

Does the compact unwind info scheme work well for cases of shrink-wrapping ? How about the case of AArch64, where the ABI does not mandate whether and where a frame record is created ?

For the numbers above, does it ensure precise stack traces ?

From the Apple Compact Unwinding Format document (https://faultlore.com/blah/compact-unwinding/),
"One consequence of only having one opcode for a whole function is that functions will generally have incorrect instructions for the function’s prologue (where callee-saved registers are individually PUSHed onto the stack before the rest of the stack space is allocated)."

"Presumably this isn’t a very big deal, since there’s very few situations where unwinding would involve a function still executing its prologue/epilogue."

Well, getting precise stack traces is a big deal and the users want them.

(On macOS you can check the section size with objdump --arch x86_64 -h clang and dump the unwind info with objdump --arch x86_64 --unwind-info clang)

OpenVMS's x86-64 port, which is ELF-based, also adopted this format as documented in their "VSI OpenVMS Calling Standard" and their 2018 post: https://discourse.llvm.org/t/rfc-asynchronous-unwind-tables-attribute/59282

The compact unwind format achieves this efficiency through a two-level page table structure. It describes common frame layouts compactly and falls back to DWARF only when necessary, allowing most DWARF CFI entries to be eliminated while maintaining full functionality. For more details, see: https://faultlore.com/blah/compact-unwinding/ and the lld/MachO implementation https://github.com/llvm/llvm-project/blob/main/lld/MachO/UnwindInfoSection.cpp
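To make the two-level idea concrete, here is a minimal sketch of such a lookup. It is not the actual __unwind_info layout: the table contents, the `END` bound, the encoding strings, and the `lookup` helper are all hypothetical, and the real format packs its entries far more tightly.

```python
from bisect import bisect_right

# Hypothetical, simplified model. Each first-level entry is
# (first_function_offset, second_level_page); each page is a sorted list of
# (function_offset, encoding) pairs.
FIRST_LEVEL = [
    (0x1000, [(0x1000, "frame-based"), (0x1040, "frameless, 16-byte stack")]),
    (0x2000, [(0x2000, "frame-based"), (0x2100, "use DWARF fallback")]),
]
END = 0x3000  # hypothetical end of the covered text range

def lookup(pc):
    """Return the encoding of the function covering pc, or None if uncovered."""
    if not (FIRST_LEVEL[0][0] <= pc < END):
        return None
    # Level 1: binary search for the page whose range contains pc.
    i = bisect_right([start for start, _ in FIRST_LEVEL], pc) - 1
    page = FIRST_LEVEL[i][1]
    # Level 2: binary search inside that page for the covering function.
    j = bisect_right([start for start, _ in page], pc) - 1
    return page[j][1]

print(lookup(0x1050))
print(lookup(0x2100))
```

In this model a "use DWARF fallback" entry is the point where an unwinder would consult .eh_frame instead, which is the fallback behaviour described above.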

How does your vision of "linker-friendly" stack tracing/stack unwinding format reconcile with these suggested approaches ? As far as I can tell, these formats also require linker created indexes and are non-concatenable (custom handling in every linker). Something you've had "significant concerns" about.

From https://docs.vmssoftware.com/vsi-openvms-calling-standard/#STACK_UNWIND_EXCEPTION_X86_64:
"The unwind dispatch table (see Section B.3.1, ''Unwind Dispatch Table'') is created by the linker using information in the unwind descriptors (see Section B.3.2, ''DWARF Unwind Descriptors'' and Section B.3.3, ''Compact Unwind Description'') provided by compilers. The linker may use the provided unwind descriptors directly or replace them with equivalent optimized forms based on its optimization strategies."

Above all, do users want a solution which requires falling back on DWARF-based processing for precise stack tracing ?

**The AArch64 case: size matters even more**

The size consideration becomes even more critical for AArch64, which is heavily deployed on mobile phones.
There's an active feature request for compact unwind support in the AArch64 ABI: https://github.com/ARM-software/abi-aa/issues/344
This underscores the broader industry need for efficient unwind information that doesn't duplicate data or significantly increase binary size.

Our measurements with a dataset of about 1400 userspace artifacts (binaries and shared libraries) show that the SFrame/(EH Frame + EH Frame HDR) ratio is:
- Average of 0.70 on AArch64.
- Average of 1.00 on x86_64.

Projecting what you observe for the clang binary on x86_64 onto the size ratio on AArch64 is not a sound way to draw a conclusion.

Whether the size impact is worth the benefit: it's a choice for users to make. SFrame offers users fast, precise stack traces with simple stack tracers.

There are at least two formats the ELF one can learn from: LLVM's compact unwind format (aarch64) and Windows ARM64 Frame Unwind Code.

Please, if you have any concrete suggestions (keeping the above goals in mind), you already know how/where to engage.

**Path forward**

Unless SFrame can actually replace .eh_frame (rather than supplementing it as an accelerator for linux-perf) and demonstrate sizes smaller than .eh_frame, matching the efficiency of existing compact unwind approaches, I question its practical viability for userspace.
The current design appears to add overhead rather than reduce it.
This isn't to suggest we should simply adopt the existing compact unwind format wholesale.
The x86-64 design dates back to 2009 or earlier, and there are likely improvements we can make. However, we should aim for similar or better efficiency gains.

For additional context, I've documented my detailed analysis at:

- https://maskray.me/blog/2025-09-28-remarks-on-sframe (covering mandatory index building problems, section group compliance and garbage collection issues, and version compatibility challenges)

The GC issue is a bug that is currently tracked, with a target milestone of 2.46.

- https://maskray.me/blog/2025-10-26-stack-walking-space-and-time-trade-offs (size analysis)