Yeah, and it's actually bothering me quite a lot 🙂 I have a tentativeSFrame specification aims to keep SFrame header and SFrame FDE membersRight, but the spec says they're all packed. Maybe the point is that+struct sframe_preamble {I couldn't understand from SFrame itself, but why do sframe_header,
+ u16 magic;
+ u8 version;
+ u8 flags;
+} __packed;
+
+struct sframe_header {
+ struct sframe_preamble preamble;
+ u8 abi_arch;
+ s8 cfa_fixed_fp_offset;
+ s8 cfa_fixed_ra_offset;
+ u8 auxhdr_len;
+ u32 num_fdes;
+ u32 num_fres;
+ u32 fre_len;
+ u32 fdes_off;
+ u32 fres_off;
+} __packed;
+
+struct sframe_fde {
+ s32 start_addr;
+ u32 func_size;
+ u32 fres_off;
+ u32 fres_num;
+ u8 info;
+ u8 rep_size;
+ u16 padding;
+} __packed;
sframe_preamble, and sframe_fde have to be marked __packed, if it's
all naturally aligned (intentionally and by design)?..
some future sframe version is free to introduce unaligned fields.
at aligned boundaries in future versions.
Only SFrame FRE related accesses may have unaligned accesses.
proposal, maybe we can discuss this for SFrame v3? Let me briefly
outline the idea.
So, currently in v2, FREs within FDEs use an array-of-structs layout.
If we use preudo-C type definitions, it would be something like this
for FDE + its FREs:
struct FDE_and_FREs {
struct sframe_func_desc_entry fde_metadata;
union FRE {
struct FRE8 {
u8 sfre_start_address;
u8 sfre_info;
u8|u16|u32 offsets[M];
}
struct FRE16 {
u16 sfre_start_address;
u16 sfre_info;
u8|u16|u32 offsets[M];
}
struct FRE32 {
u32 sfre_start_address;
u32 sfre_info;
u8|u16|u32 offsets[M];
}
} fres[N] __packed;
};
where all fres[i]s are one of those FRE8/FRE16/FRE32, so start
addresses have the same size, but each FRE has potentially different
offsets sizing, so there is no common alignment, and so everything has
to be packed and unaligned.
But what if we take a struct-of-arrays approach and represent it more like:
struct FDE_and_FREs {
struct sframe_func_desc_entry fde_metadata;
u8|u16|u32 start_addrs[N]; /* can extend to u64 as well */
u8 sfre_infos[N];
u8 offsets8[M8];
u16 offsets16[M16] __aligned(2);
u32 offsets32[M32] __aligned(4);
/* we can naturally extend to support also u64 offsets */
};
i.e., we split all FRE records into their three constituents: start
addresses, info bytes, and then each FRE can fall into either 8-, 16-,
or 32-bit offsets "bucket". We collect all the offsets, depending on
their size, into these aligned offsets{8,16,32} arrays (with natural
extension to 64 bits, if necessary), with at most wasting 1-3 bytes to
ensure proper alignment everywhere.
Note, at this point we need to decide if we want to make FREs binary
searchable or not.
If not, we don't really need anything extra. As we process each
start_addrs[i] and sfre_infos[i] to find matching FRE, we keep track
of how many 8-, 16-, and 32-bit offsets already processed FREs
consumed, and when we find the right one, we know exactly the starting
index within offset{8,16,32}. Done.
But if we were to make FREs binary searchable, we need to basically
have an index of offset pointers to quickly find offsetsX[j] position
corresponding to FRE #i. For that, we can have an extra array right
next to start_addrs, "semantically parallel" to it:
u8|u16|u32 start_addrs[N];
u8|u16|u32 offset_idxs[N];
where start_addrs[i] corresponds to offset_idxs[i], and offset_idxs[i]
points to the first offset corresponding to FRE #i in offsetX[] array
(depending on FRE's "bitness"). This is a bit more storage for this
offset index, but for FDEs with lots of FREs this might be a
worthwhile tradeoff.
Few points:
a) we can decide this "binary searchability" per-FDE, and for FDEs
with 1-2-3 FREs not bother, while those with more FREs would be
searchable ones with index. So we can combine both fast lookups,
natural alignment of on-disk format, and compactness. The presence of
index is just another bit in FDE metadata.
b) bitness of offset_idxs[] can be coupled with bitness of
start_addrs (for simplicity), or could be completely independent and
identified by FDE's metadata (2 more bits to define this just like
start_addr bitness is defined). Independent probably would be my
preference, with linker (or whoever will be producing .sframe data)
can pick the smallest bitness that is sufficient to represent
everything.
Yes, it's a bit more complicated to draw and explain, but everything
will be nicely aligned, extensible to 64 bits, and (optionally at
least) binary searchable. Implementation-wise on the kernel side it
shouldn't be significantly more involved. Maybe the compiler would
need to be a bit smarter when producing FDE data, but it's no rocket
science.
Thoughts?