Re: [RFC][PATCH 6/9] gen_initramfs_list.sh: include xattrs

From: Rob Landley
Date: Tue Jan 13 2015 - 13:48:43 EST


On 01/08/2015 04:08 PM, Mimi Zohar wrote:
> On Thu, 2015-01-08 at 12:19 -0600, Rob Landley wrote:
>>
>> But I am curious about how you propose to encode xattrs into the cpio
>> format. (Which Al Viro chose because it's _simple_. There isn't really
>> a
>> controlling spec since Posix decided to deprecated it in 2001 and
>> yank
>> it from SUSv3 onwards. LSB extended several header fields to 8 hex
>> digits instead of 6, but they still have 32 bit timestamps which seems
>> a
>> bit short-sighted. If you're going to define a new rev with a new
>> magic
>> number, there are a couple other things you might wanna fix...)
>
> Sounds like a good opportunity to make the other changes as well. We
> can include the other changes in this patch set. Is this (initramfs)
> the right mailing list for this discussion?

I'd forgotten there was such a list until the email came in. :)

> Do other people need to be included?

In theory including the Austin Group (the posix committee mailing list)
might be useful, but in practice they hear about stuff well after the
fact, and they washed their hands of cpio over a decade ago (shortly
after Linux started heavily using it in rpm, and a bit before initramfs).

I note that there are two data formats of interest here:

1) the cpio file layout.

2) the list of files generated by gen_initramfs_list.sh and consumed by
gen_init_cpio.

The fact you're modifying gen_initramfs_list.sh seems to imply that
you're changing the _second_ format as well as the first...? The second
was never actually specified, but it does get used a lot by various
build systems and breaking it would inconvenience people. (Plus I'd need
to update the documentation, but I need to do that anyway.)

Ss long as you're extending the format could you add a second 32 bits of
time data that gets glued to the top half of a uint64_t, and then store
the actual time value in microseconds (so time*1000000)? (I'd say
"nanoseconds" but 63 bits of nanoseconds is 292 years, which is just
short enough I'm uncomfortable with it. I'm just optimistic enough to
think that might inconvenience somebody.)

The other "huh" is the filesize, but 4 gigs per file seems unlikely to
be more than initramfs needs any time soon? (It's possible that RPM
might care in 15 years or so...)

>> I ask because I maintain a new from-scratch cpio implementation
>> (http://landley.net/hg/toybox/file/1571/toys/posix/cpio.c), so I'd
>> presumably have to add your format extensions to this. Is there any
>> sort
>> of documentation on them?
>>
>> The toybox config Android is using has this cpio implementation
>> enabled
>> (see
>> https://android.googlesource.com/platform/external/toybox/+/9250c95a8c47/Android.mk)
>> so I'd rather like to get this sort of detail right...
>
> The xattr section, which follows the file name, is of the format:
> <number of xattrs> { <xattr name> <xattr data size> <xattr data> } for
> each xattr, terminated with a NULL byte and padded to a 4 byte boundary.

where <value> is... 8 bytes of ascii hex digits like the header values?
(Every cpio string is padded to a boundary. Sigh, lemme go read your
patch...)

Ok, 2/9, actual file format parsing. New magic string at the start
"070703" for the new version. (Good, that was my first question: easy
way to distinguish this from the previous format).

- for (i = 0, s += 6; i < 12; i++, s += 8) {
+ for (i = 0; i < (!newcx ? 12 : 13); i++, s += 8) {

You've tested this and the missing s+= 6 didn't cause problems? (Or did
it move somewhere else...? Is that what the 1/9 did grabbing just the
magic instead of the rest of the header data...?)

> The header contains an additional field, before the checksum, containing
> the xattr section length, including the NULL byte, but without the
> padding.

Ah, the old "4 bytes of padding to align to 4 bytes" silliness. (Even
though you can't trust the null to _be_ there and have to set it
yourself after the read.) I'm starting to remember this...

Ok, different header magic, one new 8 hex digit field at the end of the
header (before crc) containing "xattr_bufflen". The start of this buffer
is an 8 digit hex "num_xattrs", which you iterate through and call
strlen() on despite never having assured that the data you read in
actually _does_ contain a null at the end (of the entire buffer). Then
past that supposed null is another 8 digit hex "xattr_value_size", and
that many bytes following you then send to sys_setxattr().

Except for the part about you trusting your input data waaaaay too much,
seems reasonable? I have no idea what sys_setxattr() accepts, but
presumably there's a man page for the system call...

http://man7.org/linux/man-pages/man2/setxattr.2.html

Ok, that's probably enough data to implement it. (Not sure why that man
page isn't in my ubuntu 14.04 install which has manpages-dev installed?

$ man setxattr
No manual entry for setxattr

> Note that gen_init_cpio does not include "security.evm" as it is file
> system dependent.

I have no idea what that is. Should I not include it in the command line
cpio? (Or have a flag?)

The last time I used extended attributes was on OS/2; my only
non-academic interaction with selinux has been looking up how to switch
it off.

I still boggle that Fortune 500 bureaucracies include "must have a
security system designed by the NSA based on the idea of complicating
the system until there are no obvious holes, because after the Snowden
leaks that's clearly a good idea" as part of their certification
processes for reducing the risk of being unable to delegate blame.

I'm also kind of impressed by the longevity of a hack the original Apple
Lisa developers invented in 1981 because their OS didn't have an
equivalent of the unix "file" command, and undoes the central unix
"everything's a flat file" idea to bring us back to the structure
records with magic meanings of the mainframe days.

http://www.folklore.org/StoryView.py?project=Macintosh&story=The_Grand_Unified_Model_The_Finder.txt&sortOrder=Sort+by+Title

Still, dictating what users can and can't do is policy. It exists,
people use it... sigh. I'll try to take a stab at it some evening this week.

> Mimi

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/