Re: [PATCH][2/2] SquashFS

From: Phillip Lougher
Date: Mon Mar 14 2005 - 21:32:03 EST

Next message: Andrew Morton: "Re: [PATCH 0/4] sparsemem intro patches"
Previous message: Benjamin Herrenschmidt: "Re: [swsusp/ppc] Re: What's going on here ?"
In reply to: Andrew Morton: "Re: [PATCH][2/2] SquashFS"
Next in thread: Andrew Morton: "Re: [PATCH][2/2] SquashFS"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tuesday, March 15, 2005, at 01:06 am, Andrew Morton wrote:

Phillip Lougher <phillip@xxxxxxxxxxxxxxxxxxx> wrote:

@@ -0,0 +1,439 @@

[lots of comments from patch 1/2 are applicable here]

OK. Noted :-)

+#define SQUASHFS_MAX_FILE_SIZE ((long long) 1 << \
+ (SQUASHFS_MAX_FILE_SIZE_LOG - 1))

1LL would suit here. Of a cast to loff_t.

+typedef unsigned int squashfs_block;
+typedef long long squashfs_inode;

squashfs_block_t and squashfs_inode_t, please. If one must use typedefs...

+typedef struct squashfs_super_block {
+ unsigned int s_magic;
+ unsigned int inodes;
+ unsigned int bytes_used;
+ unsigned int uid_start;
+ unsigned int guid_start;
+ unsigned int inode_table_start;
+ unsigned int directory_table_start;
+ unsigned int s_major:16;
+ unsigned int s_minor:16;
+ unsigned int block_size_1:16;
+ unsigned int block_log:16;
+ unsigned int flags:8;
+ unsigned int no_uids:8;
+ unsigned int no_guids:8;
+ unsigned int mkfs_time /* time of filesystem creation */;
+ squashfs_inode root_inode;
+ unsigned int block_size;
+ unsigned int fragments;
+ unsigned int fragment_table_start;
+} __attribute__ ((packed)) squashfs_super_block;

Whoa. Tons of bitfields in this file. Are these on-disk data structures?
If so, that's a problem for portability between architectures and possibly
compiler versions. It also introduces locking complexity.

if they're in-core data structures then the bitfields are probably slower than using `int', as well.

They look pretty nasty, but are quite harmless really...

The structures represent on-disk structures. Squashfs tries to cram as much information into
as small an area as possible on disk, which is why they're using bitfields.

The structures are read into memory from disk into the bit field structure, and the information
is immediately transferred to more sane 'int' structures inside the inode or into private
Squashfs data, and all reads/writes take place from there. No writes are made into the
bit fields, they're only used to temporarily 'parse' the packed data on disk.

I've done a lot of checking to ensure portability across architectures and against different
compiler versions. Gcc uniformly uses two representations for 'packed structures', one for
little endian architectures and one for big endian architectures. Little endian bitfield
structures are packed low-byte high byte order, allocating bitfields from low bit to high bit in ints.
Big endian structures are packed high-byte low-byte order, allocating bitfields from
high bit to low bit in ints (this incidently generates structures in the bit/byte order
specified in the C source). The filling is done this way on different endian architectures
as it allows the most efficient bit-field access code to be generated for each endian
architecture.

I've checked compatibilty against Intel 32 and 64 bit architectures, PPC 32/64 bit, ARM, MIPS
and SPARC. I've used compilers from 2.91.x upto 3.4...

+typedef struct {
+ unsigned int inode_type:4;
+ unsigned int mode:12; /* protection */
+ unsigned int uid:8; /* index into uid table */
+ unsigned int guid:8; /* index into guid table */
+} __attribute__ ((packed)) squashfs_base_inode_header;

See, if one CUP is modifying `inode_type' while another CPU is modifying
`mode', this struct can get trashed.

I agree. This is why the structures are never written to. Bit fields are slow, I move
the data out as soon as possible.

+/*
+ * macros to convert each packed bitfield structure from little endian to big
+ * endian and vice versa. These are needed when creating or using a filesystem
+ * on a machine with different byte ordering to the target architecture.
+ *
+ */

hmm, OK.. Tell us more?

As mentioned previously, there are two packed bit-field representations, one
for big endian machines, and one for little endian machines. Squashfs for
efficiency in embedded systems writes little endian filesystems (with little
endian bit field structures) for little endian targets, and big endian filesystems
for big endian targets. However, to allow non-native endian filesystems
(i.e. where the host is little endian but the target is big endian), to be mounted,
Squashfs will swap the filesystem on a different endian machine.

Squashfs at filesystem mount time determines if the filesystem is swapped with
respect to the host architecture. If it is then the packed bit-field structures
read off disk are in the wrong endianness. Immediately after reading off disk,
the structures are converted to the correct endianness for the architecture, and
are then processed as normal.

Due to the different bit-field filling rules between big endian and little
endian machines, bit fields are in different places within the structure
for each architecture, this means when coverting the endianness of
a structure the structure has to be converted as a whole. For each
bit field the macros are given the 'logical' position of the bit field and
use that to find the bit-field in the non-native structure using the non
native structure filling rules.

+ * bitfields and different bitfield placing conventions on differing
+ * architectures
+ */
+
+#include <asm/byteorder.h>
+
+#ifdef __BIG_ENDIAN
+ /* convert from little endian to big endian */
+#define SQUASHFS_SWAP(value, p, pos, tbits) _SQUASHFS_SWAP(value, p, pos, \
+ tbits, b_pos)
+#else
+ /* convert from big endian to little endian */
+#define SQUASHFS_SWAP(value, p, pos, tbits) _SQUASHFS_SWAP(value, p, pos, \
+ tbits, 64 - tbits - b_pos)
+#endif
+
+#define _SQUASHFS_SWAP(value, p, pos, tbits, SHIFT) {\
+ int bits;\
+ int b_pos = pos % 8;\
+ unsigned long long val = 0;\
+ unsigned char *s = (unsigned char *)p + (pos / 8);\
+ unsigned char *d = ((unsigned char *) &val) + 7;\
+ for(bits = 0; bits < (tbits + b_pos); bits += 8) \
+ *d-- = *s++;\
+ value = (val >> (SHIFT))/* & ((1 << tbits) - 1)*/;\
+}

Can the standard leXX_to_cpu() helpers not be used here?

No unfortunately not. The above hopefully describes why.

The swap macro is IMHO quite concise and efficient, the same macro
is used to swap from little-endian to big endian, and from big-endian to
little endian. The only difference is the _SQUASHFS_SWAP value which
either counts down from 64 bits to 0 (for high-bit low-bit filling order on
big endian machines), or counts up from 0 to 64 (for low-bit high-bit filling
order on little endian machines). For efficiency this value is determined
at compile time.

I believe doing the work another way would make the code more difficult to
understand and less efficient?

+#include <linux/squashfs_fs.h>
+
+typedef struct {
+ unsigned int block;
+ int length;
+ unsigned int next_index;
+ char *data;
+ } squashfs_cache;

Whitespace inconsistency (column 1 for the closing brace is standard)

--- linux-2.6.11.3/init/do_mounts_rd.c 2005-03-13 06:44:30.000000000 +0000
+++ linux-2.6.11.3-squashfs/init/do_mounts_rd.c 2005-03-14 00:53:28.092559728 +0000

Your changelog didn't mention that squashfs interacts with the boot
process. That's the sort of thing which is nice to tell people about.

Ok.

+SQUASHFS FILESYSTEM
+P: Phillip Lougher
+M: phillip@xxxxxxxxxxxxxxxxxxx
+W: http://squashfs.sourceforge.net
+L: squashfs-devel@xxxxxxxxxxxxxxxxxxxxx
+S: Maintained
+

Lots of little comments, but I have no fundamental problems with the
patches as long as the bitfield issue is shown to be a non-issue.

Please respin the patches and unless someone else sees a showstopper I'll
merge them into -mm for further testing and review, thanks.

Thanks

Phillip

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andrew Morton: "Re: [PATCH 0/4] sparsemem intro patches"
Previous message: Benjamin Herrenschmidt: "Re: [swsusp/ppc] Re: What's going on here ?"
In reply to: Andrew Morton: "Re: [PATCH][2/2] SquashFS"
Next in thread: Andrew Morton: "Re: [PATCH][2/2] SquashFS"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]