In fs/cramfs/inode.c (cramfs_read), each buffer holds 4 blocks, where
a "block" is a PAGE_CACHE_SIZE-sized chunk (always 1 page in 2.3.38).
The buffer_blocknr array holds, for each buffer, the block number of
the first of these blocks.
However, when testing whether we have the requested block cached, we
only test whether blocknr matches the first blocknr in buffer i.
This means that we're wasting at least half the buffer: only the first
one-and-a-bit out of 4 of the PAGE_CACHE_SIZE chunks will ever be
used.
I've changed cramfs_read to take an argument saying how many bytes are
required (usually fewer than PAGE_CACHE_SIZE) and to return a pointer
into an existing buffer if that buffer already holds the whole
requested range.
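
As a rough illustration of the new hit test (a standalone sketch, not
the patch code itself: buffer_hit is a hypothetical helper, and
PAGE_CACHE_SHIFT is assumed to be 12):

```c
#include <assert.h>

#define PAGE_CACHE_SHIFT 12	/* assumption: 4096-byte pages */
#define BLKS_PER_BUF     4

/*
 * If bytes [offset, offset+len) all lie inside the buffer whose first
 * block is buf_first_blocknr, return the index (0..BLKS_PER_BUF-1) of
 * the block containing offset; otherwise return -1.
 */
int buffer_hit(unsigned buf_first_blocknr, unsigned offset, unsigned len)
{
	unsigned blocknr, last_blocknr;

	assert(len > 0);
	blocknr = offset >> PAGE_CACHE_SHIFT;
	last_blocknr = (offset + len - 1) >> PAGE_CACHE_SHIFT;
	if (blocknr >= buf_first_blocknr &&
	    last_blocknr - buf_first_blocknr < BLKS_PER_BUF)
		return (int)(blocknr - buf_first_blocknr);
	return -1;
}
```

With the old test, only a request starting in the buffer's first block
would count as a hit; here any request wholly inside the 4 blocks does.
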
I also added a sanity check that the requested length is within the
buffer size. Handling of the check failing is somewhat suckful, but
it "never happens" if the machine behaves according to spec.
Lower down, in the copy loop, the !bh case is ignored, which can lead
to garbage being read (or rather the previous contents of the cache,
which could be even worse, security-wise).
I added a memset for this case.
Also in that loop, blocknr is incremented, but never used again. Removed.
The validation on super.root.offset was wrong. mkcramfs always sets
this to either zero (for an empty filesystem) or sizeof(struct
cramfs_super)>>2. mkcramfs doesn't set super.size to anything
meaningful at all.
The statfs manpage says that fields that are undefined for a
particular filesystem are set to -1, so the memset in cramfs_statfs
should use 0xff. (Assumes 2's complement.)
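
A small sketch of why 0xff does the job (struct demo_statfs and
demo_fill are hypothetical stand-ins, not the kernel's struct statfs;
two's complement is assumed, as above):

```c
#include <string.h>

/* Hypothetical stand-in for struct statfs: a few signed fields. */
struct demo_statfs {
	long f_type;
	long f_blocks;
	long f_bfree;
	long f_files;
};

void demo_fill(struct demo_statfs *st)
{
	/* Every byte 0xff means every bit set, i.e. -1 in each field. */
	memset(st, 0xff, sizeof(*st));
	/* Fields that do have a defined value are then overwritten. */
	st->f_type = 0x28cd3d45;	/* the cramfs magic number */
	st->f_bfree = 0;
}
```

Fields not assigned afterwards (f_blocks and f_files here) read back
as -1.
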
i_ino is calculated as (diskinode->offset ? diskinode->offset << 2 :
1) (see CRAMINO in inode.c). Only symlinks, non-empty regular files
and non-empty directories have a non-zero value of diskinode->offset,
so we're breaking the promise that <st_dev, st_ino> uniquely identify
a file (particularly for non-regular files). I suggest that we use
inode->u.cram_i.offset to hold (diskinode->offset << 2), and store the
inode's own location (perhaps divided by 4 or 4+sizeof(struct
cramfs_inode)) in i_ino. This is enough to fulfil the promise made by
the stat man page (also required by Single Unix Spec).
I haven't implemented this, but it's just a matter of defining struct
cram_inode_info, inserting it into the union in <linux/fs.h>, and
changing the CRAMINO and OFFSET macros. get_cramfs_inode would also
need to be passed the inode's own offset for purposes of calculating
i_ino.
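
To make the collision concrete, here is a standalone sketch (struct
demo_inode and proposed_ino are hypothetical; dividing the inode's own
byte position by 4 is just one of the options mentioned above, and the
byte positions in the usage note are made up):

```c
/* Stand-in for the on-disk inode; only the offset field matters here. */
struct demo_inode {
	unsigned offset;
};

/* Current scheme, as in inode.c. */
#define CRAMINO(x) ((x)->offset ? (x)->offset << 2 : 1)

/*
 * Possible alternative: derive the number from the byte position of
 * the on-disk inode itself, which differs for every directory entry.
 */
unsigned proposed_ino(unsigned inode_byte_offset)
{
	return inode_byte_offset >> 2;
}
```

Two device nodes (or two empty files) both have offset 0, so CRAMINO
gives 1 for each; proposed_ino(76) and proposed_ino(88) would differ.
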
I added support for holes, since this is just a matter of inserting
the single line "if (compr_len)" into cramfs_readpage (and adding some
code to mkcramfs). By default, mkcramfs doesn't create holes, so as
to remain compatible with existing kernels.
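
The idea, sketched outside the kernel (compr_len here is a hypothetical
helper mirroring the variable of the same name in cramfs_readpage): each
block pointer stores the end offset of its block, so a hole is simply a
block whose end equals its start.

```c
#include <stdint.h>

/*
 * blkptrs[i] is the byte offset of the end of block i; data_start is
 * where block 0 begins (just past the last block pointer).
 * Returns the compressed length of block i; 0 means a hole.
 */
uint32_t compr_len(const uint32_t *blkptrs, uint32_t data_start, unsigned i)
{
	uint32_t start = i ? blkptrs[i - 1] : data_start;

	return blkptrs[i] - start;
}
```

When the result is 0 the reader skips decompression and the page is
left to the trailing memset, which zero-fills it.
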
Added a short makefile for mkcramfs, since it requires non-obvious
compile flags -lz and -I../../fs/cramfs.
Fixed various potential memory problems in mkcramfs (e.g. checking
malloc's return value, freeing afterwards, fixing a buffer overrun).
The mmap usage is still imperfect: it's very easy to get ENOMEM or
EMFILE. Why do we use MAP_PRIVATE|MAP_ANONYMOUS followed by write,
instead of MAP_SHARED and a couple of ftruncate calls? Is it just so
that outfile can be a block device? If so, maybe I'll make it try
MAP_SHARED first and fall back to the current scheme. (The change
would also require cleaning up outfile on errors.)
I added calculation of an upper bound on required fs size, which
is just to reduce the ENOMEM problem.
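
The per-file part of that bound looks roughly like this (a standalone
sketch: data_upper_bound is a hypothetical name, and returning 0 for
empty files is my simplification; the 4 and 26 are the block pointer
size and the per-block expansion allowance used in the patch):

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Upper bound on the space one regular file or symlink needs for its
 * block pointers and compressed data.  size is in bytes; st_blocks is
 * in 512-byte units, as reported by stat(2).
 */
unsigned long data_upper_bound(unsigned long size, unsigned long st_blocks,
			       unsigned blksize)
{
	unsigned long nblocks;

	if (size == 0)
		return 0;	/* assumption: empty files store no data */
	nblocks = (size - 1) / blksize + 1;
	/* A sparse input file is bounded by the blocks it really uses. */
	return (4 + 26) * nblocks + MIN(size + 3, st_blocks << 9);
}
```

For a 10000-byte file occupying 24 sectors with 4096-byte blocks, this
gives 3 blocks, so 90 + 10003 bytes.
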
mkcramfs now warns the user when it has to truncate something to fit
cramfs' limits.
Changed maximum filesystem size that mkcramfs will create from 64MB to
a bit over 256MB. I don't know whether the 64MB "limit" pertains to a
property of ROMs or whether it's a miscalculation from the fact that
diskinode->offset is 26 bits wide, but cramfs is useful not just for
ROMs. I've tested that >64MB cramfs filesystems can be mounted & used.
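
For reference, the arithmetic behind the 256MB figure, as a sketch
(offset_fits is a hypothetical helper; the condition matches the check
added to set_data_offset in mkcramfs.c): the 26-bit field stores the
byte offset divided by 4, so the largest representable start offset is
just under 1 << 28 bytes.

```c
#define OFFSET_WIDTH 26

/*
 * Non-zero if a byte offset can be stored in the 26-bit offset field,
 * which holds the offset divided by 4.
 */
int offset_fits(unsigned long offset)
{
	return (offset & 3) == 0 && offset < (1UL << (2 + OFFSET_WIDTH));
}
```

Only the start of a file's data has to fit, which is why the last file
may extend past 256MB.
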
There are other problems that I haven't addressed; see the end of
fs/cramfs/README in this patch.
pjm.
diff -durn linux-2.3.39/Documentation/filesystems/cramfs.txt linux-bld/Documentation/filesystems/cramfs.txt
--- linux-2.3.39/Documentation/filesystems/cramfs.txt Mon Dec 27 21:16:57 1999
+++ linux-bld/Documentation/filesystems/cramfs.txt Mon Jan 10 10:19:58 2000
@@ -11,3 +11,43 @@
You can't write to a cramfs filesystem (making it compressible and
compact also makes it _very_ hard to update on-the-fly), so you have to
create the disk image with the "mkcramfs" utility in scripts/cramfs.
+
+
+Usage Notes
+-----------
+
+File sizes are limited to less than 16MB.
+
+Maximum filesystem size is a little over 256MB. (The last file on the
+filesystem is allowed to extend past 256MB.) (Comments in mkcramfs.c
+suggest that ROM sizes may be limited to 64MB, though that's not a
+limitation in cramfs code.)
+
+Only the low 8 bits of gid are stored. The current version of
+mkcramfs simply truncates to 8 bits, which is a potential security
+issue.
+
+Hard links are not supported, but symlinks are. (See also the TODO
+comment in mkcramfs.c at the nlink test.)
+
+Cramfs directories have no `.' or `..' entries. Directories (like
+every other file on cramfs) always have a link count of 1. (There's
+no need to use -noleaf in `find', btw.)
+
+No timestamps are stored in a cramfs, so these default to the epoch
+(1970 GMT). Recently-accessed files may have updated timestamps, but
+the update lasts only as long as the inode is cached in memory, after
+which the timestamp reverts to 1970, i.e. moves backwards in time.
+
+Currently, cramfs must be written and read with architectures of the
+same endianness, and can be read only by kernels with PAGE_CACHE_SIZE
+== 4096. At least the latter of these is a bug, but it hasn't been
+decided what the best fix is. For the moment if you have larger pages
+you can just change the #define in mkcramfs.c, so long as you don't
+mind the filesystem becoming unreadable to future kernels.
+
+
+Hacker Notes
+------------
+
+See fs/cramfs/README for filesystem layout and implementation notes.
diff -durn linux-2.3.39/fs/cramfs/README linux-bld/fs/cramfs/README
--- linux-2.3.39/fs/cramfs/README Thu Jan 1 00:00:00 1970
+++ linux-bld/fs/cramfs/README Tue Jan 11 11:17:56 2000
@@ -0,0 +1,166 @@
+Notes on Filesystem Layout
+--------------------------
+
+These notes describe what mkcramfs generates. Kernel requirements are
+a bit looser, e.g. it doesn't care if the <file_data> items are
+swapped around (though it does care that directory entries (inodes) in
+a given directory are contiguous, as this is used by readdir).
+
+All data is in host-endian format; neither mkcramfs nor the kernel
+ever do swabbing. (See section `Block Size' below.)
+
+<filesystem>:
+ <superblock>
+ <directory_structure>
+ <data>
+
+<superblock>: struct cramfs_super (see cramfs.h).
+
+<directory_structure>:
+ For each file:
+ struct cramfs_inode (see cramfs.h).
+ Filename. Not generally null-terminated, but it is
+ null-padded to a multiple of 4 bytes.
+
+The order of inode traversal is described as "width-first" (not to be
+confused with breadth-first); i.e. like depth-first but listing all of
+a directory's entries before recursing down its subdirectories: the
+same order as `ls -AUR' (but without the /^\..*:$/ directory header
+lines); put another way, the same order as `find -type d -exec
+ls -AU1 {} \;'.
+
+<data>:
+ One <file_data> for each file that's either a symlink or a
+ regular file of non-zero st_size.
+
+<file_data>:
+ nblocks * <block_pointer>
+ (where nblocks = (st_size - 1) / blksize + 1)
+ nblocks * <block>
+ padding to multiple of 4 bytes
+
+The i'th <block_pointer> for a file stores the byte offset of the
+*end* of the i'th <block> (i.e. one past the last byte, which is the
+same as the start of the (i+1)'th <block> if there is one). The first
+<block> immediately follows the last <block_pointer> for the file.
+<block_pointer>s are each 32 bits long.
+
+The order of <file_data>'s is a depth-first descent of the directory
+tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
+-print'.
+
+
+<block>: The i'th <block> is the output of zlib's compress function
+applied to the i'th blksize-sized chunk of the input data.
+(For the last <block> of the file, the input may of course be smaller.)
+Each <block> may be a different size. (See <block_pointer> above.)
+<block>s are merely byte-aligned, not generally u32-aligned.
+
+
+Holes
+-----
+
+This kernel supports cramfs holes (i.e. [efficient representation of]
+blocks in uncompressed data consisting entirely of NUL bytes), but by
+default mkcramfs doesn't test for & create holes, since cramfs in
+kernels up to at least 2.3.39 didn't support holes. Compile mkcramfs
+with -DDO_HOLES if you want it to create files that can have holes in
+them.
+
+
+Tools
+-----
+
+If you're hacking on cramfs, you might find useful some tools for
+testing cramfs at <http://cvs.bofh.asn.au/cramfs/>, including a
+rudimentary fsck for cramfs.
+
+
+Future Development
+==================
+
+Block Size
+----------
+
+(Block size in cramfs refers to the size of input data that is
+compressed at a time. It's intended to be somewhere around
+PAGE_CACHE_SIZE for cramfs_readpage's convenience.)
+
+The superblock ought to indicate the block size that the fs was
+written for, since comments in <linux/pagemap.h> indicate that
+PAGE_CACHE_SIZE may grow in future (if I interpret the comment
+correctly).
+
+Currently, mkcramfs #define's PAGE_CACHE_SIZE as 4096 and uses that
+for blksize, whereas Linux-2.3.39 uses its PAGE_CACHE_SIZE, which in
+turn is defined as PAGE_SIZE (which can be as large as 32KB on arm).
+This discrepancy is a bug, though it's not clear which should be
+changed.
+
+One option is to change mkcramfs to take its PAGE_CACHE_SIZE from
+<asm/page.h>. Personally I don't like this option, but it does
+require the least amount of change: just change `#define
+PAGE_CACHE_SIZE (4096)' to `#include <asm/page.h>'. The disadvantage
+is that the generated cramfs cannot always be shared between different
+kernels, not even necessarily kernels of the same architecture if
+PAGE_CACHE_SIZE is subject to change between kernel versions.
+
+
+The remaining options try to make cramfs more sharable.
+
+One part of that is addressing endianness. The two options here are
+`always use little-endian' (like ext2fs) or `writer chooses
+endianness; kernel adapts at runtime'. Little-endian wins because of
+code simplicity and little CPU overhead even on big-endian machines.
+
+The cost of swabbing is changing the code to use the le32_to_cpu
+etc. macros as used by ext2fs. We don't need to swab the compressed
+data, only the superblock, inodes and block pointers.
+
+
+The other part of making cramfs more sharable is choosing a block
+size. The options are:
+
+ 1. Always 4096 bytes.
+
+ 2. Writer chooses blocksize; kernel adapts but rejects blocksize >
+ PAGE_CACHE_SIZE.
+
+ 3. Writer chooses blocksize; kernel adapts even to blocksize >
+ PAGE_CACHE_SIZE.
+
+It's easy enough to change the kernel to use a smaller value than
+PAGE_CACHE_SIZE: just make cramfs_readpage read multiple blocks.
+
+The cost of option 1 is that kernels with a larger PAGE_CACHE_SIZE
+value don't get as good compression as they can.
+
+The cost of option 2 relative to option 1 is that the code uses
+variables instead of #define'd constants. The gain is that people
+with kernels having larger PAGE_CACHE_SIZE can make use of that if
+they don't mind their cramfs being inaccessible to kernels with
+smaller PAGE_CACHE_SIZE values.
+
+Option 3 is easy to implement if we don't mind being CPU-inefficient:
+e.g. get readpage to decompress to a buffer of size MAX_BLKSIZE (which
+must be no larger than 32KB) and discard what it doesn't need.
+Getting readpage to read into all the covered pages is harder.
+
+The main advantage of option 3 over 1, 2, is better compression. The
+cost is greater complexity. Probably not worth it, but I hope someone
+will disagree. (If it is implemented, then I'll re-use that code in
+e2compr.)
+
+
+Another cost of 2 and 3 over 1 is making mkcramfs use a different
+block size, but that just means adding and parsing a -b option.
+
+
+Inode Size
+----------
+
+Given that cramfs will probably be used for CDs etc. as well as just
+silicon ROMs, it might make sense to expand the inode a little from
+its current 12 bytes. Inodes other than the root inode are followed
+by filename, so the expansion doesn't even have to be a multiple of 4
+bytes.
diff -durn linux-2.3.39/fs/cramfs/cramfs.h linux-bld/fs/cramfs/cramfs.h
--- linux-2.3.39/fs/cramfs/cramfs.h Wed Nov 24 00:18:54 1999
+++ linux-bld/fs/cramfs/cramfs.h Tue Jan 11 10:39:29 2000
@@ -5,17 +5,20 @@
#define CRAMFS_SIGNATURE "Compressed ROMFS"
/*
- * Reasonably terse representation of the inode
- * data.. When the mode of the inode indicates
- * a special device node, the "offset" bits will
- * encode i_rdev. In other cases, "offset" points
- * to the ROM image for the actual file data
- * (whether that data be directory or compressed
- * file data depends on the inode type again)
+ * Reasonably terse representation of the inode data.
*/
struct cramfs_inode {
u32 mode:16, uid:16;
+ /* SIZE for device files is i_rdev */
u32 size:24, gid:8;
+ /* NAMELEN is the length of the file name, divided by 4 and
+ rounded up. (cramfs doesn't support hard links.) */
+ /* OFFSET: For symlinks and non-empty regular files, this
+ contains the offset (divided by 4) of the file data in
+ compressed form (starting with an array of block pointers;
+ see README). For non-empty directories it is the offset
+ (divided by 4) of the inode of the first file in that
+ directory. For anything else, offset is zero. */
u32 namelen:6, offset:26;
};
@@ -24,7 +27,8 @@
*/
struct cramfs_super {
u32 magic; /* 0x28cd3d45 - random number */
- u32 size; /* > offset, < 2**26 */
+ u32 size; /* Not used. mkcramfs currently
+ writes a constant 1<<16 here. */
u32 flags; /* 0 */
u32 future; /* 0 */
u8 signature[16]; /* "Compressed ROMFS" */
@@ -32,6 +36,13 @@
u8 name[16]; /* user-defined name */
struct cramfs_inode root; /* Root inode data */
};
+
+/*
+ * Valid values in super.flags. Currently we refuse to mount
+ * if (flags & ~CRAMFS_SUPPORTED_FLAGS). Maybe that should be
+ * changed to test super.future instead.
+ */
+#define CRAMFS_SUPPORTED_FLAGS (0xff)
/* Uncompression interfaces to the underlying zlib */
int cramfs_uncompress_block(void *dst, int dstlen, void *src, int srclen);
diff -durn linux-2.3.39/fs/cramfs/inode.c linux-bld/fs/cramfs/inode.c
--- linux-2.3.39/fs/cramfs/inode.c Sat Jan 8 23:55:52 2000
+++ linux-bld/fs/cramfs/inode.c Mon Jan 10 21:02:48 2000
@@ -27,7 +27,10 @@
static struct inode_operations cramfs_dir_inode_operations;
static struct inode_operations cramfs_symlink_inode_operations;
+/* These two macros may change in future, to provide better st_ino
+ semantics. */
#define CRAMINO(x) ((x)->offset?(x)->offset<<2:1)
+#define OFFSET(x) ((x)->i_ino)
static struct inode *get_cramfs_inode(struct super_block *sb, struct cramfs_inode * cramfs_inode)
{
@@ -41,6 +44,12 @@
inode->i_ino = CRAMINO(cramfs_inode);
inode->i_sb = sb;
inode->i_dev = sb->s_dev;
+ inode->i_nlink = 1; /* arguably wrong for directories,
+ but it's the best we can do
+ without reading the directory
+ contents. 1 yields the right
+ result in GNU find, even
+ without -noleaf option. */
insert_inode_hash(inode);
if (S_ISREG(inode->i_mode))
inode->i_op = &cramfs_file_inode_operations;
@@ -62,45 +71,75 @@
* up the accesses should be fairly regular and cached in the
* page cache and dentry tree anyway..
*
- * This also acts as a way to guarantee contiguous areas of
- * up to 2*PAGE_CACHE_SIZE, so that the caller doesn't need
- * to worry about end-of-buffer issues even when decompressing
- * a full page cache.
+ * This also acts as a way to guarantee contiguous areas of up to
+ * BLKS_PER_BUF*PAGE_CACHE_SIZE, so that the caller doesn't need to
+ * worry about end-of-buffer issues even when decompressing a full
+ * page cache.
*/
#define READ_BUFFERS (2)
-static unsigned char read_buffers[READ_BUFFERS][PAGE_CACHE_SIZE*4];
-static int buffer_blocknr[READ_BUFFERS];
-static int last_buffer = 0;
+/* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */
+#define NEXT_BUFFER(_ix) ((_ix) ^ 1)
-static void *cramfs_read(struct super_block *sb, unsigned int offset)
+/*
+ * BLKS_PER_BUF_SHIFT must be at least 1 to allow for "compressed"
+ * data that takes up more space than the original. 1 is guaranteed
+ * to suffice, though. Larger values provide more read-ahead and
+ * proportionally less wastage at the end of the buffer.
+ */
+#define BLKS_PER_BUF_SHIFT (2)
+#define BLKS_PER_BUF (1 << BLKS_PER_BUF_SHIFT)
+static unsigned char read_buffers[READ_BUFFERS][BLKS_PER_BUF][PAGE_CACHE_SIZE];
+static unsigned buffer_blocknr[READ_BUFFERS];
+static int next_buffer = 0;
+
+/*
+ * Returns a pointer to a buffer containing at least LEN bytes of
+ * filesystem starting at byte offset OFFSET into the filesystem.
+ */
+static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
{
- struct buffer_head * bh_array[4];
- int i, blocknr, buffer;
+ struct buffer_head * bh_array[BLKS_PER_BUF];
+ unsigned i, blocknr, last_blocknr, buffer;
+ if (!len)
+ return NULL;
blocknr = offset >> PAGE_CACHE_SHIFT;
- offset &= PAGE_CACHE_SIZE-1;
+ last_blocknr = (offset + len - 1) >> PAGE_CACHE_SHIFT;
+ if (last_blocknr - blocknr >= BLKS_PER_BUF)
+ goto eek; resume:
+ offset &= PAGE_CACHE_SIZE - 1;
for (i = 0; i < READ_BUFFERS; i++) {
- if (blocknr == buffer_blocknr[i])
- return read_buffers[i] + offset;
+ if ((blocknr >= buffer_blocknr[i]) &&
+ (last_blocknr - buffer_blocknr[i] < BLKS_PER_BUF))
+ return &read_buffers[i][blocknr - buffer_blocknr[i]][offset];
}
- /* Ok, read in four buffers completely first */
- for (i = 0; i < 4; i++)
+ /* Ok, read in BLKS_PER_BUF pages completely first. */
+ for (i = 0; i < BLKS_PER_BUF; i++)
bh_array[i] = bread(sb->s_dev, blocknr + i, PAGE_CACHE_SIZE);
- /* Ok, copy them to the staging area without sleeping.. */
- buffer = last_buffer;
- last_buffer = buffer ^ 1;
+ /* Ok, copy them to the staging area without sleeping. */
+ buffer = next_buffer;
+ next_buffer = NEXT_BUFFER(buffer);
buffer_blocknr[buffer] = blocknr;
- for (i = 0; i < 4; i++) {
+ for (i = 0; i < BLKS_PER_BUF; i++) {
struct buffer_head * bh = bh_array[i];
if (bh) {
- memcpy(read_buffers[buffer] + i*PAGE_CACHE_SIZE, bh->b_data, PAGE_CACHE_SIZE);
+ memcpy(read_buffers[buffer][i], bh->b_data, PAGE_CACHE_SIZE);
bforget(bh);
- }
- blocknr++;
+ } else
+ memset(read_buffers[buffer][i], 0, PAGE_CACHE_SIZE);
}
- return read_buffers[buffer] + offset;
+ return read_buffers[buffer][0] + offset;
+
+ eek:
+ printk(KERN_ERR
+ "cramfs (device %s): requested chunk (%u:+%u) bigger than buffer\n",
+ bdevname(sb->s_dev), offset, len);
+ /* TODO: return EIO to process or kill the current process
+ instead of resuming. */
+ *((int *)0) = 0; /* XXX: doesn't work on all archs */
+ goto resume;
}
@@ -121,7 +160,7 @@
buffer_blocknr[i] = -1;
/* Read the first block and get the superblock from it */
- memcpy(&super, cramfs_read(sb, 0), sizeof(super));
+ memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
/* Do sanity checks on the superblock */
if (super.magic != CRAMFS_MAGIC) {
@@ -132,21 +171,23 @@
printk("wrong signature\n");
goto out;
}
-
- /* Check that the root inode is in a sane state */
- root_offset = super.root.offset << 2;
- if (root_offset < sizeof(struct cramfs_super)) {
- printk("root offset too small\n");
- goto out;
- }
- if (root_offset >= super.size) {
- printk("root offset too large (%lu %u)\n", root_offset, super.size);
+ if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) {
+ printk("unsupported filesystem features\n");
goto out;
}
+
+ /* Check that the root inode is in a sane state */
if (!S_ISDIR(super.root.mode)) {
printk("root is not a directory\n");
goto out;
}
+ root_offset = super.root.offset << 2;
+ if (root_offset == 0)
+ printk(KERN_INFO "cramfs: note: empty filesystem\n");
+ else if (root_offset != sizeof(struct cramfs_super)) {
+ printk("bad root offset %lu\n", root_offset);
+ goto out;
+ }
/* Set it all up.. */
sb->s_op = &cramfs_ops;
@@ -168,19 +209,20 @@
{
struct statfs tmp;
- memset(&tmp, 0, sizeof(tmp));
+ /* Unsupported fields set to -1 as per man page. */
+ memset(&tmp, 0xff, sizeof(tmp));
+
tmp.f_type = CRAMFS_MAGIC;
tmp.f_bsize = PAGE_CACHE_SIZE;
- tmp.f_blocks = 0;
+ tmp.f_bfree = 0;
+ tmp.f_bavail = 0;
+ tmp.f_ffree = 0;
tmp.f_namelen = 255;
return copy_to_user(buf, &tmp, bufsize) ? -EFAULT : 0;
}
/*
- * Read a cramfs directory entry..
- *
- * Remember: the inode number is the byte offset of the start
- * of the directory..
+ * Read a cramfs directory entry.
*/
static int cramfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
@@ -189,7 +231,7 @@
unsigned int offset;
int copied;
- /* Offset within the thing.. */
+ /* Offset within the thing. */
offset = filp->f_pos;
if (offset >= inode->i_size)
return 0;
@@ -204,7 +246,7 @@
char *name;
int namelen, error;
- de = cramfs_read(sb, offset + inode->i_ino);
+ de = cramfs_read(sb, OFFSET(inode) + offset, sizeof(*de)+256);
name = (char *)(de+1);
/*
@@ -244,7 +286,7 @@
char *name;
int namelen;
- de = cramfs_read(dir->i_sb, offset + dir->i_ino);
+ de = cramfs_read(dir->i_sb, OFFSET(dir) + offset, sizeof(*de)+256);
name = (char *)(de+1);
namelen = de->namelen << 2;
offset += sizeof(*de) + namelen;
@@ -274,27 +316,29 @@
static int cramfs_readpage(struct dentry *dentry, struct page * page)
{
struct inode *inode = dentry->d_inode;
- unsigned long maxblock, bytes;
+ u32 maxblock, bytes_filled;
maxblock = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
- bytes = 0;
+ bytes_filled = 0;
if (page->index < maxblock) {
struct super_block *sb = inode->i_sb;
- unsigned long block_offset = inode->i_ino + page->index*4;
- unsigned long start_offset = inode->i_ino + maxblock*4;
- unsigned long end_offset;
+ u32 blkptr_offset = OFFSET(inode) + page->index*4;
+ u32 start_offset, compr_len;
- end_offset = *(u32 *) cramfs_read(sb, block_offset);
+ start_offset = OFFSET(inode) + maxblock*4;
if (page->index)
- start_offset = *(u32 *) cramfs_read(sb, block_offset-4);
-
- bytes = inode->i_size & (PAGE_CACHE_SIZE - 1);
- if (page->index < maxblock)
- bytes = PAGE_CACHE_SIZE;
-
- cramfs_uncompress_block((void *) page_address(page), PAGE_CACHE_SIZE, cramfs_read(sb, start_offset), end_offset - start_offset);
+ start_offset = *(u32 *) cramfs_read(sb, blkptr_offset-4, 4);
+ compr_len = (*(u32 *) cramfs_read(sb, blkptr_offset, 4)
+ - start_offset);
+ if (compr_len == 0)
+ ; /* hole */
+ else
+ bytes_filled = cramfs_uncompress_block((void *) page_address(page),
+ PAGE_CACHE_SIZE,
+ cramfs_read(sb, start_offset, compr_len),
+ compr_len);
}
- memset((void *) (page_address(page) + bytes), 0, PAGE_CACHE_SIZE - bytes);
+ memset((void *) (page_address(page) + bytes_filled), 0, PAGE_CACHE_SIZE - bytes_filled);
SetPageUptodate(page);
UnlockPage(page);
return 0;
diff -durn linux-2.3.39/fs/cramfs/uncompress.c linux-bld/fs/cramfs/uncompress.c
--- linux-2.3.39/fs/cramfs/uncompress.c Tue Nov 23 22:43:51 1999
+++ linux-bld/fs/cramfs/uncompress.c Sun Jan 9 10:19:11 2000
@@ -22,6 +22,7 @@
static z_stream stream;
static int initialized = 0;
+/* Returns length of decompressed data. */
int cramfs_uncompress_block(void *dst, int dstlen, void *src, int srclen)
{
int err;
@@ -32,14 +33,22 @@
stream.next_out = dst;
stream.avail_out = dstlen;
- inflateReset(&stream);
+ err = inflateReset(&stream);
+ if (err != Z_OK) {
+ printk("inflateReset error %d\n", err);
+ inflateEnd(&stream);
+ inflateInit(&stream);
+ }
err = inflate(&stream, Z_FINISH);
- if (err != Z_STREAM_END) {
- printk("Error %d while decompressing!\n", err);
- printk("%p(%d)->%p(%d)\n", src, srclen, dst, dstlen);
- }
+ if (err != Z_STREAM_END)
+ goto err;
return stream.total_out;
+
+err:
+ printk("Error %d while decompressing!\n", err);
+ printk("%p(%d)->%p(%d)\n", src, srclen, dst, dstlen);
+ return 0;
}
int cramfs_uncompress_init(void)
diff -durn linux-2.3.39/scripts/cramfs/GNUmakefile linux-bld/scripts/cramfs/GNUmakefile
--- linux-2.3.39/scripts/cramfs/GNUmakefile Thu Jan 1 00:00:00 1970
+++ linux-bld/scripts/cramfs/GNUmakefile Sun Jan 9 02:24:37 2000
@@ -0,0 +1,11 @@
+CFLAGS = -Wall -O2
+CPPFLAGS = -I../../fs/cramfs
+LDLIBS = -lz
+PROGS = mkcramfs
+
+all: $(PROGS)
+
+distclean clean:
+ rm -f $(PROGS)
+
+.PHONY: all clean distclean
diff -durn linux-2.3.39/scripts/cramfs/mkcramfs.c linux-bld/scripts/cramfs/mkcramfs.c
--- linux-2.3.39/scripts/cramfs/mkcramfs.c Wed Nov 24 01:53:38 1999
+++ linux-bld/scripts/cramfs/mkcramfs.c Sun Jan 9 12:06:28 2000
@@ -6,7 +6,9 @@
#include <sys/fcntl.h>
#include <dirent.h>
#include <stdlib.h>
+#include <errno.h>
#include <string.h>
+#include <assert.h>
/* zlib required.. */
#include <zlib.h>
@@ -17,11 +19,11 @@
#include "cramfs.h"
-#define PAGE_CACHE_SIZE (4096)
-
static const char *progname = "mkcramfs";
-void usage(void)
+/* N.B. If you change the disk format of cramfs, please update fs/cramfs/README. */
+
+static void usage(void)
{
fprintf(stderr, "Usage: '%s dirname outfile'\n"
" where <dirname> is the root of the\n"
@@ -29,6 +31,27 @@
exit(1);
}
+/*
+ * If DO_HOLES is defined, then mkcramfs can create explicit holes in the
+ * data, which saves 26 bytes per hole (which is a lot smaller a saving than
+ * most filesystems).
+ *
+ * Note that kernels up to at least 2.3.39 don't support cramfs holes, which
+ * is why this defaults to undefined at the moment.
+ */
+/* #define DO_HOLES 1 */
+
+#define PAGE_CACHE_SIZE (4096)
+/* The kernel assumes PAGE_CACHE_SIZE as block size. */
+static unsigned int blksize = PAGE_CACHE_SIZE;
+
+static int warn_dev, warn_gid, warn_link, warn_namelen, warn_size, warn_uid;
+
+#ifndef MIN
+# define MIN(_a,_b) ((_a) < (_b) ? (_a) : (_b))
+#endif
+
+/* In-core version of inode / directory entry. */
struct entry {
/* stats */
char *name;
@@ -37,34 +60,51 @@
/* FS data */
void *uncompressed;
unsigned int dir_offset; /* Where in the archive is the directory entry? */
- unsigned int data_offset; /* Where in the archive is the start of the data? */
/* organization */
- struct entry *child;
+ struct entry *child; /* null for non-directories and empty directories */
struct entry *next;
};
/*
- * We should mind about memory leaks and
- * checking for out-of-memory.
- *
- * We don't.
+ * Width of various bitfields in struct cramfs_inode.
+ * Used only to generate warnings.
*/
-static unsigned int parse_directory(const char *name, struct entry **prev)
+#define SIZE_WIDTH 24
+#define UID_WIDTH 16
+#define GID_WIDTH 8
+#define OFFSET_WIDTH 26
+
+/*
+ * The longest file name component to allow for in the input directory tree.
+ * Ext2fs (and many others) allow up to 255 bytes. A couple of filesystems
+ * allow longer (e.g. smbfs 1024), but there isn't much use in supporting
+ * >255-byte names in the input directory tree given that such names get
+ * truncated to 255 bytes when written to cramfs.
+ */
+#define MAX_INPUT_NAMELEN 255
+
+static unsigned int parse_directory(const char *name, struct entry **prev, loff_t *fslen_ub)
{
DIR *dir;
int count = 0, totalsize = 0;
struct dirent *dirent;
char *path, *endpath;
- int len = strlen(name);
+ size_t len = strlen(name);
dir = opendir(name);
if (!dir) {
perror(name);
exit(2);
}
- /* Set up the path.. */
- path = malloc(4096);
+
+ /* Set up the path. */
+ /* TODO: Reuse the parent's buffer to save memcpy'ing and duplication. */
+ path = malloc(len + 1 + MAX_INPUT_NAMELEN + 1);
+ if (!path) {
+ perror(NULL);
+ exit(1);
+ }
memcpy(path, name, len);
endpath = path + len;
*endpath = '/';
@@ -73,7 +113,8 @@
while ((dirent = readdir(dir)) != NULL) {
struct entry *entry;
struct stat st;
- int fd, size;
+ int size;
+ size_t namelen;
/* Ignore "." and ".." - we won't be adding them to the archive */
if (dirent->d_name[0] == '.') {
@@ -84,44 +125,119 @@
continue;
}
}
- strcpy(endpath, dirent->d_name);
+ namelen = strlen(dirent->d_name);
+ if (namelen > MAX_INPUT_NAMELEN) {
+ fprintf(stderr,
+ "Very long (%u bytes) filename `%s' found.\n"
+ " Please increase MAX_INPUT_NAMELEN in mkcramfs.c and recompile. Exiting.\n",
+ (unsigned) namelen, dirent->d_name);
+ exit(1);
+ }
+ memcpy(endpath, dirent->d_name, namelen + 1);
if (lstat(path, &st) < 0) {
perror(endpath);
continue;
}
entry = calloc(1, sizeof(struct entry));
+ if (!entry) {
+ perror(NULL);
+ exit(5);
+ }
entry->name = strdup(dirent->d_name);
+ if (!entry->name) {
+ perror(NULL);
+ exit(1);
+ }
+ if (namelen > 255) {
+ /* Can't happen when reading from ext2fs. */
+
+ /* TODO: we ought to avoid chopping in half
+ multi-byte UTF8 characters. */
+ entry->name[namelen = 255] = '\0';
+ warn_namelen = 1;
+ }
entry->mode = st.st_mode;
entry->size = st.st_size;
entry->uid = st.st_uid;
+ if (entry->uid >= 1 << UID_WIDTH)
+ warn_uid = 1;
entry->gid = st.st_gid;
- size = sizeof(struct cramfs_inode) + (~3 & (strlen(entry->name) + 3));
+ if (entry->gid >= 1 << GID_WIDTH)
+ /* TODO: We ought to replace with a default
+ gid instead of truncating; otherwise there
+ are security problems. Maybe mode should
+ be &= ~070. Same goes for uid once Linux
+ supports >16-bit uids. */
+ warn_gid = 1;
+ size = sizeof(struct cramfs_inode) + ((namelen + 3) & ~3);
+ *fslen_ub += size;
if (S_ISDIR(st.st_mode)) {
- entry->size = parse_directory(path, &entry->child);
+ entry->size = parse_directory(path, &entry->child, fslen_ub);
} else if (S_ISREG(st.st_mode)) {
+ /* TODO: We ought to open files in do_compress, one
+ at a time, instead of amassing all these memory
+ maps during parse_directory (which don't get used
+ until do_compress anyway). As it is, we tend to
+ get EMFILE errors (especially if mkcramfs is run
+ by non-root).
+
+ While we're at it, do analogously for symlinks
+ (which would just save a little memory). */
int fd = open(path, O_RDONLY);
if (fd < 0) {
perror(path);
continue;
}
- if (entry->size)
- entry->uncompressed = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
- if (-1 == (int) (long) entry->uncompressed) {
- perror("mmap");
- exit(5);
+ if (entry->size) {
+ if ((entry->size >= 1 << SIZE_WIDTH)) {
+ warn_size = 1;
+ entry->size = (1 << SIZE_WIDTH) - 1;
+ }
+
+ entry->uncompressed = mmap(NULL, entry->size, PROT_READ, MAP_PRIVATE, fd, 0);
+ if (-1 == (int) (long) entry->uncompressed) {
+ perror("mmap");
+ exit(5);
+ }
+ if (st.st_nlink > 1) {
+ /* TODO: Although cramfs doesn't
+ support hard links, we could still
+ share data offset values between
+ different inodes (safe because
+ read-only). This would give at
+ least the space saving of hard
+ links. Just keep a hash mapping
+ <st_ino, st_dev> onto struct
+ entry*. Alternatively, steal some
+ code from Roger Wolff's `same'
+ program, which creates a hash of
+ file data contents. */
+ warn_link = 1;
+ }
}
close(fd);
} else if (S_ISLNK(st.st_mode)) {
- entry->uncompressed = malloc(st.st_size);
- if (readlink(path, entry->uncompressed, st.st_size) < 0) {
+ entry->uncompressed = malloc(entry->size);
+ if (!entry->uncompressed) {
+ perror(NULL);
+ exit(5);
+ }
+ if (readlink(path, entry->uncompressed, entry->size) < 0) {
perror(path);
continue;
}
} else {
entry->size = st.st_rdev;
+ if (entry->size & -(1<<SIZE_WIDTH))
+ warn_dev = 1;
}
+ if (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode))
+ /* block pointers & data expansion allowance + data */
+ *fslen_ub += ((4+26)*((entry->size - 1) / blksize + 1)
+ + MIN(entry->size + 3, st.st_blocks << 9));
+
/* Link it into the list */
*prev = entry;
prev = &entry->next;
@@ -133,7 +249,7 @@
return totalsize;
}
-static void set_random(void *area, int size)
+static void set_random(void *area, size_t size)
{
int fd = open("/dev/random", O_RDONLY);
@@ -144,6 +260,7 @@
memset(area, 0x00, size);
}
+/* Returns sizeof(struct cramfs_super), which includes the root inode. */
static unsigned int write_superblock(struct entry *root, char *base)
{
struct cramfs_super *super = (struct cramfs_super *) base;
@@ -151,6 +268,8 @@
super->magic = CRAMFS_MAGIC;
super->flags = 0;
+ /* Note: 0x10000 is meaningless, which is a bug; but
+ super->size is never used anyway. */
super->size = 0x10000;
memcpy(super->signature, CRAMFS_SIGNATURE, sizeof(super->signature));
set_random(super->fsid, sizeof(super->fsid));
@@ -168,6 +287,11 @@
static void set_data_offset(struct entry *entry, char *base, unsigned long offset)
{
struct cramfs_inode *inode = (struct cramfs_inode *) (base + entry->dir_offset);
+ assert ((offset & 3) == 0);
+ if (offset >= (1 << (2 + OFFSET_WIDTH))) {
+ fprintf(stderr, "filesystem too big. Exiting.\n");
+ exit(1);
+ }
inode->offset = (offset >> 2);
}
@@ -178,25 +302,28 @@
* we've seen.
*/
#define MAXENTRIES (100)
-static int stack_entries = 0;
-static struct entry *entry_stack[MAXENTRIES];
-
static unsigned int write_directory_structure(struct entry *entry, char *base, unsigned int offset)
{
+ int stack_entries = 0;
+ struct entry *entry_stack[MAXENTRIES];
+
for (;;) {
+ int dir_start = stack_entries;
while (entry) {
struct cramfs_inode *inode = (struct cramfs_inode *) (base + offset);
- int len = strlen(entry->name);
+ size_t len = strlen(entry->name);
entry->dir_offset = offset;
- offset += sizeof(struct cramfs_inode);
inode->mode = entry->mode;
inode->uid = entry->uid;
inode->gid = entry->gid;
inode->size = entry->size;
- inode->offset = 0; /* Fill in later */
+ inode->offset = 0;
+ /* Non-empty directories, regfiles and symlinks will
+ write over inode->offset later. */
+ offset += sizeof(struct cramfs_inode);
memcpy(base + offset, entry->name, len);
/* Pad up the name to a 4-byte boundary */
while (len & 3) {
@@ -206,14 +333,41 @@
inode->namelen = len >> 2;
offset += len;
+ /* TODO: this may get it wrong for chars >= 0x80.
+ Most filesystems use UTF8 encoding for filenames,
+ whereas the console is a single-byte character
+ set like iso-latin-1. */
printf(" %s\n", entry->name);
-
if (entry->child) {
+ if (stack_entries >= MAXENTRIES) {
+ fprintf(stderr, "Exceeded MAXENTRIES. Raise this value in mkcramfs.c and recompile. Exiting.\n");
+ exit(1);
+ }
entry_stack[stack_entries] = entry;
stack_entries++;
}
entry = entry->next;
}
+
+ /*
+ * Reverse the order of the stack entries pushed during
+ * this directory, for a small optimization of disk
+ * access in the created fs. This puts things in
+ * `ls -UR' order.
+ */
+ {
+ struct entry **lo = entry_stack + dir_start;
+ struct entry **hi = entry_stack + stack_entries;
+ struct entry *tmp;
+
+ while (lo < --hi) {
+ tmp = *lo;
+ *lo++ = *hi;
+ *hi = tmp;
+ }
+ }
+
+ /* Pop a subdirectory entry from the stack, and recurse. */
if (!stack_entries)
break;
stack_entries--;
@@ -226,35 +380,61 @@
return offset;
}
+#ifdef DO_HOLES
+/*
+ * Returns non-zero iff the first LEN bytes from BEGIN are all NULs.
+ */
+static int
+is_zero(char const *begin, unsigned len)
+{
+ return (len-- == 0 ||
+ (begin[0] == '\0' &&
+ (len-- == 0 ||
+ (begin[1] == '\0' &&
+ (len-- == 0 ||
+ (begin[2] == '\0' &&
+ (len-- == 0 ||
+ (begin[3] == '\0' &&
+ memcmp(begin, begin + 4, len) == 0))))))));
+}
+#else /* !DO_HOLES */
+# define is_zero(_begin,_len) (0) /* Never create holes. */
+#endif /* !DO_HOLES */
+
/*
* One 4-byte pointer per block and then the actual blocked
* output. The first block does not need an offset pointer,
- * as it will start immediately after the pointer block.
+ * as it will start immediately after the pointer block;
+ * so the i'th pointer points to the end of the i'th block
+ * (i.e. the start of the (i+1)'th block or past EOF).
*
* Note that size > 0, as a zero-sized file wouldn't ever
* have gotten here in the first place.
*/
-static unsigned int do_compress(char *base, unsigned int offset, char *uncompressed, unsigned int size)
+static unsigned int do_compress(char *base, unsigned int offset, char const *name, char *uncompressed, unsigned int size)
{
unsigned long original_size = size;
unsigned long original_offset = offset;
unsigned long new_size;
- unsigned long blocks = (size - 1) / PAGE_CACHE_SIZE + 1;
+ unsigned long blocks = (size - 1) / blksize + 1;
unsigned long curr = offset + 4 * blocks;
int change;
do {
+ unsigned long len = 2 * blksize;
unsigned int input = size;
- unsigned long len = 8192;
- if (input > PAGE_CACHE_SIZE)
- input = PAGE_CACHE_SIZE;
- compress(base + curr, &len, uncompressed, input);
- uncompressed += input;
+ if (input > blksize)
+ input = blksize;
size -= input;
- curr += len;
+ if (!is_zero (uncompressed, input)) {
+ compress(base + curr, &len, uncompressed, input);
+ curr += len;
+ }
+ uncompressed += input;
- if (len > PAGE_CACHE_SIZE*2) {
- printf("AIEEE: block expanded to > 2*blocklength (%d)\n", len);
+ if (len > blksize*2) {
+ /* (I don't think this can happen with zlib.) */
+ printf("AIEEE: block \"compressed\" to > 2*blocklength (%lu)\n", len);
exit(1);
}
@@ -262,67 +442,120 @@
offset += 4;
} while (size);
+ curr = (curr + 3) & ~3;
new_size = curr - original_offset;
+ /* TODO: Arguably, original_size in these 2 lines should be
+ st_blocks * 512. But if you say that then perhaps
+ administrative data should also be included in both. */
change = new_size - original_size;
- printf("%4.2f %% (%d bytes)\n", (change * 100) / (double) original_size, change);
+ printf("%5.2f%% (%d bytes)\t%s\n",
+ (change * 100) / (double) original_size, change, name);
- return (curr + 3) & ~3;
+ return curr;
}
+
+/*
+ * Traverse the entry tree, writing data for every item that has
+ * non-null entry->uncompressed (i.e. every symlink and non-empty
+ * regfile).
+ *
+ * Frees the entry pointers as it goes.
+ */
static unsigned int write_data(struct entry *entry, char *base, unsigned int offset)
{
do {
if (entry->uncompressed) {
set_data_offset(entry, base, offset);
- offset = do_compress(base, offset, entry->uncompressed, entry->size);
+ offset = do_compress(base, offset, entry->name, entry->uncompressed, entry->size);
}
- if (entry->child)
+ else if (entry->child)
offset = write_data(entry->child, base, offset);
- entry = entry->next;
+
+ /* Free the old before processing the next. */
+ {
+ struct entry *tmp = entry;
+ entry = entry->next;
+ free(tmp->name);
+ free(tmp);
+ }
} while (entry);
return offset;
}
-/* This is the maximum rom-image you can create */
-#define MAXROM (64*1024*1024)
+
+/*
+ * Maximum size fs you can create is roughly 256MB. (The last file's
+ * data must begin within 256MB boundary but can extend beyond that.)
+ *
+ * Note that if you want it to fit in a ROM then you're limited to what the
+ * hardware and kernel can support (64MB?).
+ */
+#define MAXFSLEN ((((1 << OFFSET_WIDTH) - 1) << 2) /* offset */ \
+ + (1 << SIZE_WIDTH) - 1 /* filesize */ \
+ + (1 << SIZE_WIDTH) * 4 / PAGE_CACHE_SIZE /* block pointers */ )
+
/*
* Usage:
*
- * mkcramfs directory-name
+ * mkcramfs directory-name outfile
*
* where "directory-name" is simply the root of the directory
* tree that we want to generate a compressed filesystem out
- * of..
+ * of.
*/
int main(int argc, char **argv)
{
struct stat st;
struct entry *root_entry;
char *rom_image;
- unsigned int offset, written;
+ unsigned int offset;
+ ssize_t written;
int fd;
+ loff_t fslen_ub = 0; /* initial guess (upper-bound) of
+ required filesystem size */
+ char const *dirname;
if (argc)
progname = argv[0];
if (argc != 3)
usage();
- if (stat(argv[1], &st) < 0) {
+ if (stat(dirname = argv[1], &st) < 0) {
perror(argv[1]);
exit(1);
}
fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0666);
root_entry = calloc(1, sizeof(struct entry));
+ if (!root_entry) {
+ perror(NULL);
+ exit(5);
+ }
root_entry->mode = st.st_mode;
root_entry->uid = st.st_uid;
root_entry->gid = st.st_gid;
- root_entry->name = "";
- root_entry->size = parse_directory(argv[1], &root_entry->child);
+ root_entry->size = parse_directory(argv[1], &root_entry->child, &fslen_ub);
+ if (fslen_ub > MAXFSLEN) {
+ fprintf(stderr,
+ "warning: guestimate of required size (upper bound) is %luMB, but maximum image size is %uMB. We might die prematurely.\n",
+ (unsigned long) (fslen_ub >> 20),
+ MAXFSLEN >> 20);
+ fslen_ub = MAXFSLEN;
+ }
- rom_image = mmap(NULL, MAXROM, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ /* TODO: Why do we use a private/anonymous mapping here
+ followed by a write below, instead of just a shared mapping
+ and a couple of ftruncate calls? Is it just to save us
+ having to deal with removing the file afterwards? If we
+ really need this huge anonymous mapping, we ought to mmap
+ in smaller chunks, so that the user doesn't need nn MB of
+ RAM free. If the reason is to be able to write to
+ un-mmappable block devices, then we could try shared mmap
+ and revert to anonymous mmap if the shared mmap fails. */
+ rom_image = mmap(NULL, fslen_ub, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (-1 == (int) (long) rom_image) {
perror("ROM image map");
exit(1);
@@ -334,7 +567,11 @@
printf("Directory data: %d bytes\n", offset);
offset = write_data(root_entry, rom_image, offset);
- printf("Everything: %d bytes\n", offset);
+
+ /* We always write a multiple of blksize bytes, so that
+ losetup works. */
+ offset = ((offset - 1) | (blksize - 1)) + 1;
+ printf("Everything: %d kilobytes\n", offset >> 10);
written = write(fd, rom_image, offset);
if (written < 0) {
@@ -345,5 +582,33 @@
fprintf(stderr, "ROM image write failed (%d %d)\n", written, offset);
exit(1);
}
+
+ /* (These warnings used to come at the start, but they scroll off the
+ screen too quickly.) */
+ if (warn_namelen) /* (can't happen when reading from ext2fs) */
+ fprintf(stderr, /* bytes, not chars: think UTF8. */
+ "warning: filenames truncated to 255 bytes.\n");
+ if (warn_link)
+ fprintf(stderr,
+ "warning: cramfs cannot represent hard links. You may want to replace hard links in\n"
+ " %s with symlinks to other files in %s.\n",
+ dirname, dirname);
+ if (warn_size)
+ fprintf(stderr,
+ "warning: file sizes truncated to %luMB (minus 1 byte).\n",
+ 1L << (SIZE_WIDTH - 20));
+ if (warn_uid) /* (not possible with current Linux versions) */
+ fprintf(stderr,
+ "warning: uids truncated to %u bits. (This may be a security concern.)\n",
+ UID_WIDTH);
+ if (warn_gid)
+ fprintf(stderr,
+ "warning: gids truncated to %u bits. (This may be a security concern.)\n",
+ GID_WIDTH);
+ if (warn_dev)
+ fprintf(stderr,
+ "WARNING: device numbers truncated to %u bits. This almost certainly means\n"
+ "that some device files will be wrong.\n",
+ SIZE_WIDTH);
return 0;
}
-
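
A note on two idioms in the patch that may look odd at first glance: is_zero() checks the first four bytes by hand and then memcmp()s the buffer against itself shifted by four bytes, and main() rounds the image length up to a multiple of blksize with an OR trick. Below is a standalone sketch of both. The names all_zero and round_up_pow2 are mine, purely for illustration, and are not in the patch; the rounding assumes the block size is a power of two and the length is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): returns non-zero iff the
 * first len bytes at buf are all NUL.  The first 4 bytes are checked by
 * hand; after that, comparing the buffer with itself shifted by 4 bytes
 * proves by induction that every later byte is NUL too.
 */
static int all_zero(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < 4; i++) {
		if (i == len)
			return 1;	/* ran out of bytes: all were NUL */
		if (buf[i] != '\0')
			return 0;
	}
	/* Here len >= 4 and buf[0..3] are NUL. */
	return memcmp(buf, buf + 4, len - 4) == 0;
}

/*
 * Hypothetical helper (not part of the patch): round n up to a multiple
 * of blk.  Assumes n > 0 and blk is a power of two.
 */
static unsigned long round_up_pow2(unsigned long n, unsigned long blk)
{
	assert(n > 0 && (blk & (blk - 1)) == 0);
	return ((n - 1) | (blk - 1)) + 1;
}
```

The memcmp() only reads, so the overlap between the two regions is harmless. The rounding works because OR-ing (n - 1) with (blk - 1) sets all the low bits, and adding 1 then carries up to the next multiple; a value that is already a multiple is left unchanged.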