Re: [patch] large swap areas

Sylvain Pion (Sylvain.Pion@sophia.inria.fr)
Tue, 24 Feb 1998 17:29:25 +0100


Jakub said:
> While you are at it, it would be good to handle disklabels inside of the
> partition as well. At the moment it is not possible to have swap as first
> partition on the disk on Sparc/UltraLinux boxes. First 512B of the disk
> needs to be partition table (disklabel), next 512B have boot loader (in case
> of SILO, Solaris boot loader takes 8K-512B).

I'm not sure that's a good solution. What happens if another architecture
turns out to have the same kind of problem in the future? What about moving
the start of your swap partition (with fdisk) so that it avoids the disklabel
area? How does the ext2 partition format deal with this problem?

> So I'd suggest a NEW-SWAP-SPACE format, which would:
>
> a) leave first 8K undefined
> b) then at 8K it would contain the new signature, e.g.:
> \x5e\xa2NEWSWP
> then u64 size of the swap space
> then u32 PAGE_SIZE for which it was made
> then u32 number of lockmap pages including this one
> if 0, then all the swap space but first 8K+PAGE_SIZE is usable
> (ie. lockmap would contain on PAGE_SIZE 4K bits 0 0 0 1 1 1 ... 1 1 1
> on PAGE_SIZE 8K 0 0 1 1 1 ... 1 1), if non-zero, following
> this number will be the lockmap
> c) so that we maintain backwards compatibility with older swaps on non-sun
> boxes (Solaris x86 uses similar disk labels), mkswap would check, if
> the disk contains valid Sun disklabel (Big Endian 0xDABE???? as last
> u32 in first 512K of the disk). If yes, it would not touch first 8K of
> the disk at all and just write the new signature; if not, it would
> write both old and new signature page, where in the old at least first
> 3 (for 4K pages) or 2 (for 8K pages) bits were cleared.
> d) old kernel would keep working as it used to, new kernel would
> first try to read old signature page. If it sees SWAP-SPACE at the end,
> it would check first 3 (2) bits. If they were cleared or if SWAP-SPACE
> signature is not present, it would try to read the new signature page...
>
> What do you think about that? If you'd like to code it, I'd be happy to test
> it for you, if not, I can write it if I find a spare time for it...
> Any comments?
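
To make sure I read points (a) to (c) correctly, here is a rough C sketch of
that layout. It is only an illustration: every name in it is made up, the
byte order and the unit of the size field are not given in the proposal, and
I assume the disklabel is the first 512-byte sector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; only the field order and widths come from the proposal. */
#define NEWSWAP_RESERVED 8192u  /* (a) first 8K left undefined */

static const unsigned char newswap_magic[8] =
    { 0x5e, 0xa2, 'N', 'E', 'W', 'S', 'W', 'P' };

/* (b) header found at offset 8K inside the swap area */
struct newswap_header {
    unsigned char magic[8];     /* \x5e\xa2NEWSWP */
    uint64_t size;              /* size of the swap space (unit unspecified) */
    uint32_t page_size;         /* PAGE_SIZE the area was made for */
    uint32_t lockmap_pages;     /* 0: everything after the header page is usable */
};

/* With lockmap_pages == 0, the pages covering the reserved 8K plus the
 * header page itself are skipped; this returns the first usable page index. */
static uint32_t newswap_first_usable_page(uint32_t page_size)
{
    assert(page_size != 0 && NEWSWAP_RESERVED % page_size == 0);
    return NEWSWAP_RESERVED / page_size + 1;
}

/* (c) Sun disklabel test: the last u32 of the label sector starts with
 * the big-endian magic 0xDABE; the remaining two bytes are not checked. */
static int has_sun_disklabel(const unsigned char sector[512])
{
    return sector[508] == 0xDA && sector[509] == 0xBE;
}

static int newswap_magic_ok(const struct newswap_header *h)
{
    return memcmp(h->magic, newswap_magic, sizeof newswap_magic) == 0;
}
```

With 4K pages the first usable page is number 3, with 8K pages it is number 2,
which matches the 0 0 0 1 1 ... and 0 0 1 1 ... lockmap patterns quoted above.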

I can do something, but let's first agree on a good format before doing
anything else. The format I chose was simple, and it allowed an older kernel
to swapon a new swap space (with only 128MB recognized, of course).

But now I think that we can break the swap-space format, since compatibility
can be handled in user space (if you want new and old kernels to share a
common swap area, just modify it at boot time, before swapon'ing it).

I also received a mail from Daniel Quinlan <quinlan@transmeta.com>:

> I saw Sylvain posted a patch to implement large swap pages (similar to
> something I've been dabbling with for the few weeks in my copious free
> time). Anyway, I spent a little time discussing how to do a large swap
> file with Linus, and came up with a different file format for the swap
> file that I'd like you to consider.
>
> The main difference is that instead of a bad blocks bitmap, a bad blocks
> list is used, plus a different magic header, a version field for future
> changes, a length field, and oodles of room for future information. The
> header also works well with file(1). If a file has enough bad blocks
> that it can't fit the listing in the first page, it just shouldn't be
> used.
>
> The kernel changes are pretty trivial to support either Sylvain's
> extended format or this one.

Here's an extract of the corresponding patch for mkswap.c:

#define MAGIC_STRING "SWAPSPACE2"

struct swap_header {
        char magic[10];
        unsigned int version;
        unsigned int last_page;
        char reserved[1014 - 3 * sizeof(int)];
};

struct swap_first_page {
        struct swap_header header;
        unsigned int bad_blocks[PAGE_SIZE / sizeof(int) - sizeof(struct swap_header)];
};

static struct swap_first_page first_page;

This seems reasonable to me. What do people think about it?
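
One detail to check in the extract: the bad_blocks length divides PAGE_SIZE by
sizeof(int) and then subtracts the header size in bytes, which mixes element
counts with byte counts. Below is a small user-space sketch that subtracts the
bytes first, so that the whole structure fills exactly one page. It is not the
actual patch: SKETCH_PAGE_SIZE, BAD_BLOCK_SLOTS, init_first_page(),
add_bad_block(), the version value 1 and the separate bad-block counter are
all assumptions of mine, and the size check only holds with 4-byte ints and
the usual alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumption: 4K pages, stands in for PAGE_SIZE */
#define MAGIC_STRING "SWAPSPACE2"

struct swap_header {
    char magic[10];
    unsigned int version;
    unsigned int last_page;
    char reserved[1014 - 3 * sizeof(int)];
};

/* Bytes left after the header, expressed as a number of unsigned ints. */
#define BAD_BLOCK_SLOTS \
    ((SKETCH_PAGE_SIZE - sizeof(struct swap_header)) / sizeof(unsigned int))

struct swap_first_page {
    struct swap_header header;
    unsigned int bad_blocks[BAD_BLOCK_SLOTS];
};

/* Zero the page, then fill in the magic, version and last page number. */
static void init_first_page(struct swap_first_page *p, unsigned int last_page)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
    memcpy(p->header.magic, MAGIC_STRING, sizeof p->header.magic);
    p->header.version = 1;      /* assumption: the value is not given above */
    p->header.last_page = last_page;
}

/* Record one bad block; returns -1 when the list no longer fits in the
 * first page, in which case the area simply should not be used. */
static int add_bad_block(struct swap_first_page *p, size_t *count,
                         unsigned int block)
{
    if (*count >= BAD_BLOCK_SLOTS)
        return -1;
    p->bad_blocks[(*count)++] = block;
    return 0;
}
```

How the end of the list is marked on disk is not shown in the extract, hence
the caller-maintained counter here.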

> BTW: If swap_lockmap and/or swap_map is small enough, it would probably be
> better to try to allocate it with __get_free_pages(,order) first, as on some
> architectures vmalloc results in slower access than __get_free_pages()...

This can be done, but compared with the time spent swapping, I doubt it will
make a big difference, right? In any case, my patch is now obsolete, since
swap_lockmap has been removed in pre-2.1.89-1.

-- 
Sylvain
