Re: Using compression before encryption in device-mapper

From: Guillaume Lacôte
Date: Thu Apr 22 2004 - 03:04:18 EST


Thank you for your feedback and sorry for my late answer (I am just back from
holidays).
> Btw, looks like the whole idea is broken. Consider a block of
> known-plaintext. The known plaintext happens to be at the beginning
> of the block and has more entropy than the blocksize of your block
> cypher. Bang, you're dead.
I am sorry, but I failed to understand this problem; could you elaborate on
it? Are you saying that if I have known plaintext that compresses to at least
one full block, the problem remains?

- Now (with dm-crypt) = basically the first 512 bytes are known (apart from
the IV, see the discussion in the dm-crypt threads). This tremendously helps a
brute-force attack (use a cluster to pre-calculate all possible encryptions
of these 512 bytes and you are done).

- What I suggest = 1) grow entropy (this does not solve the above problem)
2) interleave random bytes _before_ each (plain) block of data. Bytes are
drawn so as to make the distribution over Huffman encodings uniform.
Thus even if you know the plaintext, even if it would compress to more than
4kB, since the block starts with random bytes you know nothing about, and
since these random bytes have changed the Huffman encoding used to encode the
known plaintext, I claim that you learn essentially nothing (?). A rough
sketch of the per-block transform follows below.
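To make the per-block transform concrete, here is a minimal user-space
sketch (it is not dm target code, and everything in it is illustrative):
it prepends a random prefix to each plain block, compresses prefix and
data together so that the prefix influences the Huffman tables built for
the known part, then hands the result to the cipher. zlib's compress2()
stands in for the compressor (DEFLATE uses Huffman coding),
encrypt_block() is a hypothetical placeholder for the cipher step, and
the 64-byte prefix length is an arbitrary choice.

#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define PLAIN_BLOCK   4096   /* one plain block                     */
#define RANDOM_PREFIX   64   /* random bytes drawn in front of it   */

/* Hypothetical stand-in for the real cipher step in the dm target. */
static void encrypt_block(unsigned char *buf, unsigned long len)
{
	(void)buf;
	(void)len;
	/* ... block cipher goes here ... */
}

/*
 * Caller sets *out_len to the capacity of out, which should be at
 * least compressBound(RANDOM_PREFIX + PLAIN_BLOCK) bytes.
 */
int transform_block(const unsigned char plain[PLAIN_BLOCK],
		    unsigned char *out, uLongf *out_len)
{
	unsigned char in[RANDOM_PREFIX + PLAIN_BLOCK];
	size_t i;

	/* 1) draw the random prefix (rand() only stands in for a CSPRNG) */
	for (i = 0; i < RANDOM_PREFIX; i++)
		in[i] = (unsigned char)(rand() & 0xff);

	/* 2) the (possibly known) plaintext only starts after the prefix */
	memcpy(in + RANDOM_PREFIX, plain, PLAIN_BLOCK);

	/*
	 * 3) compress prefix + plaintext together, so the prefix feeds
	 *    into the Huffman coding of the known part
	 */
	if (compress2(out, out_len, in, sizeof(in), Z_BEST_SPEED) != Z_OK)
		return -1;

	/* 4) encrypt the compressed block (placeholder) */
	encrypt_block(out, *out_len);
	return 0;
}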

Another way to say the same thing: start by choosing one Huffman encoding
among all possible 8-bit encodings; "write" this encoding, and then write your
(possibly known) data with this encoding. To "write" the encoding, just
draw one particular random sequence whose Huffman encoding is the one you
want (this avoids having a "dictionary" or any other meta-data in the
encrypted data).
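To illustrate where the random preamble enters in this second formulation,
here is a toy user-space program (again purely illustrative): it computes
the per-symbol frequencies of preamble + known payload, which is the input
a Huffman builder derives its code table from, so a different preamble
yields a (partly) different table even though the payload is fixed and
known. The tree construction itself, and the question of drawing the
preamble so that the resulting table is uniformly distributed, are
deliberately left out.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PREAMBLE 64

static void count_frequencies(const unsigned char *buf, size_t len,
			      unsigned long freq[256])
{
	size_t i;

	memset(freq, 0, 256 * sizeof(unsigned long));
	for (i = 0; i < len; i++)
		freq[buf[i]]++;
}

int main(void)
{
	unsigned char payload[1024] = { 0 };  /* known, low-entropy data */
	unsigned char block[PREAMBLE + sizeof(payload)];
	unsigned long freq[256];
	size_t i;

	/* random preamble drawn in front of the known payload          */
	/* (rand() is only a stand-in for a proper CSPRNG)              */
	for (i = 0; i < PREAMBLE; i++)
		block[i] = (unsigned char)(rand() & 0xff);
	memcpy(block + PREAMBLE, payload, sizeof(payload));

	count_frequencies(block, sizeof(block), freq);

	/*
	 * A Huffman builder fed these frequencies assigns its codeword
	 * lengths from them; a different preamble changes the counts and
	 * hence part of the code table, although the payload is fixed.
	 */
	for (i = 0; i < 256; i++)
		if (freq[i])
			printf("symbol %3zu: count %lu\n", i, freq[i]);
	return 0;
}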

>
> Your whole approach depends on the fact, that any known plaintext in
> the device is either not at the beginning of the block or has very
> little entropy. 1k of zeroes already has too much entropy if you use
> any form of huffman, so without RLE you're not frequently dead, but
> practically always.
You are perfectly right that Huffman is a poor redundancy fighter and will not
always drive the size of the data down to its Kolmogorov complexity. The "1k of
zeroes" case might still be a problem. I agree that a better compression
algorithm would be nice, but the problem is that I _need_ to know how to draw
random bytes so as to make the _compressed_ encoding uniformly distributed.
>
> Does the idea still sound sane to you?
I hope so; what is your opinion?
Guillaume.
