Re: Using compression before encryption in device-mapper

From: Guillaume Lacôte
Date: Thu Apr 22 2004 - 05:21:52 EST


Thank you for your prompt answer ;)

> Before:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107419912024246&w=2
Yes, I had seen this.

> After:

> And of course it has to be at the beginning of a compression block, so
> the offset is known in advance.
OK, but this is no longer the case if you insert random bytes first (?).
Where does the encrypted non-random data start? This depends on the Huffman
encoding, about which the attacker knows nothing.
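
To illustrate: the compressed size of the random prefix, and hence the offset
at which the real data lands, is itself unpredictable. A quick Python sketch
(zlib standing in for whatever compressor the target would actually use;
purely illustrative, not dm code):

import os
import zlib

for trial in range(5):
    # Hypothetical random padding: length and contents both random.
    n = 1 + int.from_bytes(os.urandom(2), "big") % 512
    prefix_compressed = zlib.compress(os.urandom(n), 9)
    # A rough proxy for where the real data would start in the stream;
    # the attacker cannot predict it without knowing the random prefix.
    print(f"n = {n:3d} random bytes -> {len(prefix_compressed)} compressed bytes")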

>
> Ok, that makes the attack a little harder (but not much). After any
> amount of random data, you end up with a random huffman tree, agreed.
> Compress another 1k of zeros and see what the huffman tree looks like
> now. There are not too many options, what the compressed data could
> look like, right? Somewhere in the ballpark of 2, 4 or maybe 8.
OK, you are right. So assume you have:
1) n randomly chosen bytes (n is itself random).
   Here the dynamic Huffman tree is T1 (uniformly distributed among all trees).
2) p zero bytes.
   Now the tree is T2.
3) some more valuable data you would not want the attacker to know about.
My whole idea is useless if the attacker can gain sufficient knowledge of T2.
And indeed, if p >> n, then T2 will look like the well-known tree where the
byte 0 is encoded on a single bit. I had missed this problem, thank you.
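
To see the degeneration concretely, a small Python sketch (build_code_lengths
is an illustrative helper of mine, not code from any patch): it builds Huffman
code lengths from symbol frequencies, and once the zeros dominate, the byte 0
collapses to a 1-bit code:

import heapq
import os
from collections import Counter

def build_code_lengths(freqs):
    # Standard Huffman construction: repeatedly merge the two lightest
    # subtrees; every merge deepens all symbols contained in them by one.
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in freqs}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, i, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            depth[s] += 1
        heapq.heappush(heap, (w1 + w2, i, syms1 + syms2))
    return depth

n, p = 64, 4096                  # n random bytes, then p >> n zeros
data = os.urandom(n) + bytes(p)
lengths = build_code_lengths(Counter(data))
print("code length of byte 0:", lengths[0])   # -> 1 bit: the degenerate tree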

However (?):
a) the attacker still knows nothing about the rest of the tree (i.e. what the
encodings of all the other bytes are). Basically, to construct T2 you take T1,
let "0" be the code of the byte 0, and prefix the codes of all other bytes
with a "1".

On the other hand, replace (2) with a known sequence of p _equally
distributed_ bytes: now T2 looks like the well-balanced tree, and its
encoding is trivial and totally known to the attacker. This is a real problem
(apart from b and c below), but how often can this happen?
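
Using the same hypothetical build_code_lengths helper from the sketch above,
this limit case is easy to check: with all 256 byte values equally frequent,
Huffman degenerates to the perfectly balanced tree and every symbol gets the
trivial 8-bit code:

lengths = build_code_lengths({b: 1 for b in range(256)})
assert set(lengths.values()) == {8}   # perfectly balanced tree, fully
                                      # known to the attacker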

b) the attacker does not know (?) where the real data starts in the enciphered
stream, since Huffman codes are variable-length.

c) it is sufficient to ensure that not (p >> n). This is easily satisfied if
the expectation of n is on the order of one block size, since in any case
p < block_size.
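
A quick simulation of (c), assuming (my assumption, not specified above) that
n is drawn uniformly from 1..block_size, so its expectation is about half a
block:

import random

block_size = 4096
trials = 100_000
# Worst case is p = block_size; count how often the random padding n is
# still dwarfed by a factor of more than 16.
bad = sum(1 for _ in range(trials)
          if block_size / random.randint(1, block_size) > 16)
print(f"P[p/n > 16] ~= {bad / trials:.3f}")   # about 1/16 in this model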

> > > Does the idea still sound sane to you?
> > I hope so; what is your opinion ?
> I doubt it. Maybe with a statistical encoding or even with block
> sorting followed by statistical encoding, you increase the complexity
> by much more than 8. But without changes, the idea looks pretty
> futile.
Could you detail what you mean by statistical encoding? Thank you in
advance, Guillaume.
