2009
01.01

A huge volume of email flows between email servers every day. At the same time, in most countries the bill for your connection depends on how much bandwidth you use. If you host an email server, you will naturally want to minimize that bandwidth. The obvious solution is to compress messages when the server delivers them. Compression is a very general technique for reducing the size of data, and it comes in two main categories: lossless and lossy. Email messages must be compressed losslessly, so that the data is exactly intact after the compression-decompression round trip. Several well-known lossless methods, including DEFLATE, Gzip, and LZW, are specified in Requests for Comments (RFCs) or used by RFC-specified compression schemes for protocols such as TLS and PPP. Of course, there are also non-standardized compression methods for various protocols.
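As a rough illustration of lossless compression, the sketch below uses Python's standard `zlib` module, which produces DEFLATE data inside the zlib wrapper format. The email text is a made-up example with the kind of repetition (headers, quoted replies) that compresses well, and `deflate_roundtrip` is a hypothetical helper name.

```python
import zlib

def deflate_roundtrip(message: bytes, level: int = 9):
    """Compress with DEFLATE (zlib wrapper), then decompress again."""
    compressed = zlib.compress(message, level)
    restored = zlib.decompress(compressed)
    return compressed, restored

# Hypothetical email with typical redundancy (headers, quoted reply lines).
email = (b"From: alice@example.com\r\nTo: bob@example.com\r\n"
         b"Subject: Re: Re: meeting\r\n\r\n"
         + b"> Let's meet on Friday at 10am in room 42.\r\n" * 20)

compressed, restored = deflate_roundtrip(email)
print(len(email), "->", len(compressed), "bytes")
assert restored == email  # lossless: every byte comes back unchanged
```

The final assertion is the property that rules out lossy compression for email: the decompressed bytes must equal the original bytes exactly.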

An email server should use a secure channel when delivering sensitive messages; in general, we should use a secure channel whenever we transmit data over the network. Transport Layer Security (TLS) is one way to provide such a channel. TLS is a stateful cryptographic protocol: during its handshake, the sender and receiver use public-key (asymmetric) cryptography to agree on symmetric session keys, which then encrypt the data for the rest of the communication.
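One way a TLS handshake can establish a shared symmetric key is a Diffie-Hellman exchange (older TLS versions can also use RSA key transport). The toy sketch below shows only the core idea, assuming deliberately tiny parameters: the prime `P = 2**127 - 1` and `G = 3` are arbitrary illustration choices, there is no authentication, and `keypair`/`session_key` are hypothetical names. Real TLS uses standardized large groups or elliptic curves plus certificates.

```python
import hashlib
import secrets

# TOY parameters for illustration only; far too small and unauthenticated
# for real use.
P = 2**127 - 1  # a Mersenne prime
G = 3           # arbitrary base chosen for the example

def keypair():
    """Pick a random private exponent in [2, P-2] and compute G^x mod P."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public

def session_key(my_private: int, their_public: int) -> bytes:
    """Combine our private value with the peer's public value, then hash
    the shared secret into a 256-bit symmetric key."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

a_priv, a_pub = keypair()  # sender
b_priv, b_pub = keypair()  # receiver
k_a = session_key(a_priv, b_pub)
k_b = session_key(b_priv, a_pub)
assert k_a == k_b  # both sides derive the same symmetric key
```

Only the public values cross the network; the resulting symmetric key is what actually encrypts the message stream.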

Now that you have read the long introduction, I would like to raise a question. If I want to both minimize the bandwidth used and secure the transmission, I have to compress and encrypt the data before delivering it. What is the difference between compress-encrypt (compressed data in a secure channel) and encrypt-compress (encrypted data in a compression channel)? What are the advantages and disadvantages of each approach?

IMO, compress-encrypt may be more secure because the encryption function encapsulates the whole message. Encrypt-compress is also secure, but it may expose the length of the message to anyone observing the compression history. Both approaches require the block sizes of compression and encryption to match; otherwise, the data must be buffered during transmission. The advantage of encrypt-compress is that the block size fed to the compressor is guaranteed to be fixed, since bulk data is typically encrypted with a block cipher that has a fixed block size (e.g. 128 bits for AES).
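The buffering mentioned above arises because a compressor emits a variable number of bytes while a block cipher consumes fixed-size blocks. A minimal sketch of that buffering step, assuming a 16-byte block (as for AES) and PKCS#7-style padding; `BLOCK_SIZE`, `to_blocks`, and the padding helpers are hypothetical names, and no actual encryption is performed here.

```python
import zlib

BLOCK_SIZE = 16  # assumption: a 128-bit block cipher such as AES

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Pad to a multiple of block_size; each pad byte holds the pad length."""
    n = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([n]) * n

def pkcs7_unpad(data: bytes) -> bytes:
    """Strip PKCS#7-style padding, rejecting malformed input."""
    n = data[-1]
    if not 1 <= n <= len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return data[:-n]

def to_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Buffer variable-length compressed output into fixed-size cipher blocks."""
    padded = pkcs7_pad(data, block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

compressed = zlib.compress(b"hello hello hello hello, world\n" * 10)
blocks = to_blocks(compressed)
assert all(len(b) == BLOCK_SIZE for b in blocks)
assert pkcs7_unpad(b"".join(blocks)) == compressed
```

Note that after padding, the number of blocks still tracks the compressed length to within one block, which is relevant to the question of what an observer can learn about message length.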

There are existing compression standards for encryption protocols (e.g. TLS compression). Feel free to take a look; maybe you can come up with some ideas about the advantages and disadvantages of these two approaches. I look forward to discussing them with you.

Thoughts, comments, and suggestions are always welcome!

1 comment so far

  1. My answer would be compress-and-encrypt.
    In the email scenario above, lossy compression is clearly not an option, and I will hold this assumption for the rest of my reply.

    Compression works because the plaintext contains redundancy: for example, certain patterns recur in the text, character frequencies are not uniform, and so on.

    On the other hand, a good encryption algorithm should exhibit good diffusion and confusion. In short, it means that encrypted data should be indistinguishable from random noise. It is obvious that this property should hold regardless of the plaintext, otherwise the encryption algorithm is broken.

    Therefore, compress-and-encrypt produces smaller output with no security compromise per se, but encrypt-and-compress is like feeding random noise (whose redundancy is greatly reduced) into the compression algorithm with no obvious security benefit.

    Bouquets or eggs are welcome (o:
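The comment's argument can be checked with a small experiment. The sketch below measures order-0 byte entropy (which captures the non-uniform character frequencies mentioned above) and compares output sizes for both orders. Python's standard library has no block cipher, so `toy_stream_encrypt` is a hypothetical stand-in that XORs the data with SHA-256 in counter mode; it is for illustration only and is not a substitute for a vetted cipher such as AES-GCM.

```python
import hashlib
import math
import os
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of the byte frequencies, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def toy_stream_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher: XOR data with SHA-256(key || nonce || counter) blocks.
    Calling it again with the same key and nonce decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)

key, nonce = os.urandom(32), os.urandom(16)
message = b"> Let's meet on Friday at 10am in room 42.\r\n" * 50
ciphertext = toy_stream_encrypt(key, nonce, message)

# Redundancy: plaintext has low per-byte entropy; ciphertext looks like noise.
print(f"plaintext entropy:  {byte_entropy(message):.2f} bits/byte")
print(f"ciphertext entropy: {byte_entropy(ciphertext):.2f} bits/byte")

# Order matters for size: only compress-then-encrypt saves bandwidth.
compress_then_encrypt = toy_stream_encrypt(key, nonce, zlib.compress(message, 9))
encrypt_then_compress = zlib.compress(ciphertext, 9)
print("original:             ", len(message))
print("compress-then-encrypt:", len(compress_then_encrypt))
print("encrypt-then-compress:", len(encrypt_then_compress))
```

With a stream-style cipher like this one, the ciphertext is exactly as long as its input, so compress-then-encrypt yields a small ciphertext, while compressing the noise-like ciphertext typically gives back slightly more bytes than it was given.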